Prosecution Insights
Last updated: April 19, 2026
Application No. 17/210,644

Artificial Intelligence Processor Architecture for Dynamic Scaling of Neural Network Quantization

Status: Non-Final Office Action (§103)
Filed: Mar 24, 2021
Examiner: JAYAKUMAR, CHAITANYA R
Art Unit: 2128
Tech Center: 2100 — Computer Architecture & Software
Assignee: Qualcomm Incorporated
OA Round: 3 (Non-Final)

Grant probability: 26% (At Risk)
Projected OA rounds: 3-4
Projected time to grant: 4y 6m
Grant probability with interview: 48%

Examiner Intelligence

Career allow rate: 26% (13 granted / 51 resolved; -29.5% vs Tech Center average)
Interview lift: +22.5% across resolved cases with an interview
Average prosecution length: 4y 6m
Currently pending: 18 applications
Career history: 69 total applications across all art units

Statute-Specific Performance

§101: 29.1% (-10.9% vs TC avg)
§103: 45.6% (+5.6% vs TC avg)
§102: 8.7% (-31.3% vs TC avg)
§112: 13.8% (-26.2% vs TC avg)

Tech Center averages are estimates. Based on career data from 51 resolved cases.
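The panel figures above can be reproduced from the underlying counts. A small sketch, where the Tech Center baselines are back-derived from the deltas shown (illustrative inputs, not independently sourced; notably, every row implies the same ~40.0% TC average):

```python
# Recompute headline figures from the counts shown in the panels above.
granted, resolved = 13, 51
career_allow_rate = granted / resolved   # ~25.5%, displayed rounded as 26%

# Statute-specific allow rates and the deltas shown vs the TC average.
stats = {
    "101": (0.291, -0.109),
    "103": (0.456, +0.056),
    "102": (0.087, -0.313),
    "112": (0.138, -0.262),
}
for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta                # baseline implied by the delta
    print(f"Sec. {statute}: examiner {rate:.1%}, implied TC avg {tc_avg:.1%}")
```

Each implied baseline works out to 40.0%, suggesting the dashboard compares all four statutes against a single Tech Center average.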

Office Action

Rejection basis: 35 U.S.C. §103
DETAILED ACTION

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on August 12, 2025 has been entered.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This action is in response to the submission filed August 12, 2025 for application 17/210,644. Claims 1, 10, 19, and 28 are currently amended. Claims 3, 8, 9, 12, 17, 18, 21, 26, 27, and 30 are cancelled. Claims 1, 2, 4-7, 10, 11, 13-16, 19, 20, 22-25, 28, and 29 are pending and have been examined.

Information Disclosure Statement

An information disclosure statement (IDS) was submitted on October 15, 2025. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments

Applicant's arguments (pages 8-11, filed July 10, 2025) address the newly amended features recited in independent claim 1 (and similarly in independent claims 10, 19, and 28). Applicant specifically argues on page 10 (paragraph 3) that Wang and Gross fail to describe a method comprising at least "determining an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and throughput of the AI processor" and "dynamically adjusting an AI quantization level for a segment of a neural network in response to the QoS value" as recited in amended claim 1, and that Gross does not cure the deficiencies of Wang.
For at least the reasons discussed above, Applicant respectfully submits that the combination of Wang and Gross fails to disclose all features of claim 1, and that claim 1 is therefore in condition for allowance. While differing in scope, independent claims 10, 19, and 28 have been amended to recite features similar to the distinguishing features of claim 1 discussed above; Applicant therefore respectfully submits that claims 10, 19, and 28 are also in condition for allowance for at least the same reasons.

Examiner's Response: Applicant's arguments have been fully considered but are moot because the new ground of rejection (citing the new reference Moazzemi et al. (Trends in On-Chip Dynamic Resource Management, 2018) for teaching the newly amended limitations) does not rely on any reference combination applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Regarding the dependent claims (page 11), Applicant respectfully submits that, pursuant to 35 U.S.C. § 112(d), the dependent claims incorporate by reference all the limitations of the claims from which they depend and include their own patentable features, and are therefore in condition for allowance. Furthermore, although Applicant has not discussed the specific rejections of all of the dependent claims, Applicant does not necessarily agree with the characterizations of the prior art made by the Office. Moreover, because each dependent claim includes its own patentable features, individual consideration of each on its own merits is respectfully requested.

Examiner's Response: Applicant's arguments have been fully considered but are not persuasive because each dependent claim depends from one of independent claims 1, 10, 19, or 28, and the new combination of cited references teaches every element of the amended claims, as shown below and explained above. Also, Applicant does not identify which limitation of which dependent claim it disagrees with, nor provide any reasons or explanation for the disagreement. Hence, those claims are rejected as well.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors.
In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 4-6, 28, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (HAQ: Hardware-Aware Automated Quantization with Mixed Precision, 2019) in view of Gross et al. (US 8,164,434 B2), and further in view of Moazzemi et al. (Trends in On-Chip Dynamic Resource Management, 2018).

Regarding claim 1, Wang teaches:

dynamically adjusting an AI quantization level for a segment of a neural network ([Page 8613, Column 2, Paragraph 2] To this end, we propose the Hardware-Aware Automated Quantization (HAQ) framework that leverages reinforcement learning to automatically predict the quantization policy given the hardware's feedback. The RL agent decides the bitwidth of a given neural network in a layer-wise manner. For each layer, the agent receives the layer configuration and statistics as observation, and it then outputs the action, which is the bitwidth of weights and activations. We then leverage the hardware accelerator as the environment to obtain the direct feedback from hardware to guide the RL agent to satisfy the resource constraints. [Page 8615] Resource Constraints. In real-world applications, we have limited computation budgets (i.e., latency, energy, and model size). We would like to find the quantization policy with the best performance given the constraint. [Page 8619, Column 2, Paragraph 1] Our framework succeeds in learning to adjust its bitwidth policy under different constraints. Note: automatically predicting the policy given the hardware's feedback corresponds to dynamically adjusting; the layer-wise manner corresponds to a segment of the neural network.); and

processing the segment of the neural network using the adjusted AI quantization level ([Page 8613, Column 2, Paragraph 2] After all layers are quantized, we finetune the quantized model for one more epoch, and feed the validation accuracy after short-term retraining as the reward signal to our RL agent).

However, Wang is not relied upon to teach: a method comprising: receiving temperature data associated with a temperature of an artificial intelligence (AI) processor; determining an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor; and adjusting in response to the QoS value.

Gross teaches, in an analogous system: a method comprising: receiving temperature data associated with a temperature of an artificial intelligence (AI) processor ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system. [Column 5, Lines 47-55] Referring back to FIG. 1A, analysis device 124 then validates the telemetry data (such as the temperature and/or fan-speed measurements) using a pattern-recognition model with the telemetry data (e.g., temperature and/or fan-speed measurements) as inputs. In particular, the pattern-recognition model may be a nonlinear, nonparametric regression model, such as a multivariate state estimation technique (MSET) and/or a kernel regression model. Note that MSET refers to a class of pattern-recognition techniques.
[Column 5, Lines 63-67] Hence, the term MSET as used in this specification can refer to (among other things) any technique outlined in Gribok et al., including: ordinary least squares (OLS), support vector machines (SVM), artificial neural networks (ANNs), MSET, or regularized MSET (RMSET)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Wang to incorporate the teachings of Gross to use temperature data associated with a temperature of an artificial intelligence (AI) processor. One would have been motivated to make this modification because doing so would give the benefit of validating at least some of telemetry data 440 using one or more pattern-recognition models, as taught by Gross [Column 11, Lines 7-8].

Moazzemi teaches, in an analogous system: determining an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor ([Abstract, Paragraph 1] We first cover heuristic and optimization methods used to manage resources such as power, energy, temperature, Quality-of-Service (QoS) and reliability of the system. [Page 63, Column 2, Section B] Temperature-aware scheduling for multi-threaded processors can reduce hot spots. [Page 64, Column 1, Paragraph 2] We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2. [Page 64, Column 2, Paragraph 3] Under workload diversity, smart co-location - scheduling an optimized combination of latency- and throughput-sensitive applications together - exploits under-utilized resources to satisfy QoS of both types of applications. Note: see also Figure 1, showing QoS and temperature, and Figure 2, showing QoS and accuracy.); and

adjusting in response to the QoS value ([Page 63, Column 2, Last Paragraph] Runtime QoS management becomes necessary and challenging [Page 64, Column 1, Paragraph 1] with i) variable workload characteristics, ii) variable QoS requirements of applications, iii) identification and translation of QoS metrics into system-level parameters for provisioning, and iv) resource contention and arbitration among concurrent applications. Meeting QoS requirements of applications is largely based on: the nature of computation (compute, memory and I/O intensity, streaming inputs and batch processing) and the nature of the end result (numerical, perceptive, soft and hard real-time, and user interaction). We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2. We present major underlying approaches and strategies for performance-bound QoS guarantees through provisioning compute, memory and network bandwidth resources, and accuracy-bound QoS through quality monitoring and control. Note: runtime QoS management, and the calibration shown in Figure 2, correspond to adjusting.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang and Gross to incorporate the teachings of Moazzemi to determine an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor, and to adjust in response to the QoS value. One would have been motivated to make this modification because doing so would give the benefit of classifying QoS management techniques as performance-bound and/or accuracy-bound, as taught by Moazzemi [Page 64, Column 1, Paragraph 2].

Regarding claim 2, the combination of Wang, Gross, and Moazzemi teaches the method of claim 1 (as shown above).
Wang further teaches: wherein dynamically adjusting the AI quantization level for the segment of the neural network comprises: increasing the AI quantization level in response to operating condition information indicating a level of increased constraint on a processing ability of the AI processor, and decreasing the AI quantization level in response to operating condition information indicating a level of decreased constraint on the processing ability of the AI processor ([Page 8619, Column 1, Last Paragraph] Discussions. In Figure 5, we visualize the bitwidth allocation strategy for MobileNet-V2. From this figure, we can observe that our framework assigns more bitwidths to the weights in depthwise convolution layers than pointwise convolution layers. Intuitively, this is because the number of parameters in the former is much smaller than the latter. Comparing Figure 4 and Figure 5, the policies are drastically [Page 8619, Column 2, Paragraph 1] different under different optimization objectives (fewer bitwidths for depthwise convolutions under latency optimization, more bitwidths for depthwise convolutions under model size optimization). Our framework succeeds in learning to adjust its bitwidth policy under different constraints).

However, Wang is not relied upon to teach: the temperature data. Gross teaches, in an analogous system: the temperature data ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Wang to incorporate the teachings of Gross to use temperature data. One would have been motivated to make this modification because doing so would give the benefit of validating the temperature measurements using a pattern-recognition model based at least on the temperature measurements, as taught by Gross [Column 2, Lines 24-26].

Regarding claim 4, the combination of Wang, Gross, and Moazzemi teaches the method of claim 1 (as shown above). Wang further teaches: wherein dynamically adjusting the AI quantization level for the segment of the neural network comprises adjusting the AI quantization level for quantizing weight values to be processed by the segment of the neural network ([Page 8614, Column 1, Penultimate Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration. [Page 8615, Column 2, Section 3.4, Paragraph 2] Specifically, for each weight value w in the kth layer, we first truncate it into the range of [−c, c], and we then quantize it linearly into a_k bits).

Regarding claim 5, the combination of Wang, Gross, and Moazzemi teaches the method of claim 1 (as shown above). Wang further teaches: wherein dynamically adjusting the AI quantization level for the segment of the neural network comprises adjusting the AI quantization level for quantizing activation values to be processed by the segment of the neural network ([Page 8614, Column 1, Penultimate Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration).

Regarding claim 6, the combination of Wang, Gross, and Moazzemi teaches the method of claim 1 (as shown above).
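As an aside on the Wang passage quoted in the claim 4 discussion: the linear quantization it describes (truncate each weight into [−c, c], then quantize linearly into a_k bits) can be sketched as follows. The symmetric signed grid and round-to-nearest choice below are illustrative assumptions, not Wang's exact formulation.

```python
def linear_quantize(w, c, bits):
    """Truncate a weight to [-c, c], then quantize it linearly to `bits` bits.

    Sketch of the scheme quoted from Wang; grid and rounding details are
    assumptions for illustration only.
    """
    w = max(-c, min(c, w))            # truncate into [-c, c]
    levels = 2 ** (bits - 1) - 1      # e.g. 7 positive steps for 4 bits
    scale = c / levels
    return round(w / scale) * scale   # de-quantized (simulated) value

print([linear_quantize(w, c=1.0, bits=4) for w in (0.9, -0.2, 0.05, -1.5)])
```

With 4 bits, every weight snaps to one of 15 evenly spaced values in [−c, c]; the out-of-range −1.5 is first clipped to −1.0.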
Wang further teaches: wherein dynamically adjusting the AI quantization level for the segment of the neural network comprises adjusting the AI quantization level for quantizing weight values and activation values to be processed by the segment of the neural network ([Page 8614, Column 1, Penultimate Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration. [Page 8615, Column 2, Section 3.4, Paragraph 1] We linearly quantize the weights and activations of each layer).

Regarding claim 28, Wang teaches:

means for dynamically adjusting an AI quantization level for a segment of a neural network ([Page 8613, Column 2, Paragraph 2] To this end, we propose the Hardware-Aware Automated Quantization (HAQ) framework that leverages reinforcement learning to automatically predict the quantization policy given the hardware's feedback. The RL agent decides the bitwidth of a given neural network in a layer-wise manner. For each layer, the agent receives the layer configuration and statistics as observation, and it then outputs the action, which is the bitwidth of weights and activations. We then leverage the hardware accelerator as the environment to obtain the direct feedback from hardware to guide the RL agent to satisfy the resource constraints. [Page 8615] Resource Constraints. In real-world applications, we have limited computation budgets (i.e., latency, energy, and model size). We would like to find the quantization policy with the best performance given the constraint. [Page 8619, Column 2, Paragraph 1] Our framework succeeds in learning to adjust its bitwidth policy under different constraints. Note: automatically predicting the policy given the hardware's feedback corresponds to dynamically adjusting; the layer-wise manner corresponds to a segment of the neural network.); and

means for processing the segment of the neural network using the adjusted AI quantization level ([Page 8613, Column 2, Paragraph 2] After all layers are quantized, we finetune the quantized model for one more epoch, and feed the validation accuracy after short-term retraining as the reward signal to our RL agent).

However, Wang is not relied upon to teach: an artificial intelligence (AI) processor, comprising: means for receiving AI processor temperature data associated with operation of the AI processor; means for determining an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor; and adjusting in response to the QoS value.

Gross teaches, in an analogous system: an artificial intelligence (AI) processor, comprising: means for receiving AI processor temperature data associated with operation of the AI processor ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system. [Column 5, Lines 47-55] Referring back to FIG. 1A, analysis device 124 then validates the telemetry data (such as the temperature and/or fan-speed measurements) using a pattern-recognition model with the telemetry data (e.g., temperature and/or fan-speed measurements) as inputs. In particular, the pattern-recognition model may be a nonlinear, nonparametric regression model, such as a multivariate state estimation technique (MSET) and/or a kernel regression model. Note that MSET refers to a class of pattern-recognition techniques.
[Column 5, Lines 63-67] Hence, the term MSET as used in this specification can refer to (among other things) any technique outlined in Gribok et al., including: ordinary least squares (OLS), support vector machines (SVM), artificial neural networks (ANNs), MSET, or regularized MSET (RMSET)); and adjusting in response to the temperature data ([Column 11, Lines 4-6] Then, optional resampling module 432 may resample and/or de-quantize at least some of telemetry data 440, such as temperature measurements).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Wang to incorporate the teachings of Gross to provide an artificial intelligence (AI) processor comprising means for receiving AI processor temperature data associated with operation of the AI processor, and adjusting in response to the temperature data. One would have been motivated to make this modification because doing so would give the benefit of validating at least some of telemetry data 440 using one or more pattern-recognition models, as taught by Gross [Column 11, Lines 7-8].

Moazzemi teaches, in an analogous system: means for determining an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor ([Abstract, Paragraph 1] We first cover heuristic and optimization methods used to manage resources such as power, energy, temperature, Quality-of-Service (QoS) and reliability of the system. [Page 63, Column 2, Section B] Temperature-aware scheduling for multi-threaded processors can reduce hot spots. [Page 64, Column 1, Paragraph 2] We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2. [Page 64, Column 2, Paragraph 3] Under workload diversity, smart co-location - scheduling an optimized combination of latency- and throughput-sensitive applications together - exploits under-utilized resources to satisfy QoS of both types of applications. Note: see also Figure 1, showing QoS and temperature, and Figure 2, showing QoS and accuracy.); and

adjusting in response to the QoS value ([Page 63, Column 2, Last Paragraph] Runtime QoS management becomes necessary and challenging [Page 64, Column 1, Paragraph 1] with i) variable workload characteristics, ii) variable QoS requirements of applications, iii) identification and translation of QoS metrics into system-level parameters for provisioning, and iv) resource contention and arbitration among concurrent applications. Meeting QoS requirements of applications is largely based on: the nature of computation (compute, memory and I/O intensity, streaming inputs and batch processing) and the nature of the end result (numerical, perceptive, soft and hard real-time, and user interaction). We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2. We present major underlying approaches and strategies for performance-bound QoS guarantees through provisioning compute, memory and network bandwidth resources, and accuracy-bound QoS through quality monitoring and control. Note: runtime QoS management, and the calibration shown in Figure 2, correspond to adjusting.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang and Gross to incorporate the teachings of Moazzemi to determine an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor, and to adjust in response to the QoS value. One would have been motivated to make this modification because doing so would give the benefit of classifying QoS management techniques as performance-bound and/or accuracy-bound, as taught by Moazzemi [Page 64, Column 1, Paragraph 2].

Regarding claim 29, the combination of Wang, Gross, and Moazzemi teaches the AI processor of claim 28 (as shown above). Wang further teaches: wherein the means for dynamically adjusting the AI quantization level for the segment of the neural network comprises: means for increasing the AI quantization level in response to information indicating a level of increased constraint on a processing ability of the AI processor, and means for decreasing the AI quantization level in response to information indicating a level of decreased constraint on the processing ability of the AI processor ([Page 8619, Column 1, Last Paragraph] Discussions. In Figure 5, we visualize the bitwidth allocation strategy for MobileNet-V2. From this figure, we can observe that our framework assigns more bitwidths to the weights in depthwise convolution layers than pointwise convolution layers. Intuitively, this is because the number of parameters in the former is much smaller than the latter. Comparing Figure 4 and Figure 5, the policies are drastically [Page 8619, Column 2, Paragraph 1] different under different optimization objectives (fewer bitwidths for depthwise convolutions under latency optimization, more bitwidths for depthwise convolutions under model size optimization). Our framework succeeds in learning to adjust its bitwidth policy under different constraints).

However, Wang is not relied upon to teach: the temperature data. Gross teaches, in an analogous system: the temperature data ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the artificial intelligence processor of Wang to incorporate the teachings of Gross to use temperature data. One would have been motivated to make this modification because doing so would give the benefit of validating the temperature measurements using a pattern-recognition model based at least on the temperature measurements, as taught by Gross [Column 2, Lines 24-26].

Claims 7, 10, 11, 13-16, 19, 20, and 22-25 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (HAQ: Hardware-Aware Automated Quantization with Mixed Precision, 2019) in view of Gross et al. (US 8,164,434 B2) and Moazzemi et al. (Trends in On-Chip Dynamic Resource Management, 2018), and further in view of Turakhia et al. (US 2018/0164866 A1).

Regarding claim 7, the combination of Wang, Gross, and Moazzemi teaches the method of claim 1 (as shown above). Wang further teaches: wherein: the AI quantization level is configured to indicate dynamic bits of a value to be processed by the neural network to quantize ([Page 8614, Column 2, Section 3] We model the quantization task as a reinforcement learning problem (Figure 2). We use the actor-critic model with DDPG agent to give the action: bits for each layer); and processing the segment of the neural network using the adjusted AI quantization level ([Page 8614, Column 2, Section 3.1] Our agent processes the neural network in a layer-wise manner. [Page 3, Column 1, Paragraph 1] Table 1. Comparison of ImageNet validation accuracy among different rounding schemes for 4-bit quantization of the first layer of Resnet18. Note: 4-bit quantization corresponds to the AI quantization level; each layer corresponds to a segment of the neural network). However, the combination of Wang, Gross, and Moazzemi does not explicitly disclose: bypassing portions of a multiplier accumulator (MAC) associated with the dynamic bits of the value.
Turakhia teaches, in an analogous system: bypassing portions of a multiplier accumulator (MAC) associated with the dynamic bits of the value ([0015] FIG. 5 is a diagram illustrating an example of a modified multiplier-accumulator unit that bypasses the multiplier and adder when at least one of the operands for the multiplier is zero). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Wang, Gross, and Moazzemi to incorporate the teachings of Turakhia to bypass portions of a multiplier accumulator (MAC) associated with the dynamic bits of the value. One would have been motivated to do this modification because doing so would give the benefit of implementing a combinational logic as taught by Turakhia [0039]. Regarding claim 10 Wang teaches: and dynamically adjust an Al quantization level for a segment of a neural network neural network ([Page 8613, Column 2, Paragraph 2] To this end, we propose the Hardware-Aware Automated Quantization (HAQ) framework that leverages reinforcement learning to automatically predict the quantization policy given the hardware’s feedback. The RL agent decides the bitwidth of a given neural network in a layer-wise manner. For each layer, the agent receives the layer configuration and statistics as observation, and it then outputs the action which is the bitwidth of weights and activations. We then leverage the hardware accelerator as the environment to obtain the direct feedback from hardware to guide the RL agent to satisfy the resource constraints. [Page 8615] Resource Constraints. In real-world applications, we have limited computation budgets (i.e., latency, energy, and model size). We would like to find the quantization policy with the best performance given the constraint. [Page 8619, Column 2, Paragraph 1] Our framework succeeds in learning to adjust its bitwidth policy under different constraints. 
Note: Automatically predict given the hardware's feedback corresponds to dynamically adjusting. Layer-wise manner corresponds to a segment of the neural network); to process the segment of the neural network using the adjusted Al quantization level ([Page 8613, Column 2, Paragraph 2] After all layers are quantized, we finetune the quantized model for one more epoch, and feed the validation accuracy after short-term retraining as the reward signal to our RL agent). However, Wang does not explicitly disclose: An artificial intelligence (AI) processor, comprising: a dynamic quantization controller configured to: receive temperature data associated with operation of an AI processor; determine an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor; adjust in response to the QoS value; and a multiplier accumulator (MAC) array configured. Gross teaches, in an analogous system: An artificial intelligence (AI) processor, comprising: a dynamic quantization controller configured to: receive temperature data associated with operation of an AI processor ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system. [Column 5, Lines 47-55] Referring back to FIG. 1A, analysis device 124 then validates the telemetry data (such as the temperature and/or fan-speed measurements) using a pattern-recognition model with the telemetry data (e.g., temperature and/or fan-speed measurements) as inputs. In particular, the pattern-recognition model may be a nonlinear, nonparametric regression model, such as a multivariate state estimation technique (MSET) and/or a kernel regression model. Note that MSET refers to a class of pattern-recognition techniques. 
[Column 5, Lines 63-67] Hence, the term MSET as used in this specification can refer to (among other things) any technique outlined in Gribok et al., including: ordinary least squares (OLS), support vector machines (SVM), artificial neural networks (ANNs), MSET, or regularized MSET (RMSET)); adjust in response to the temperature data ([Column 11, Lines 4-6] Then, optional resampling module 432 may resample and/or de-quantize at least some of telemetry data 440, such as temperature measurements). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Wang to incorporate the teachings of Gross to use an artificial intelligence (AI) processor, comprising: a dynamic quantization controller configured to: receive temperature data associated with operation of an AI processor and adjust in response to the temperature data. One would have been motivated to make this modification because doing so would give the benefit of validating at least some of telemetry data 440 using one or more pattern-recognition models as taught by Gross [Column 11, Lines 7 and 8]. Moazzemi teaches, in an analogous system: determine an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor ([Abstract, Paragraph 1] We first cover heuristic and optimization methods used to manage resources such as power, energy, temperature, Quality-of-Service (QoS) and reliability of the system. [Page 63, Column 2, Section B] Temperature aware scheduling for multi-threaded processors can reduce hot spots. [Page 64, Column 1, Paragraph 2] We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2.
[Page 64, Column 2, Paragraph 3] Under workload diversity, smart co-location - scheduling an optimized combination of latency and throughput sensitive applications together, exploits under-utilized resources to satisfy QoS of both types of applications. Note: Also see Figure 1 showing QoS and temperature. See Fig. 2 showing QoS and accuracy); adjust in response to the QoS value ([Page 63, Column 2, Last Paragraph] Runtime QoS management becomes necessary and challenging [Page 64, Column 1, Paragraph 1] with i) variable workload characteristics ii) variable QoS requirements of applications, iii) identification and translation of QoS metrics into system level parameters for provisioning and iv) resource contention and arbitration among concurrent applications. Meeting QoS requirements of applications are largely based on: • the nature of computation - compute, memory and I/O intensity, streaming inputs and batch processing • the nature of end result - numerical, perceptive, soft and hard real-time, and user-interaction. We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2. We present major underlying approaches and strategies for performance bound QoS guarantees through provisioning compute, memory and network bandwidth resources and accuracy-bound QoS through quality monitoring and control. Note: Run-time QoS management and Fig. 2 showing calibration correspond to adjusting). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang and Gross to incorporate the teachings of Moazzemi to determine an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor and adjust in response to the QoS value.
One would have been motivated to make this modification because doing so would give the benefit of classifying QoS management techniques as performance-bound and/or accuracy-bound as taught by Moazzemi [Page 64, Column 1, Paragraph 2]. Turakhia teaches, in an analogous system: and a multiplier accumulator (MAC) array configured ([0015] FIG. 5 is a diagram illustrating an example of a modified multiplier-accumulator unit). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang, Gross, and Moazzemi to incorporate the teachings of Turakhia to use a multiplier accumulator (MAC). One would have been motivated to make this modification because doing so would give the benefit of implementing combinational logic as taught by Turakhia [0039].

Regarding claim 11, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The AI processor of claim 10 (as shown above). Wang further teaches: wherein the dynamic quantization controller is configured such that dynamically adjusting the AI quantization level for the segment of the neural network comprises: increasing the AI quantization level in response to indicating a level that increased constraint of a processing ability of the AI processor, and decreasing the AI quantization level in response to indicating a level that decreased constraint of the processing ability of the AI processor ([Page 8619, Column 1, Last Paragraph] Discussions. In Figure 5, we visualize the bitwidth allocation strategy for MobileNet-V2. From this figure, we can observe that our framework assigns more bitwidths to the weights in depthwise convolution layers than pointwise convolution layers. Intuitively, this is because the number of parameters in the former is much smaller than the latter.
Comparing Figure 4 and Figure 5, the policies are drastically [Page 8619, Column 2, Paragraph 1] different under different optimization objectives (fewer bitwidths for depthwise convolutions under latency optimization, more bitwidths for depthwise convolutions under model size optimization). Our framework succeeds in learning to adjust its bitwidth policy under different constraints). However, Wang is not relied upon to teach: the temperature data. Gross teaches, in an analogous system: the temperature data ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the AI processor of Wang to incorporate the teachings of Gross to use temperature data. One would have been motivated to make this modification because doing so would give the benefit of validating the temperature measurements using a pattern-recognition model based at least on the temperature measurements as taught by Gross [Column 2, Lines 24-26].

Regarding claim 13, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The AI processor of claim 10 (as shown above). Wang further teaches: wherein the dynamic quantization controller is configured such that dynamically adjusting the AI quantization level for the segment of the neural network comprises adjusting the AI quantization level for quantizing weight values to be processed by the segment of the neural network ([Page 8614, Column 1, Last but one Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration. [Page 8615, Column 2, Section 3.4, Paragraph 2] Specifically, for each weight value w in the kth layer, we first truncate it into the range of [−c, c], and we then quantize it linearly into ak bits).
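The linear weight-quantization step Wang cites (truncate each weight into [−c, c], then quantize it linearly into a given number of bits) can be sketched as follows. This is an illustrative software model only; the function name and the symmetric rounding scheme are assumptions, not details taken from Wang.

```python
def linear_quantize(w, c, bits):
    """Truncate w into [-c, c], then snap it to the nearest level of a
    symmetric linear grid with the given bit width (illustrative scheme;
    the exact rounding used by HAQ may differ)."""
    w = max(-c, min(c, w))        # truncate into [-c, c]
    levels = 2 ** (bits - 1) - 1  # positive levels of a symmetric signed grid
    step = c / levels             # spacing between adjacent quantized values
    return round(w / step) * step # nearest quantization level
```

For example, with c = 1.0 and 4 bits the grid has 7 positive levels, so every weight snaps to a multiple of 1/7, and out-of-range weights saturate at ±1.0.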
Regarding claim 14, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The AI processor of claim 10 (as shown above). Wang further teaches: wherein the dynamic quantization controller is configured such that dynamically adjusting the AI quantization level for the segment of the neural network comprises adjusting the AI quantization level for quantizing activation values to be processed by the segment of the neural network ([Page 8614, Column 1, Last but one Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration).

Regarding claim 15, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The AI processor of claim 10 (as shown above). Wang further teaches: wherein the dynamic quantization controller is configured such that dynamically adjusting the AI quantization level for the segment of the neural network comprises adjusting the AI quantization level for quantizing weight values and activation values to be processed by the segment of the neural network ([Page 8614, Column 1, Last but one Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration. [Page 8615, Column 2, Section 3.4, Paragraph 1] We linearly quantize the weights and activations of each layer).

Regarding claim 16, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The AI processor of claim 10 (as shown above). Wang further teaches wherein: the AI quantization level is configured to indicate dynamic bits of a value to be processed by the neural network to quantize ([Page 8614, Column 2, Section 3] We model the quantization task as a reinforcement learning problem (Figure 2).
We use the actor-critic model with DDPG agent to give the action: bits for each layer); processing the segment of the neural network using the adjusted AI quantization level ([Page 8614, Column 2, Section 3.1] Our agent processes the neural network in a layer-wise manner. [Page 3, Column 1, Paragraph 1] Table 1. Comparison of ImageNet validation accuracy among different rounding schemes for 4-bit quantization of the first layer of Resnet18. Note: 4-bit quantization corresponds to the AI quantization level. Each layer corresponds to a segment of the neural network). However, the system of Wang, Gross, and Moazzemi does not explicitly disclose: the MAC array is configured such that processing comprises bypassing portions of a MAC associated with the dynamic bits of the value. Turakhia teaches, in an analogous system: the MAC array is configured such that processing comprises bypassing portions of a MAC associated with the dynamic bits of the value ([0015] FIG. 5 is a diagram illustrating an example of a modified multiplier-accumulator unit that bypasses the multiplier and adder when at least one of the operands for the multiplier is zero). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang, Gross, and Moazzemi to incorporate the teachings of Turakhia wherein the MAC array is configured such that processing comprises bypassing portions of a MAC associated with the dynamic bits of the value. One would have been motivated to make this modification because doing so would give the benefit of implementing combinational logic as taught by Turakhia [0039].
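The zero-operand bypass described in Turakhia's paragraph [0015] can be modeled in software as below. The function name and the list-based model are illustrative assumptions; Turakhia describes a hardware multiplier-accumulator unit, not code.

```python
def mac_with_zero_bypass(weights, activations):
    """Software model of a multiply-accumulate unit that skips the
    multiplier and adder whenever either operand is zero."""
    acc = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # bypass the multiplier and adder: the product is zero
        acc += w * a
    return acc
```

The result matches an ordinary dot product; the bypass only removes work for zero operands, which is why such a unit saves energy on sparse or aggressively quantized values without changing the output.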
Regarding claim 19, Wang teaches: and dynamically adjust an AI quantization level for a segment of a neural network ([Page 8613, Column 2, Paragraph 2] To this end, we propose the Hardware-Aware Automated Quantization (HAQ) framework that leverages reinforcement learning to automatically predict the quantization policy given the hardware’s feedback. The RL agent decides the bitwidth of a given neural network in a layer-wise manner. For each layer, the agent receives the layer configuration and statistics as observation, and it then outputs the action which is the bitwidth of weights and activations. We then leverage the hardware accelerator as the environment to obtain the direct feedback from hardware to guide the RL agent to satisfy the resource constraints. [Page 8615] Resource Constraints. In real-world applications, we have limited computation budgets (i.e., latency, energy, and model size). We would like to find the quantization policy with the best performance given the constraint. [Page 8619, Column 2, Paragraph 1] Our framework succeeds in learning to adjust its bitwidth policy under different constraints. Note: Automatically predict given the hardware's feedback corresponds to dynamically adjusting. Layer-wise manner corresponds to a segment of the neural network); to process the segment of the neural network using the adjusted AI quantization level ([Page 8613, Column 2, Paragraph 2] After all layers are quantized, we finetune the quantized model for one more epoch, and feed the validation accuracy after short-term retraining as the reward signal to our RL agent). However, Wang does not explicitly disclose: A computing device, comprising: an artificial intelligence (AI) processor comprising a dynamic quantization controller configured to: receive temperature data associated with operation of an AI processor; and in response to the temperature data; and the AI processor further comprising a multiplier accumulator (MAC) array configured.
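Wang's layer-wise HAQ loop quoted above (the agent observes each layer's configuration, emits a bit width, and then receives hardware/accuracy feedback as its reward) can be sketched schematically. Here `agent` and `evaluate` are placeholders standing in for the DDPG policy and the hardware-in-the-loop evaluation; they are assumptions for illustration, not Wang's implementation.

```python
def layerwise_bitwidth_search(layers, agent, evaluate):
    """Schematic of a HAQ-style loop: assign a bit width to each layer in
    turn, then score the resulting policy with external feedback."""
    policy = []
    for layer in layers:
        bits = agent(layer)    # action: bit width for this layer's weights/activations
        policy.append(bits)
    reward = evaluate(policy)  # e.g., validation accuracy after short finetuning
    return policy, reward
```

In the actual framework the reward would feed back into training the agent across episodes; this sketch shows only a single pass.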
Gross teaches, in an analogous system: A computing device, comprising: an artificial intelligence (AI) processor comprising a dynamic quantization controller configured to: receive temperature data associated with operation of an AI processor ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system. [Column 5, Lines 47-55] Referring back to FIG. 1A, analysis device 124 then validates the telemetry data (such as the temperature and/or fan-speed measurements) using a pattern-recognition model with the telemetry data (e.g., temperature and/or fan-speed measurements) as inputs. In particular, the pattern-recognition model may be a nonlinear, nonparametric regression model, such as a multivariate state estimation technique (MSET) and/or a kernel regression model. Note that MSET refers to a class of pattern-recognition techniques. [Column 5, Lines 63-67] Hence, the term MSET as used in this specification can refer to (among other things) any technique outlined in Gribok et al., including: ordinary least squares (OLS), support vector machines (SVM), artificial neural networks (ANNs), MSET, or regularized MSET (RMSET)); adjust in response to the temperature data ([Column 11, Lines 4-6] Then, optional resampling module 432 may resample and/or de-quantize at least some of telemetry data 440, such as temperature measurements). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Wang to incorporate the teachings of Gross to use a computing device, comprising: an artificial intelligence (AI) processor comprising a dynamic quantization controller configured to: receive temperature data associated with operation of an AI processor and adjust in response to the temperature data.
One would have been motivated to make this modification because doing so would give the benefit of validating at least some of telemetry data 440 using one or more pattern-recognition models as taught by Gross [Column 11, Lines 7 and 8]. Moazzemi teaches, in an analogous system: determine an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor ([Abstract, Paragraph 1] We first cover heuristic and optimization methods used to manage resources such as power, energy, temperature, Quality-of-Service (QoS) and reliability of the system. [Page 63, Column 2, Section B] Temperature aware scheduling for multi-threaded processors can reduce hot spots. [Page 64, Column 1, Paragraph 2] We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2. [Page 64, Column 2, Paragraph 3] Under workload diversity, smart co-location - scheduling an optimized combination of latency and throughput sensitive applications together, exploits under-utilized resources to satisfy QoS of both types of applications. Note: Also see Figure 1 showing QoS and temperature. See Fig. 2 showing QoS and accuracy); adjust in response to the QoS value ([Page 63, Column 2, Last Paragraph] Runtime QoS management becomes necessary and challenging [Page 64, Column 1, Paragraph 1] with i) variable workload characteristics ii) variable QoS requirements of applications, iii) identification and translation of QoS metrics into system level parameters for provisioning and iv) resource contention and arbitration among concurrent applications. Meeting QoS requirements of applications are largely based on: • the nature of computation - compute, memory and I/O intensity, streaming inputs and batch processing • the nature of end result - numerical, perceptive, soft and hard real-time, and user-interaction.
We abstractly classify QoS management techniques as performance-bound and/or accuracy-bound, as shown in Figure 2. We present major underlying approaches and strategies for performance bound QoS guarantees through provisioning compute, memory and network bandwidth resources and accuracy-bound QoS through quality monitoring and control. Note: Run-time QoS management and Fig. 2 showing calibration correspond to adjusting). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang and Gross to incorporate the teachings of Moazzemi to determine an AI quality of service (QoS) value based on the temperature data, wherein the QoS value represents a target for accuracy of a result generated by the AI processor and a throughput of the AI processor and adjust in response to the QoS value. One would have been motivated to make this modification because doing so would give the benefit of classifying QoS management techniques as performance-bound and/or accuracy-bound as taught by Moazzemi [Page 64, Column 1, Paragraph 2]. Turakhia teaches, in an analogous system: and the AI processor further comprising a multiplier accumulator (MAC) array configured ([0047] In some examples, the compiler 132 can divide the neural network model into portions (e.g., neural network 200)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang, Gross, and Moazzemi to incorporate the teachings of Turakhia to use the AI processor further comprising a multiplier accumulator (MAC) array configured. One would have been motivated to make this modification because doing so would give the benefit of implementing combinational logic as taught by Turakhia [0039].
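The claimed control path (temperature data to a QoS value representing accuracy and throughput targets, then a quantization-level adjustment in response) can be illustrated with a minimal sketch. The linear temperature mapping, the thresholds, and the bit-width table below are invented for illustration; neither the claims nor the cited references specify any particular formula.

```python
def qos_from_temperature(temp_c, t_nominal=45.0, t_max=95.0):
    """Map a temperature reading to a QoS value in [0, 1], where 1.0 means
    full accuracy/throughput headroom (hypothetical linear mapping)."""
    headroom = (t_max - temp_c) / (t_max - t_nominal)
    return max(0.0, min(1.0, headroom))

def select_bitwidth(qos, levels=(4, 8, 16)):
    """Pick a quantization bit width for a network segment from the QoS
    value: less headroom -> fewer bits -> less work per MAC operation."""
    idx = min(int(qos * len(levels)), len(levels) - 1)
    return levels[idx]
```

Under this sketch, a processor at its thermal limit drops to the coarsest quantization (increasing the quantization level, as claim 11 puts it), while a cool processor runs the segment at full precision.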
Regarding claim 20, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The computing device of claim 19 (as shown above). Wang further teaches: wherein the dynamic quantization controller is configured to dynamically adjust the AI quantization level for the segment of the neural network by: increasing the AI quantization level in response to indicating a level that increased constraint of a processing ability of the AI processor, and decreasing the AI quantization level in response to indicating a level that decreased constraint of the processing ability of the AI processor ([Page 8619, Column 1, Last Paragraph] Discussions. In Figure 5, we visualize the bitwidth allocation strategy for MobileNet-V2. From this figure, we can observe that our framework assigns more bitwidths to the weights in depthwise convolution layers than pointwise convolution layers. Intuitively, this is because the number of parameters in the former is much smaller than the latter. Comparing Figure 4 and Figure 5, the policies are drastically [Page 8619, Column 2, Paragraph 1] different under different optimization objectives (fewer bitwidths for depthwise convolutions under latency optimization, more bitwidths for depthwise convolutions under model size optimization). Our framework succeeds in learning to adjust its bitwidth policy under different constraints). However, Wang is not relied upon to teach: the temperature data. Gross teaches, in an analogous system: the temperature data ([Column 2, Lines 20-22] During operation, the electronic device receives temperature measurements from thermal sensors in the computer system). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computing device of Wang to incorporate the teachings of Gross to use temperature data.
One would have been motivated to make this modification because doing so would give the benefit of validating the temperature measurements using a pattern-recognition model based at least on the temperature measurements as taught by Gross [Column 2, Lines 24-26].

Regarding claim 22, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The computing device of claim 19 (as shown above). Wang further teaches: wherein the dynamic quantization controller is configured to dynamically adjust the AI quantization level for the segment of the neural network by adjusting the AI quantization level for quantizing weight values to be processed by the segment of the neural network ([Page 8614, Column 1, Last but one Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration. [Page 8615, Column 2, Section 3.4, Paragraph 2] Specifically, for each weight value w in the kth layer, we first truncate it into the range of [−c, c], and we then quantize it linearly into ak bits).

Regarding claim 23, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The computing device of claim 19 (as shown above). Wang further teaches: wherein the dynamic quantization controller is configured to dynamically adjust the AI quantization level for the segment of the neural network by adjusting the AI quantization level for quantizing activation values to be processed by the segment of the neural network ([Page 8614, Column 1, Last but one Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration).

Regarding claim 24, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The computing device of claim 19 (as shown above).
Wang further teaches: wherein the dynamic quantization controller is configured to dynamically adjust the AI quantization level for the segment of the neural network by adjusting the AI quantization level for quantizing weight values and activation values to be processed by the segment of the neural network ([Page 8614, Column 1, Last but one Paragraph] Our framework further explores the automated quantization for network weights and activations, and it takes the hardware architectures into consideration. [Page 8615, Column 2, Section 3.4, Paragraph 1] We linearly quantize the weights and activations of each layer).

Regarding claim 25, the system of Wang, Gross, Moazzemi, and Turakhia teaches: The computing device of claim 19 (as shown above). Wang further teaches wherein: the AI quantization level is configured to indicate dynamic bits of a value to be processed by the neural network to quantize ([Page 8614, Column 2, Section 3] We model the quantization task as a reinforcement learning problem (Figure 2). We use the actor-critic model with DDPG agent to give the action: bits for each layer); process the segment of the neural network using the adjusted AI quantization level ([Page 8614, Column 2, Section 3.1] Our agent processes the neural network in a layer-wise manner. [Page 3, Column 1, Paragraph 1] Table 1. Comparison of ImageNet validation accuracy among different rounding schemes for 4-bit quantization of the first layer of Resnet18. Note: 4-bit quantization corresponds to the AI quantization level. Each layer corresponds to a segment of the neural network). However, the system of Wang, Gross, and Moazzemi does not explicitly disclose: the MAC array is configured to bypass portions of a MAC associated with the dynamic bits of the value. Turakhia teaches, in an analogous system: the MAC array is configured to bypass portions of a MAC associated with the dynamic bits of the value ([0015] FIG.
5 is a diagram illustrating an example of a modified multiplier-accumulator unit that bypasses the multiplier and adder when at least one of the operands for the multiplier is zero). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Wang, Gross, and Moazzemi to incorporate the teachings of Turakhia wherein the MAC array is configured to bypass portions of a MAC associated with the dynamic bits of the value. One would have been motivated to make this modification because doing so would give the benefit of implementing combinational logic as taught by Turakhia [0039].

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Du et al (Self-Aware Neural Network Systems: A Survey and New Perspective, 2020) discloses a self-aware neural network system (SaNNS): We propose a comprehensive SaNNS from a new perspective, that is, the model layer, to exploit more opportunities for high efficiency. The proposed system is called as MinMaxNN, which features model switching and elastic sparsity based on monitored information from the execution environment. The model switching mechanism implies that models (i.e., min and max model) dynamically switch given different inputs for both efficiency and accuracy. The elastic sparsity mechanism indicates that the sparsity of NNs can be dynamically adjusted in each layer for efficiency. The experimental results show that compared with traditional SaNNS, MinMaxNN can achieve 5.64× and 19.66% performance improvement and energy reduction, respectively, without notable loss of accuracy and negative effects on developers’ productivity.
Jiang et al (Optimizing energy efficiency of CNN-based object detection with dynamic voltage and frequency scaling, 2020) discloses using dynamic voltage and frequency scaling (DVFS) to further optimize the energy efficiency for CNNs. First, we have developed a DVFS framework on FPGAs. Second, we apply the DVFS to SkyNet, a state-of-the-art neural network targeting on object detection. Third, we analyze the impact of DVFS on CNNs in terms of performance, power, energy efficiency and accuracy. Compared to the state-of-the-art, experimental results show that we have achieved 38% improvement in energy efficiency without any loss in accuracy. Results also show that we can achieve 47% improvement in energy efficiency if we allow 0.11% relaxation in accuracy.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 9am-1pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Omar Fernandez Rivas, can be reached at (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.R.J./ Examiner, Art Unit 2128

/OMAR F FERNANDEZ RIVAS/ Supervisory Patent Examiner, Art Unit 2128

Prosecution Timeline

Mar 24, 2021: Application Filed
Dec 05, 2024: Non-Final Rejection — §103
Jan 16, 2025: Interview Requested
Feb 05, 2025: Examiner Interview Summary
Feb 05, 2025: Applicant Interview (Telephonic)
Feb 28, 2025: Response Filed
May 06, 2025: Final Rejection — §103
Jun 18, 2025: Interview Requested
Jul 10, 2025: Response after Non-Final Action
Aug 12, 2025: Request for Continued Examination
Aug 19, 2025: Response after Non-Final Action
Jan 05, 2026: Non-Final Rejection — §103
Apr 13, 2026: Examiner Interview Summary
Apr 13, 2026: Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12293260: GENERATING AND DEPLOYING PACKAGES FOR MACHINE LEARNING AT EDGE DEVICES (2y 5m to grant; granted May 06, 2025)
Patent 12147915: SYSTEMS AND METHODS FOR MODELLING PREDICTION ERRORS IN PATH-LEARNING OF AN AUTONOMOUS LEARNING AGENT (2y 5m to grant; granted Nov 19, 2024)
Patent 11770571: Matrix Completion and Recommendation Provision with Deep Learning (2y 5m to grant; granted Sep 26, 2023)
Patent 11769074: COLLECTING OBSERVATIONS FOR MACHINE LEARNING (2y 5m to grant; granted Sep 26, 2023)
Patent 11741693: SYSTEM AND METHOD FOR SEMI-SUPERVISED CONDITIONAL GENERATIVE MODELING USING ADVERSARIAL NETWORKS (2y 5m to grant; granted Aug 29, 2023)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
26%
Grant Probability
48%
With Interview (+22.5%)
4y 6m
Median Time to Grant
High
PTA Risk
Based on 51 resolved cases by this examiner. Grant probability derived from career allow rate.
