DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. Claims 1, 5–6, 8, 29, 32–33, 35–36, 39–40, and 42–48 are pending and examined herein, as presented in the reply filed on 10/29/2025.
Claims 2–4, 7, 9–28, 30–31, 34, 37–38, and 41 are cancelled.
Examiner Notes
3. The Examiner refers to and explicitly cites particular pages, sections, figures, paragraphs, or columns and lines of the references as applied to Applicant’s claims, to the extent practicable, to streamline prosecution.
Although the cited portions of the references are representative of the best teachings in the art and are applied to meet the specific limitations of the claims, other uncited but related teachings of the references may be equally applicable. It is respectfully requested that, in preparing responses to the rejections, Applicant fully consider not only the cited portions of the references but also the references in their entirety, as potentially teaching, suggesting, or rendering obvious one or more aspects of the claimed invention.
Abbreviations
4. Where appropriate, the following abbreviations will be used when referencing Applicant’s submissions and specific teachings of the reference(s):
i. figure / figures: Fig. / Figs.
ii. column / columns: Col. / Cols.
iii. page / pages: p. / pp.
References Cited
5. (A) Hong et al., US 2022/0121912 A1 (“Hong”).
(B) Gadelrab et al., US 2021/0279635 A1 (“Gadelrab”).
Hong and Gadelrab were cited in the previous Office action.
Notice re prior art available under both pre-AIA and AIA
6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
A.
7. Claims 1, 5–6, 8, 29, 32–33, 35–36, 39–40 and 42–48 are rejected under 35 U.S.C. 103 as being unpatentable over (A) Hong in view of (B) Gadelrab.
See “References Cited” section, above, for full citations of references.
(Claims 1, 5–6, 8, 43, and 44 are method claims.
Claims 29, 32–33, 35, 45, and 46 are apparatus/system claims.
Claims 36, 39–40, 42, 47, and 48 are computer program product claims.)
8. Regarding claim 1, (A) Hong teaches/suggests the invention substantially as claimed, including:
“A method, comprising:
assigning a first quantity of a plurality of deep learning accelerator (DLA) cores of a DLA chip to a first subset of the DLA cores based at least in part on a first computational capability of a first DLA model;
(¶ 56: plurality of neural networks executed on the accelerator 120 may be neural networks in different structures;
¶ 48: request may be for data inference based on the neural network and may cause the accelerator 120 to execute, that is, run … the neural network and to acquire a data inference result for, for example, an object recognition, a pattern recognition, a computer vision, a voice recognition, a machine translation, a machine interpretation, a recommendation service, a personalized service, a video processing, and/or autonomous driving;
The Examiner notes: different applications have different computational capabilities, features, or functions, requiring different accelerator/core performance;
¶ 77: The scheduler may select a candidate kernel estimated to have the best accelerator performance when running each candidate kernel based on a current situation of the accelerator and kernel information of candidate kernels. A criterion of performance may include any one or any combination of a throughput of each model, a latency, a fairness, a power consumption amount, and a utilization rate of the accelerator based on a situation or a selection of a user;
¶ 119: example, a workload that uses a relatively large computational amount may be allocated to a relatively large number of processing elements and processed thereby, and a workload that uses a relatively small computational amount may be allocated to a relatively small number of processing elements and processed thereby;
see ¶¶ 23–25, quoted below, for elaboration and context;
see also ¶ 51 for elaboration that each accelerator core (of the plurality of accelerator cores on one accelerator chip) may include one or more processing elements (PEs) configured to perform operations according to the neural network).
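Examiner's illustrative note: the allocation principle of Hong ¶ 119 (a larger computational amount maps to a larger number of processing elements/cores) may be sketched in hypothetical code as follows. The sketch is illustrative only; the function name, the proportional allocation policy, and all parameters are the Examiner's assumptions and are not drawn from either reference.

    def assign_cores(total_cores: int, model_demands: dict) -> dict:
        """Assign more cores to models with larger computational demands (cf. Hong ¶ 119)."""
        total_demand = sum(model_demands.values())
        assignment = {}
        remaining = total_cores
        for name, demand in model_demands.items():
            n = max(1, int(total_cores * demand / total_demand))  # proportional share, at least one core
            n = min(n, remaining)
            assignment[name] = n
            remaining -= n
        return assignment

    # Example: model 2 demands three times the computation of model 1, so it is
    # assigned the greater quantity of cores, e.g. {'model_1': 2, 'model_2': 6}.
    print(assign_cores(8, {"model_1": 1.0, "model_2": 3.0}))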
“assigning a second, greater, quantity of the plurality of DLA cores to a second subset of
the DLA cores based at least in part on a second computational capability of a second DLA
model that is greater than the first computational capability of the first DLA model,
wherein the second DLA model is different than the first DLA model”
(¶ 96: five of the eight cores are allocated to run layer 1 of model 1 and the remaining three of the eight cores are in an idle status. In an “after” situation in which layer 1 of model 1 starts to run in the above situation, layer 1 of model 2 may run by selecting kernel 2 capable of maximally using the remaining three cores;
Fig. 8 and ¶¶ 97–98: FIG. 8 illustrates another example of dynamically allocating resources;
¶ 56: plurality of neural networks executed on the accelerator 120 may be neural networks in different structures;
¶ 48: request may be for data inference based on the neural network and may cause the accelerator 120 to execute, that is, run … the neural network and to acquire a data inference result for, for example, an object recognition, a pattern recognition, a computer vision, a voice recognition, a machine translation, a machine interpretation, a recommendation service, a personalized service, a video processing, and/or autonomous driving;
The Examiner notes: different applications have different computational capabilities, features, or functions, requiring different accelerator/core performance;
¶ 77: The scheduler may select a candidate kernel estimated to have the best accelerator performance when running each candidate kernel based on a current situation of the accelerator and kernel information of candidate kernels. A criterion of performance may include any one or any combination of a throughput of each model, a latency, a fairness, a power consumption amount, and a utilization rate of the accelerator based on a situation or a selection of a user;
¶ 119: example, a workload that uses a relatively large computational amount may be allocated to a relatively large number of processing elements and processed thereby, and a workload that uses a relatively small computational amount may be allocated to a relatively small number of processing elements and processed thereby;
see also Gadelrab, infra, ¶ 54: execution of inferences may begin using a high accuracy model on high performance cores (e.g., using a machine learning model with floating point weights), and inferences may be performed in parallel on one or more high efficiency cores using one or more sets of quantized weights (e.g., weights quantized to reduced-size representations, such as 16-bit integer, to 8-bit integer, 4-bit integer, etc., relative to the floating point weights included in a high accuracy model);
¶ 103: the system executes an inference using the high accuracy representation of the machine learning model on high accuracy hardware. High accuracy hardware may be processors or processing cores that can perform floating point operations, such as cores designated as high performance cores in a heterogeneous multicore processor (e.g., “big” cores in a BIG.little architecture), graphics processing units, tensor processing units, neural processing units, and/or other high performance processing units;
¶ 110: The selected high efficiency model may be executed on high efficiency processors that use less power than the high accuracy hardware discussed above. These processors may include, for example, processors designed as high efficiency cores in a heterogeneous multicore processor (e.g., “little” cores in a BIG.little architecture), integer processing modules on a processor, or the like);
“executing the first DLA model using the first subset of the plurality of DLA cores of a DLA chip; and”
(¶¶ 23–25: generating a plurality of kernels for each of a first neural network model and a second neural network model; and running a kernel of the first model on a number of cores of an available resource of an accelerator; in response to a start of the running, running a kernel of the second model on a remaining number of cores of the available resource of an accelerator; generating one or more output feature maps based on the running of the kernels.
The running of the kernel of the second model may include running the kernel of the second model in response to determining that a collision will not occur between memory access patterns during the running of the kernel of the first model and the running of the kernel of the second model.
For each of the kernel of the first model and the kernel of the second model, the kernel may be selected for running from among the plurality of kernels based on kernel information including any one or more of: a number of accelerator cores;
¶ 56: in response to a plurality of requests received at the host processor 110, the accelerator 120 may execute a plurality of neural networks according to kernels generated by the host processor 110. Here, the plurality of neural networks executed on the accelerator 120 may be neural networks in different structures … Herein, a neural network may also be referred to as a model for clarity of description;
Fig. 2 and ¶ 58: executing a multi-model on a multi-core accelerator;
¶ 55: neural network may provide an optimal output corresponding to an input by mapping an input and an output having a nonlinear relationship based on deep learning. The deep learning may be a machine learning scheme for solving a given problem from a big data set …);
executing a second DLA model using a second subset of the plurality of DLA cores of the DLA chip …”
(¶¶ 23–25: generating a plurality of kernels for each of a first neural network model and a second neural network model; and running a kernel of the first model on a number of cores of an available resource of an accelerator; in response to a start of the running, running a kernel of the second model on a remaining number of cores of the available resource of an accelerator; generating one or more output feature maps based on the running of the kernels.
The running of the kernel of the second model may include running the kernel of the second model in response to determining that a collision will not occur between memory access patterns during the running of the kernel of the first model and the running of the kernel of the second model.
For each of the kernel of the first model and the kernel of the second model, the kernel may be selected for running from among the plurality of kernels based on kernel information including any one or more of: a number of accelerator cores;
¶ 56: in response to a plurality of requests received at the host processor 110, the accelerator 120 may execute a plurality of neural networks according to kernels generated by the host processor 110. Here, the plurality of neural networks executed on the accelerator 120 may be neural networks in different structures … Herein, a neural network may also be referred to as a model for clarity of description;
Fig. 2 and ¶ 58: executing a multi-model on a multi-core accelerator;
¶ 55: neural network may provide an optimal output corresponding to an input by mapping an input and an output having a nonlinear relationship based on deep learning. The deep learning may be a machine learning scheme for solving a given problem from a big data set …).
“wherein the first subset comprises a first quantity of the plurality of DLA cores and the second subset comprises a second quantity of the plurality of DLA cores that is different than the first quantity of the plurality of DLA cores”
(¶ 96: five of the eight cores are allocated to run layer 1 of model 1 and the remaining three of the eight cores are in an idle status. In an “after” situation in which layer 1 of model 1 starts to run in the above situation, layer 1 of model 2 may run by selecting kernel 2 capable of maximally using the remaining three cores;
Fig. 8 and ¶¶ 97–98: FIG. 8 illustrates another example of dynamically allocating resources).
Hong does not teach “determining to switch from execution of the first DLA model to execution of the second DLA model based on accuracy of results from execution of the first DLA model” and “executing the second DLA model … in response to determining to switch from execution of the first DLA model to execution of the second DLA model.”
(B) Gadelrab, however, teaches or suggests:
“wherein the second DLA model is different than the first DLA model; …”
(¶ 54: execution of inferences may begin using a high accuracy model on high performance cores (e.g., using a machine learning model with floating point weights), and inferences may be performed in parallel on one or more high efficiency cores using one or more sets of quantized weights (e.g., weights quantized to reduced-size representations, such as 16-bit integer, to 8-bit integer, 4-bit integer, etc., relative to the floating point weights included in a high accuracy model);
¶ 103: the system executes an inference using the high accuracy representation of the machine learning model on high accuracy hardware. High accuracy hardware may be processors or processing cores that can perform floating point operations, such as cores designated as high performance cores in a heterogeneous multicore processor (e.g., “big” cores in a BIG.little architecture), graphics processing units, tensor processing units, neural processing units, and/or other high performance processing units;
¶ 110: The selected high efficiency model may be executed on high efficiency processors that use less power than the high accuracy hardware discussed above. These processors may include, for example, processors designed as high efficiency cores in a heterogeneous multicore processor (e.g., “little” cores in a BIG.little architecture), integer processing modules on a processor, or the like);
“determining to switch from execution of the first DLA model to execution of the second DLA model based on accuracy of results from execution of the first DLA model” and
“executing the second DLA model … in response to determining to switch from execution of the first DLA model to execution of the second DLA model.”
(Fig. 5 and ¶¶ 112–121;
¶ 113: As illustrated, executing inferences in a high efficiency mode begins at block 502;
¶ 114: Generally, the executed inference may be performed using a selected high efficiency model (e.g., a machine learning model and weights quantized to a data type that involves less complex computation);
¶ 118: At block 514, the system determines whether the high efficiency model accuracy and overflow/underflow statistics are within an acceptable range. To determine whether high efficiency model accuracy is within an acceptable range;
¶¶ 120–121: If the system determines, at block 514, that high efficiency model accuracy is acceptable and overflow/underflow statistics are within the threshold value, the system may return to block 502 and execute a subsequent inference using the high efficiency model.
Otherwise, at block 516, the system exits the high efficiency mode and executes subsequent inferences using the high accuracy mode (e.g., as discussed above with respect to FIG. 4);
¶ 53: To allow for inferences to be performed on efficient cores while maintaining a sufficient degree of accuracy in the results generated by executing inferences using a quantized machine learning model, embodiments described herein provide various techniques for adaptively quantizing parameters used by a machine learning model and switching between performing inferences using a high accuracy model on high performance cores and performing inferences using quantized models on efficient cores;
¶ 54: execution of inferences may begin using a high accuracy model on high performance cores (e.g., using a machine learning model with floating point weights), and inferences may be performed in parallel on one or more high efficiency cores using one or more sets of quantized weights (e.g., weights quantized to reduced-size representations, such as 16-bit integer, to 8-bit integer, 4-bit integer, etc., relative to the floating point weights included in a high accuracy model);
¶ 57: The quantized weight information may be in a format for which computation is less intensive than computation using the received weight information;
¶ 103: the system executes an inference using the high accuracy representation of the machine learning model on high accuracy hardware. High accuracy hardware may be processors or processing cores that can perform floating point operations, such as cores designated as high performance cores in a heterogeneous multicore processor (e.g., “big” cores in a BIG.little architecture), graphics processing units, tensor processing units, neural processing units, and/or other high performance processing units;
¶ 110: The selected high efficiency model may be executed on high efficiency processors that use less power than the high accuracy hardware discussed above. These processors may include, for example, processors designed as high efficiency cores in a heterogeneous multicore processor (e.g., “little” cores in a BIG.little architecture), integer processing modules on a processor, or the like).
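Examiner's illustrative note: the Fig. 5 control flow of Gadelrab (¶¶ 112–121) may be sketched as the following hypothetical loop, in which inferences run on the high efficiency (quantized) model until the accuracy check of block 514 fails, after which subsequent inferences run on the high accuracy model (block 516). The function names, the accuracy_ok criterion, and the handling of the failing inference are the Examiner's assumptions, not the reference's code.

    def run_inferences(inputs, high_eff_model, high_acc_model, accuracy_ok):
        results = []
        high_efficiency_mode = True
        for x in inputs:
            if high_efficiency_mode:
                y = high_eff_model(x)             # block 502: inference in high efficiency mode
                if not accuracy_ok(y):            # block 514: accuracy / overflow check
                    high_efficiency_mode = False  # block 516: exit high efficiency mode
                    y = high_acc_model(x)         # one possible handling: re-run on the high accuracy model
            else:
                y = high_acc_model(x)             # subsequent inferences in high accuracy mode
            results.append(y)
        return results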
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of (B) Gadelrab with those of (A) Hong to switch between different models having different resource requirements, computational complexities, and/or result accuracies (i.e., QoS). The motivation or advantage for doing so is to improve resource efficiency by adaptively optimizing or switching the use of computational (core) resources based on acceptable computational and performance results.
9. Regarding claim 5, Hong teaches/suggests:
“assigning less than all of the plurality of DLA cores to the first subset and the second subset of the plurality of DLA cores”
(¶ 96: five of the eight cores are allocated to run layer 1 of model 1 and the remaining three of the eight cores are in an idle status;
Fig. 8 and ¶¶ 97–98: FIG. 8 illustrates another example of dynamically allocating resources;
¶ 87: determine whether a number of idle cores of an accelerator is greater than or equal to a number of required cores of a candidate kernel;
¶ 88: When it is determined that the number of idle cores of the accelerator is greater than or equal to the number of required cores of the candidate kernel in operation 610, in operation 620, the scheduler may extract a combination of cores mappable per candidate kernel).
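Examiner's illustrative note: the idle-core check of Hong ¶¶ 87–88 reduces to a simple comparison, sketched below with hypothetical names. Because a kernel runs only when enough idle cores remain, less than all of the plurality of cores may be assigned at any given time.

    def schedulable(idle_cores: int, required_cores: int) -> bool:
        # Hong ¶ 87: schedulable only if idle cores >= cores required by the candidate kernel
        return idle_cores >= required_cores

    assert schedulable(idle_cores=3, required_cores=3)       # three idle cores suffice (cf. ¶ 96)
    assert not schedulable(idle_cores=3, required_cores=5)   # candidate kernel must wait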
10. Regarding claim 6, Hong teaches/suggests:
“assigning the first quantity and the second quantity of the plurality of DLA cores without regard to a total quantity of the plurality of DLA cores”
(¶ 96: five of the eight cores are allocated to run layer 1 of model 1 and the remaining three of the eight cores are in an idle status;
Fig. 8 and ¶¶ 97–98: FIG. 8 illustrates another example of dynamically allocating resources;
¶ 87: determine whether a number of idle cores of an accelerator is greater than or equal to a number of required cores of a candidate kernel;
¶ 88: When it is determined that the number of idle cores of the accelerator is greater than or equal to the number of required cores of the candidate kernel in operation 610, in operation 620, the scheduler may extract a combination of cores mappable per candidate kernel;
¶ 89: when ten cores included in two clusters are being used and two cores are being used in another cluster, three cores and five cores among remaining eight idle cores may be verified).
11. Regarding claim 8, Hong teaches/suggests:
“executing a third DLA model using a third subset of the plurality of DLA cores of the DLA chip, wherein the third subset comprises a third quantity of the plurality of DLA cores that is different than the first and second quantities of the plurality of DLA cores; and
assigning the third quantity of the plurality of DLA cores to the third subset of the DLA cores based at least in part on a third computational capability of the third DLA model, wherein the third computational capability is different than the first and second computational capabilities”
(¶ 96: five of the eight cores are allocated to run layer 1 of model 1 and the remaining three of the eight cores are in an idle status. In an “after” situation in which layer 1 of model 1 starts to run in the above situation, layer 1 of model 2 may run by selecting kernel 2 capable of maximally using the remaining three cores;
Fig. 8 and ¶¶ 97–98: FIG. 8 illustrates another example of dynamically allocating resources;
See supra, ¶¶ 48, 56, 77, and 119, as applied in rejecting claim 1 above;
Fig. 2 and ¶ 59: models 1 to 3;
¶ 80: For example, if a total number of cores in the accelerator is 20 and four models are to run (that is, be executed) …).
12. Regarding claim 43, Hong and Gadelrab teach or suggest:
“determining to switch from execution of the first DLA model to execution of the second DLA model based on accuracy of results from execution of the first DLA model on data received after compile time”
(Hong, Fig. 2 and ¶ 65: Every time a request for executing a corresponding model is received, the compiler 210 may generate a plurality of candidate kernels for each of layers included in the corresponding model and corresponding kernel information and transfers the same to the scheduler 220;
¶ 132: In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler;
Gadelrab, Fig. 5 and ¶¶ 112–121;
¶ 113: As illustrated, executing inferences in a high efficiency mode begins at block 502;
¶ 114: Generally, the executed inference may be performed using a selected high efficiency model (e.g., a machine learning model and weights quantized to a data type that involves less complex computation);
¶ 118: At block 514, the system determines whether the high efficiency model accuracy and overflow/underflow statistics are within an acceptable range. To determine whether high efficiency model accuracy is within an acceptable range;
¶¶ 120–121: If the system determines, at block 514, that high efficiency model accuracy is acceptable and overflow/underflow statistics are within the threshold value, the system may return to block 502 and execute a subsequent inference using the high efficiency model.
Otherwise, at block 516, the system exits the high efficiency mode and executes subsequent inferences using the high accuracy mode (e.g., as discussed above with respect to FIG. 4);
¶ 53: To allow for inferences to be performed on efficient cores while maintaining a sufficient degree of accuracy in the results generated by executing inferences using a quantized machine learning model, embodiments described herein provide various techniques for adaptively quantizing parameters used by a machine learning model and switching between performing inferences using a high accuracy model on high performance cores and performing inferences using quantized models on efficient cores).
13. Regarding claim 44, Gadelrab teaches or suggests:
“determining to switch from execution of the first DLA model to execution of the second DLA model based on accuracy of results from execution of the first DLA model on data representative of data on which execution of the first DLA model and the second DLA model is anticipated”
(Gadelrab, Fig. 5 and ¶¶ 53 and 112–121;
¶ 52: Where models are quantized, the quantization may be tested with some inputs to verify that the quantized model works for those outputs;
¶ 58: In some embodiments, the computing device can reduce the weight information into a plurality of sets of quantized weight information associated with different quantization levels to be tested during execution of the machine learning model to identify an optimal level of quantization, or a level of quantization that results in sufficient inference accuracy relative to inference accuracy for inferences performed using the high accuracy model;
¶ 60: In another embodiment, the computing device can quantize weight information from floating point to a minimal bit size representation (e.g., 1-bit or 2-bit integer) and determine if inferences performed using the minimal bit size representation has sufficient accuracy relative to inference accuracy for inferences performed using the high accuracy model).
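Examiner's illustrative note: the quantization-level testing suggested by Gadelrab ¶¶ 58 and 60 may be sketched as the following hypothetical search, which evaluates progressively smaller bit widths on representative data and keeps the smallest width whose accuracy remains within a margin of the high accuracy baseline. The evaluate callback, the bit widths, and the margin are the Examiner's assumptions.

    def pick_quantization(evaluate, bit_widths=(16, 8, 4, 2), margin=0.01):
        # evaluate(bits) -> inference accuracy on representative inputs;
        # evaluate(None) is the floating point (high accuracy) baseline.
        baseline = evaluate(None)
        chosen = None
        for bits in bit_widths:                 # from most to least precise
            if evaluate(bits) >= baseline - margin:
                chosen = bits                   # still sufficiently accurate; try smaller
            else:
                break                           # further quantization is too lossy
        return chosen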
14. Regarding claims 29, 32–33, 35, 45, and 46, these are the corresponding system claims, reciting limitations of commensurate scope with method claims 1, 5–6, 8, 43, and 44, respectively. Therefore, they are rejected on the same basis as claims 1, 5–6, 8, 43, and 44 above, including the following rationale:
Hong teaches/suggests: “a deep learning accelerator (DLA) chip comprising a plurality of DLA cores; and a compiler coupled to the DLA chip and configured to:”
(Fig. 1 and ¶ 51: The accelerator core 131 may include one or more processing elements (PEs) configured to perform operations according to the neural network. Although FIG. 1 illustrates that a single accelerator core 131 is included in the accelerator chip 130 for clarity of description, a plurality of accelerator cores may be included in the accelerator chip 130 and may process operations;
Fig. 2 and ¶ 65: Every time a request for executing a corresponding model is received, the compiler 210 may generate a plurality of candidate kernels for each of layers included in the corresponding model and corresponding kernel information and transfers the same to the scheduler 220;
¶ 132: In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler;
¶ 134: components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents).
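Examiner's illustrative note: the compiler/scheduler division described in Hong ¶ 65 may be sketched as follows; all names are hypothetical. For each execution request the compiler emits a plurality of candidate kernels per layer together with kernel information (e.g., the number of accelerator cores required), from which the scheduler later selects (cf. ¶¶ 87–88 above).

    from dataclasses import dataclass

    @dataclass
    class CandidateKernel:
        layer: int
        required_cores: int  # kernel information consumed by the scheduler

    def compile_model(num_layers: int, core_options=(1, 2, 4)):
        """Generate a plurality of candidate kernels for each layer (cf. Hong ¶ 65)."""
        return [CandidateKernel(layer, cores)
                for layer in range(num_layers)
                for cores in core_options]

    kernels = compile_model(num_layers=2)  # six candidate kernels: three per layer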
For claim 32, Hong also teaches/suggests “wherein the compiler is configured to assign less than all of the plurality of DLA cores to a respective subset of the DLA cores,” as similarly applied in rejecting claim 5:
(¶ 96: five of the eight cores are allocated to run layer 1 of model 1 and the remaining three of the eight cores are in an idle status;
Fig. 8 and ¶¶ 97–98: FIG. 8 illustrates another example of dynamically allocating resources;
¶ 87: determine whether a number of idle cores of an accelerator is greater than or equal to a number of required cores of a candidate kernel;
¶ 88: When it is determined that the number of idle cores of the accelerator is greater than or equal to the number of required cores of the candidate kernel in operation 610, in operation 620, the scheduler may extract a combination of cores mappable per candidate kernel).
15. Regarding claims 36, 39–40, 42, 47, and 48, these are the corresponding computer program product claims, reciting limitations of commensurate scope with method claims 1, 5–6, 8, 43, and 44, respectively. Therefore, they are rejected on the same basis as claims 1, 5–6, 8, 43, and 44 above, including the following rationale:
For claim 39, Hong also teaches/suggests “assign less than all of the plurality of DLA cores to a respective subset of the DLA cores,” as similarly applied in rejecting claim 5:
(¶ 96: five of the eight cores are allocated to run layer 1 of model 1 and the remaining three of the eight cores are in an idle status;
Fig. 8 and ¶¶ 97–98: FIG. 8 illustrates another example of dynamically allocating resources;
¶ 87: determine whether a number of idle cores of an accelerator is greater than or equal to a number of required cores of a candidate kernel;
¶ 88: When it is determined that the number of idle cores of the accelerator is greater than or equal to the number of required cores of the candidate kernel in operation 610, in operation 620, the scheduler may extract a combination of cores mappable per candidate kernel).
Response to Arguments
16. Applicant’s arguments with respect to the claims have been considered but are moot because they do not apply to the combination of teachings as newly cited and applied in the current rejection.
In the Remarks, the Applicant also argues that Gadelrab does not teach determining to switch from execution of a first DLA model to execution of a second (different) DLA model because the models themselves do not appear to be different.
The Examiner disagrees.
As applied in the rejection, Gadelrab teaches in paragraph 103 that “the system executes an inference using the high accuracy representation of the machine learning model on high accuracy hardware. High accuracy hardware may be processors or processing cores that can perform floating point operations, such as cores designated as high performance cores in a heterogeneous multicore processor,” and teaches in paragraph 110 that, upon switching to or entering “the high efficiency mode using the selected high efficiency model for the execution of subsequent inferences … [t]he selected high efficiency model may be executed on high efficiency processors that use less power than the high accuracy hardware discussed above. These processors may include, for example, processors designed as high efficiency cores in a heterogeneous multicore processor (e.g., “little” cores in a BIG.little architecture), integer processing modules on a processor, or the like.”
Accordingly, Gadelrab teaches or at least suggests that the two models are different in at least one aspect with respect to the different types or levels of mathematical operations called for or performed by the models (e.g. floating point operations or integer operations) and the required hardware on which the models execute.
(see also Gadelrab, paragraph 47: “Because integer and floating point numbers are represented differently, mathematical operations may involve different levels of computational expense based on whether a mathematical operation is operating on integers or floating point numbers. For example, addition of two integers may be a trivial bitwise operation in which each bit is combined and overflow is carried to the next bit. However, floating point operations may be more complex, as multiple operations may be performed to combine the exponent and precision bits, and a multiplication operation may be performed based on the exponent and precision bits to generate a result” — teaching that floating point operations may require multiple operations or steps which are different than a “trivial bitwise operation” for operating on integers).
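Examiner's illustrative note: the kind of weight quantization Gadelrab ¶ 54 describes (floating point weights reduced to, e.g., an 8-bit integer representation) may be sketched as follows. The symmetric, single-scale scheme shown is the Examiner's assumption and is not Gadelrab's code.

    import numpy as np

    def quantize_int8(weights):
        """Map float weights onto int8 with one symmetric scale factor."""
        scale = float(np.max(np.abs(weights))) / 127.0
        q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
        return q, scale

    w = np.array([0.5, -1.25, 0.03], dtype=np.float32)
    q, scale = quantize_int8(w)
    approx = q.astype(np.float32) * scale  # integer operations are cheaper (cf. ¶ 47); the
                                           # dequantized values only approximate the originals,
                                           # hence the accuracy checks of Fig. 5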
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee J. Li can be reached on (571)272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BENJAMIN C WU/Primary Examiner, Art Unit 2195
February 4, 2026