DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is made final.

Claims 1-9 are pending. Claim 1 is an independent claim.

Response to Arguments

Applicant's arguments, dated 3/5/2026, regarding the 35 U.S.C. 101 rejections of the previous Office action have been fully considered and are persuasive. The 101 rejections have been withdrawn.

Applicant's arguments, dated 3/5/2026, regarding the 35 U.S.C. 103 rejections of the previous Office action have been fully considered but are not persuasive. Because the amendments have changed the scope of the claims, new grounds of rejection have been applied; see the updated rejections below.

The objection to claim 6 has been withdrawn in view of the amendments.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, and 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. ("Work-in-Progress: Drama: A High Efficient Neural Network Accelerator on FPGA using Dynamic Reconfiguration", 2019), herein Yang, in view of Li et al. (US 20220374689 A1), herein Li.

Regarding claim 1, Yang teaches: A method of using at least one field-programmable gate array (FPGA) for artificial intelligence (AI) inference software stack acceleration (pg. 1, Abstract, In this paper, we propose a high efficient neural network accelerator on FPGA – implementation was done on an FPGA board as described on pg. 2, Section 3, ¶ 1, To demonstrate the efficiency of the Drama accelerator architecture, we implement a hardware prototype system on the Xilinx ZC706 FPGA board), comprising:

ii. executing, by an embedded processor within the at least one FPGA, an AI inference software stack to perform layer-by-layer profiling of said neural network model (pg. 1, Section 2.1, ¶ 1, To design the optimal configuration, we profile the basic operators and structure for each layer and try all kinds of algorithms and parallelisms to figure out the best combination of FPGA resources include memory, bandwidth, computing resources and so on – and – pg. 2, Section 2.1, ¶ 2, Regarding the distinguished computing-intensive and memory-intensive features of the layer-based model in neural networks, it is essential to profile the model. The layers are clustered into different categories);

iii. identifying, based on said profiling, at least one compute-intensive layer type of said neural network model (pg. 1, Algorithm 1, layers are classified by speed, with consideration for the time it takes to reconfigure the hardware – the layers that result in reconfiguring the hardware can be interpreted as compute-intensive); and

iv. implementing, in hardware logic of the FPGA, acceleration using at least one layer accelerator on at least one of said compute-intensive layer type
(pg. 2, Section 2.1, ¶ 3, After the iteration, Ai contains all the layers that share the same configuration with X. Repeat these steps until all the layers have their own classes – also see: pg. 1, Algorithm 1 – and – pg. 2, Section 2.2, ¶ 1, The layer sequences will be offloaded to accelerators at run time, and the accelerators can be reconfigured to boost speedup).

Yang fails to teach: i. performing quantization on at least one neural network model …

However, in the same field of endeavor, Li teaches: i. performing quantization on at least one neural network model (¶ 59, The neural network accelerator 180 provides functionality that can be used to convert data represented in full precision floating-point formats in the neural network module 130 into quantized format values).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to perform quantization on the neural network as disclosed by Li in the method disclosed by Yang to reduce the computation and memory burden of running the neural network model (¶ 89, reduce computational cost and/or memory usage for computing the output of a neural network).
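For illustration only, the layer-by-layer profiling and classification relied upon above (Yang, Section 2.1 and Algorithm 1) can be understood along the lines of the following minimal sketch. The layer names, timing values, and threshold shown here are hypothetical and are not taken from Yang; the sketch merely shows one way profiled layers could be separated into compute-intensive and memory-bound groups.

```python
# Illustrative sketch only: separate profiled layers into compute-intensive and
# memory-bound groups, in the spirit of Yang's layer-based profiling (Section 2.1,
# Algorithm 1). All names, timings, and the threshold below are hypothetical.
from dataclasses import dataclass

@dataclass
class LayerProfile:
    name: str
    compute_time_ms: float  # time dominated by arithmetic (e.g., MACs)
    memory_time_ms: float   # time dominated by data movement

def classify(profiles, ratio_threshold=1.0):
    """Layers whose compute time dominates are candidates for FPGA layer
    accelerators; the remainder are treated as memory-bound."""
    compute_intensive, memory_bound = [], []
    for p in profiles:
        if p.compute_time_ms > ratio_threshold * p.memory_time_ms:
            compute_intensive.append(p.name)
        else:
            memory_bound.append(p.name)
    return compute_intensive, memory_bound

profiles = [
    LayerProfile("conv1", compute_time_ms=4.2, memory_time_ms=0.8),
    LayerProfile("fc1", compute_time_ms=0.6, memory_time_ms=2.1),
]
print(classify(profiles))  # (['conv1'], ['fc1'])
```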
Regarding claim 2, Yang further teaches: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 1, wherein said layer accelerator is a custom layer accelerator, a layer accelerator from at least one layer accelerators library or a combination thereof (pg. 1, Section 1, ¶ 3, Based on the layer-based model, we provide a hardware template that is able to generate a specific hardware accelerator for each layer – i.e., custom layer accelerator).

Regarding claim 4, Yang fails to teach: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 1, wherein said quantization is done post-training or via quantization aware training.

However, in the same field of endeavor, Li teaches: wherein said quantization is done post-training or via quantization aware training (¶ 90, In some examples, quantization may be used… Using quantization during training of a neural network may be referred to as quantization aware training (QAT) whereas quantization after training of a neural network may be referred to as post-training quantization (PTQ)).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to perform post-training quantization or quantization aware training as disclosed by Li in the method disclosed by Yang to reduce the computation and memory burden of running the neural network model (¶ 89, reduce computational cost and/or memory usage for computing the output of a neural network).

Regarding claim 5, Yang fails to teach: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 1, wherein said performing quantization is converting floating-point neural network model to full integer quantized neural network model.

However, in the same field of endeavor, Li teaches: wherein said performing quantization is converting floating-point neural network model to full integer quantized neural network model (¶ 92, In a commonly used 8 bit quantization scheme, weight values are converted from 32 bit floating-point format to 8 bit integer format).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use an integer quantized model as disclosed by Li in the method disclosed by Yang to reduce the memory and processing required for storing weights (¶ 89, reduces the memory and processing requirements for performing the computations of a neural network and storing the tensors of a neural network by reducing the number of bits required to store each value of a weight of the neural network, and the number of bits required to store the output of each neural network layer).

Regarding claim 6, Yang further teaches: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 1, wherein said layer type is a convolution layer, a depthwise convolution layer, a pooling layer, a fully connected layer or any other suitable layer in said neural network model (pg. 1, Section 2.1, ¶ 1, To design the optimal configuration, we profile the basic operators and structure for each layer and try all kinds of algorithms and parallelisms to figure out the best combination of FPGA resources include memory, bandwidth, computing resources and so on. For instance, designing the configuration of convolution layers, we allocate more computing resources. However, when it comes to fully connected layers, bandwidth resources should be more). The term “any other suitable layer” is being interpreted as any layer found in convolutional neural networks or other neural networks.
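For illustration only, the 8-bit quantization described by Li (¶¶ 90-92), in which 32-bit floating-point weights are converted to 8-bit integer format, can be sketched as below. The symmetric per-tensor scaling scheme and the example values are assumptions made for illustration and are not asserted to be Li's particular implementation.

```python
# Illustrative sketch only: symmetric per-tensor post-training quantization of
# float32 weights to int8, in the spirit of Li ¶ 92. The scaling scheme below is
# an assumption for illustration, not a description of Li's implementation.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 using a single per-tensor scale factor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.31, -1.20, 0.05, 0.77], dtype=np.float32)
q, scale = quantize_int8(w)
print(q, dequantize(q, scale))  # int8 codes and their float32 approximations
```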
Claims 3, 7, 8, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Yang in view of Li as applied to claim 2 above, and further in view of Yazdanbakhsh et al. (US 20230376664 A1), herein Yazdanbakhsh.

Regarding claim 3, Yang in view of Li fails to teach: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 2, further comprising the following after (iv): v. recording an AI inference's speed performance to be evaluated; vi. implementing said accelerated AI inference on the at least one FPGA if said AI inference's speed performance meets at least one application's requirement; or enhancing at least one custom layer accelerator, adding more custom layer acceleration, adjusting said layer accelerator's at least one parameter or a combination thereof before performing (ii) again if said AI inference's speed performing does not meet said application's requirement.

However, in the same field of endeavor, Yazdanbakhsh teaches:

v. recording an AI inference's speed performance to be evaluated (¶ 76, As another example, the pre-evaluation criteria 130 include one or more estimated performance criteria which rejects any candidate hardware architectures that have been preliminarily estimated to fall short of a satisfactory hardware performance on the particular machine learning task, e.g., in terms of the target runtime latency of a neural network configured to perform the particular machine learning task when deployed on the hardware accelerator);

vi. implementing said accelerated AI inference on the at least one FPGA if said AI inference's speed performance meets at least one application's requirement (¶ 67, the system 100 can effectively determine a hardware architecture for a hardware accelerator having an area (or power consumption) that is no greater than the target hardware area (or target power consumption) and on which a neural network can be deployed and configured to perform a particular machine learning task with an acceptable accuracy while having an acceptable latency, e.g., a latency that is approximately equal to or no greater than the target latency specified in the constraint data – and – ¶ 58, The system 100 also obtains neural network architecture data 108 that specifies a configuration or architecture of the neural network which, once the final hardware accelerator architecture 150 is determined, is to be deployed on a hardware accelerator having the determined final hardware accelerator architecture 150 so as to perform the particular machine learning task);

or enhancing at least one custom layer accelerator, adding more custom layer acceleration, adjusting said layer accelerator's at least one parameter or a combination thereof before performing (ii) again if said AI inference's speed performing does not meet said application's requirement (¶ 9, determining whether the candidate hardware architecture satisfies pre-evaluation criteria, including… determining an estimated performance measure of the candidate hardware architecture on the particular machine learning task – and – ¶ 113, in response to a negative determination, the system bypasses using the one or more hardware simulators to evaluate the performance measure of the candidate hardware architecture on the particular machine learning task. In some cases, the process 300 can directly return to step 302 to generate and evaluate another candidate architecture – note that one of the candidate architecture variations discussed in Yazdanbakhsh is varying the dimensions of processing element (PE) arrays: ¶ 55, An example of a hardware accelerator architecture search space and the corresponding set of hardware parameters that define the search space is described below in Table 1, which illustrates microarchitecture parameters and their number of discrete values in the search space – also see the referenced Table 1).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to evaluate speed to determine whether to finalize the accelerator or perform further optimization as disclosed by Yazdanbakhsh in the method disclosed by Yang in view of Li to determine an improved accelerator that meets requirements (¶ 77, can select the candidate hardware accelerator architecture that has the best performance measures, best satisfies the various hardware design constraints specified in the constraint data 110, or both as the final architecture 150 of the hardware accelerator).
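For illustration only, the flow addressed in the claim 3 rejection (record the inference's speed performance, implement the accelerated inference if the application's requirement is met, otherwise adjust accelerator parameters and profile again) can be sketched as below. The latency model and the doubling of parallelism parameters are hypothetical stand-ins for on-board measurement and for the candidate-architecture adjustments discussed in Yazdanbakhsh; none of the numbers is drawn from the references.

```python
# Illustrative sketch only: iterate on accelerator parallelism parameters until a
# measured latency meets the application's requirement. measure_latency_ms is a
# hypothetical stand-in for timing the accelerated inference on the FPGA.

def measure_latency_ms(input_par: int, output_par: int) -> float:
    # Hypothetical analytic model; a real flow would time the inference on-board.
    return 120.0 / (input_par * output_par)

def tune(target_ms: float, max_par: int = 16):
    input_par, output_par = 1, 1
    while True:
        latency = measure_latency_ms(input_par, output_par)
        if latency <= target_ms:
            # Requirement met: keep this configuration and deploy it.
            return {"input_parallelism": input_par,
                    "output_parallelism": output_par,
                    "latency_ms": latency}
        if output_par < max_par:
            output_par *= 2   # widen output parallelism first
        elif input_par < max_par:
            input_par *= 2    # then widen input parallelism
        else:
            return None       # requirement cannot be met in this search space

print(tune(target_ms=10.0))
```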
Regarding claim 7, Yang (in view of Yazdanbakhsh) further teaches: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 3, wherein said parameter is convolution accelerator (pg. 2, Section 3, ¶ 2, we have presented a high efficient CNN accelerator on FPGA using dynamic reconfiguration) input parallelism, output parallelism or a combination thereof (pg. 1, Section 2.1, ¶ 1, To design the optimal configuration, we profile the basic operators and structure for each layer and try all kinds of algorithms and parallelisms to figure out the best combination of FPGA resources include memory, bandwidth, computing resources and so on – also recall in the claim 3 rejection, PE arrangements are discussed by Yazdanbakhsh – PEs perform parallel calculations).

Regarding claim 8, Yang further teaches: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 3, wherein said application is edge AI, general AI inference application or any other suitable AI inference application (pg. 1, Section 1, ¶ 1, Convolutional Neural Network (CNN) has been widely used in computer vision fields such as object detection applications. In order to boost the speed up with affordable cost, specialized hardware accelerators… have attracted many attentions of the research community. Among all these accelerators, FPGA stands out for its flexibility, short time-to-market, and energy efficiency).

Regarding claim 9, Yang in view of Li fails to teach: The method of using at least one FPGA for AI inference software stack acceleration as claimed in Claim 3, wherein said AI inference's speed performance comprises of an overall AI inference's speed performance, layer-by-layer AI inference's speed performance or combination thereof.

However, in the same field of endeavor, Yazdanbakhsh teaches: wherein said AI inference's speed performance comprises of an overall AI inference's speed performance, layer-by-layer AI inference's speed performance or combination thereof (¶ 76, As another example, the pre-evaluation criteria 130 include one or more estimated performance criteria which rejects any candidate hardware architectures that have been preliminarily estimated to fall short of a satisfactory hardware performance on the particular machine learning task, e.g., in terms of the target runtime latency of a neural network configured to perform the particular machine learning task when deployed on the hardware accelerator).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to evaluate an overall speed to determine whether to finalize the accelerator or perform further optimization as disclosed by Yazdanbakhsh in the method disclosed by Yang in view of Li to determine an improved accelerator that meets requirements (¶ 77, can select the candidate hardware accelerator architecture that has the best performance measures, best satisfies the various hardware design constraints specified in the constraint data 110, or both as the final architecture 150 of the hardware accelerator).
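For illustration only, recording both layer-by-layer and overall inference speed, as recited in claim 9, can be sketched as below; the layer callables and the timing approach are hypothetical and not drawn from the cited references.

```python
# Illustrative sketch only: record per-layer and overall (end-to-end) inference
# times for a model expressed as a sequence of named layer callables.
import time

def timed_inference(layers, x):
    per_layer_ms = {}
    start_total = time.perf_counter()
    for name, layer_fn in layers:
        start = time.perf_counter()
        x = layer_fn(x)
        per_layer_ms[name] = (time.perf_counter() - start) * 1e3
    overall_ms = (time.perf_counter() - start_total) * 1e3
    return x, per_layer_ms, overall_ms

layers = [("scale", lambda v: v * 2), ("shift", lambda v: v + 1)]
print(timed_inference(layers, 3))  # result plus per-layer and overall timings
```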
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HARRISON CHAN YOUNG KIM, whose telephone number is (571) 272-0713. The examiner can normally be reached Monday - Thursday, 9:00 am - 5:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Cesar Paula, can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HARRISON C KIM/
Examiner, Art Unit 2145

/CESAR B PAULA/
Supervisory Patent Examiner, Art Unit 2145