Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy of Application No. KR10-2024-0047283, filed on 04/08/2024, has been received.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/04/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Examiner’s Note (EN)
The prior art rejections below cite particular paragraphs, columns, and/or line numbers in the references for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-22 are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Vemuri et al. (US10789402B1).
Regarding Claim 1, Vemuri teaches an integrated circuit comprising: a neural processing unit (NPU) comprising a plurality of processing elements (PEs), each of the PEs comprising a multiplier-accumulator circuit configured to perform multiply-accumulate operations (Col 3, ln 27-66, "One type of programmable IC that may work for processing and accelerating data passing through the layers of DNNs are FPGAs, which have many lookup arrays, available on-chip storage, and digital signal processing units. Using these FPGA components, an exemplary software design to take in a neural network and configure the programmable IC to execute the DNN is described herein. While the present disclosure discusses a software design to configure a neural network, the present disclosure is not limited to neural networks or deep neural networks and can include other types of machine learning frameworks. … In one embodiment, the programmable IC 120 includes programmable logic 122, a DPE array 130 having multiple DPEs 1321-132N, memory 140, and control logic 150. In one embodiment, the control logic 150 configures the programmable logic 122, and the programmable logic uses run-time parameters from the control logic 150 to control the DPE array 130. For example, using a received bitstream that contains configuration data, control logic 150 can configure the programmable logic 122 (which can include a plurality of configurable logic blocks) with run-time parameters, and the programmable logic 122 controls the DPE array 130 that has any number of DPEs (132 1-132 N). For example, the programmable logic 122 can include look up tables, function generators, registers, multiplexers, and the like. In one embodiment, the programmable IC includes a DPE array 130 having any number of DPEs, and each DPE comprises specialized circuitry to connect an array of neural network units (NNU) (not illustrated). 
In one embodiment, the NNUs of the DPEs comprise non-programmable logic i.e., are hardened specialized processing elements, and comprise hardware elements including, but not limited to, program memories, an instruction fetch/decode unit, fixed-point vector units, floating-point vector units, arithmetic logic units (ALUs), and multiply accumulators (MAC)").
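EN: for illustration only, the multiply-accumulate operation recited for each PE reduces to repeated steps of the form acc = acc + a·b; a dot product is a sequence of such steps. The following Python sketch uses hypothetical values and is not drawn from the reference:

```python
# Minimal sketch of a multiply-accumulate (MAC) unit: each step folds one
# product into a running accumulator, as a PE's MAC circuit would.
def mac(acc, a, b):
    """One multiply-accumulate step: return acc + a * b."""
    return acc + a * b

# A dot product computed by repeated MAC steps (illustrative operands).
acc = 0
for a, b in zip([1, 2, 3], [4, 5, 6]):
    acc = mac(acc, a, b)
# acc now holds 1*4 + 2*5 + 3*6
```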
a central processing unit (CPU) coupled to the NPU; (Col 4, ln 3-63, “FIG. 2 is a block diagram 200 of the compiler 114 and the HAL 116 to be used with a hardware-software interface 118 to communicate with the programmable IC 120. As mentioned with FIG. 1, the host computer 102 includes a compiler 114 and a HAL 116 for use with a DNN inference accelerator (also referred herein as a programmable IC). In one embodiment, the compiler 114 exports an application program interface (API) to the host computer 102. This exported API takes in a network description of a DNN in various framework specific formats (e.g., deploy.prototxt of the caffe framework) and generates an intermediate hardware-dependent representation of the network. The HAL 116 takes this intermediate representation of the network and programs the hardware for execution using the hardware-software interface 118”)
one or more memory circuits coupled to the NPU and the CPU, the one or more memory circuits storing instructions that, when executed by the CPU, cause the CPU to (Col 3, ln 8-27, "Embodiments herein describe a compiler and hardware-abstraction-layer architecture for a programmable integrated circuit (IC). The complexity of mapping and porting a neural network to the programmable IC is abstracted by exporting a set of application programming interfaces (APIs). A software developer with minimal know how on hardware design can attach their network description of the neural network to the API and map/port their neural networks to FPGA for acceleration. The API takes the network description of the neural network in a high level abstraction. The compiler generates a network graph and a corresponding execution sequence vector based on the network description and optimally allocates buffer handles for each of the layers in the network graph. The hardware abstraction layer, then, takes the network graph, the corresponding execution sequence vector, and the handles allocated by the compiler, sets up the hardware runtime parameters, and schedules the commands in the network graph and corresponding execution sequence vector to respective hardware blocks on a programmable IC").
compile a first neural network model of a first machine learning framework incompatible with the NPU into first machine code executable by the NPU, according to first mapping information representing mapping of elements of the first machine learning framework to functions or operations executable on at least one of the NPU or the CPU (Col 7, ln 44-50, “Operations 400 begin, at 402, with the compiler 114 receiving a network description of a neural network. In one embodiment, a user provides the network description of the neural network to an API, and the API in turn transmits the network description to the compiler 114 on the host computer 102. In some embodiments, the network description uses framework specific formats (e.g., caffe, TensorFlow).” Col 3, ln 17-27, "The compiler generates a network graph and a corresponding execution sequence vector based on the network description and optimally allocates buffer handles for each of the layers in the network graph. The hardware abstraction layer, then, takes the network graph, the corresponding execution sequence vector, and the handles allocated by the compiler, sets up the hardware runtime parameters, and schedules the commands in the network graph and corresponding execution sequence vector to respective hardware blocks on a programmable IC." Col 4-5, ln 64-5, "In one embodiment, the parser 202 provides an interface to various deep learning network frameworks 206 with an API, like an API exported by the compiler 114. The API takes inputs in the same format as the deep learning frameworks do. Accordingly, the parser 202 takes models trained using various deep learning network frameworks 206 like caffe or TensorFlow and converts them to a network graph structure. In one embodiment, the network graph structure is an XGraph. In one embodiment, the graph structure converted by the parser 202 is a directed acyclic graph with heterogeneous nodes which encode information about various network layers and their connectivity. 
An example of a directed acyclic graph is presented in FIG. 3. In one embodiment, the backend 210 of the compiler 114 works on the network graph structure (generated by the parser 202) and performs operations on the network graph structure to generate an execution sequence vector. The execution sequence vector comprises a sequential queue of the layers of the network graph structure. Details about the execution sequence vector are provided below. The backend 210 comprises a hardware independent optimizer 212, a hardware dependent optimizer 214, a job queue scheduler 216 and an IO memory optimizer 218. Each of these components in the backend 210 works to perform operations on the network graph structure and generate an execution sequence vector to pass onto the HAL 116.").
store the first machine code (Col 4, ln 54-62, "In one embodiment, the HAL 116 takes the hardware-dependent graph from the compiler 114 and sets up the hardware runtime parameters of the programmable IC 120, allocates the buffers needed by the programmable IC hardware for processing the network, and schedules the nodes in the hardware-dependent graph into respective hardware execution queues. The command scheduler 226 of the HAL 116 then invokes the programmable IC through the hardware-software interface 118." Col 7, ln 20-24 ,"In one embodiment, the HAL 116 also comprises a command scheduler 226 that efficiently dispatches commands in the execution sequence vector to the programmable IC for processing. The command scheduler is further detailed with regards to FIG. 18"; see also col. 7, ln 44-61 and Col 18, ln 3-19 , "The command scheduler 226 uses the layer classifier 1804 to segregate the commands in the execution sequence vector 1802 based on the DPE to be used for processing the command. In some embodiments, the command scheduler 226 maintains a separate command queue 228 1-228 N for each DPE 132 1-132 N of the programmable IC 120. Once the commands of the execution sequence vector 1802 are separated based on layer type, the dispatcher 1806 then pops commands from the queues, checks for any dependencies on the command, and if the dependencies are cleared for a command, the scheduler dispatches the command to the respective DPEs 132 1-132 N asynchronously and receives a corresponding response from the respective DPE upon completion of the command. Because each DPE has its own command queue 228 1-228 N for dispatch, multiple DPEs can be active simultaneously.").
send the first machine code to the NPU for execution (Col 16, ln 7-22, "As mentioned previously, the HAL 116 receives an execution sequence vector from the compiler 114, and the execution sequence vector passes to the programmable IC setup component 222, the buffer manager 224, and to the command scheduler 226. Of the components of the HAL 116, the buffer manager 224 handles both constant buffers and I/O buffers used for both hardware and software of the programmable IC 120. The buffer manager 224 allocates two kinds of buffers: constant buffers and I/O buffers. The constant buffers are read-only buffers for the programmable IC 120 and are used for trained parameters (e.g., weights for layers in the neural network to process input data). The I/O buffers are read-write buffers for the programmable IC 120 to store the intermediate outputs between layers/nodes and accordingly can be reused between layers/nodes of the neural network"; see also col. 7, ln 44-61).
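EN: for illustration only, the compile/store/send flow mapped above can be sketched as follows. All names, opcode values, and the mapping table are hypothetical and are not taken from Vemuri:

```python
# Hypothetical mapping information: framework-level op -> NPU opcode.
MAPPING_INFO = {"Conv2D": 1, "ReLU": 2, "MatMul": 3}

def compile_model(layers, mapping=MAPPING_INFO):
    """Compile a list of framework-level layer names into NPU opcodes
    according to the mapping information."""
    return [mapping[layer] for layer in layers]

class NPUStub:
    """Stand-in for the NPU: simply executes (here, echoes) received code."""
    def execute(self, machine_code):
        return list(machine_code)

class Host:
    """CPU side: compiles, stores, and dispatches machine code to the NPU."""
    def __init__(self):
        self.code_store = {}

    def store(self, model_id, machine_code):
        self.code_store[model_id] = machine_code

    def send_to_npu(self, model_id, npu):
        return npu.execute(self.code_store[model_id])

host = Host()
host.store("model-1", compile_model(["Conv2D", "ReLU", "MatMul"]))
result = host.send_to_npu("model-1", NPUStub())
```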
Regarding claim 2, Vemuri teaches wherein the instructions, when executed by the CPU, cause the CPU to: compile a second neural network model of a second machine learning framework incompatible with the NPU into second machine code executable by the NPU, according to second mapping information representing mapping of the second machine learning framework to the configuration of at least one of the NPU or the CPU (Col 7, ln 44-50, “Operations 400 begin, at 402, with the compiler 114 receiving a network description of a neural network. In one embodiment, a user provides the network description of the neural network to an API, and the API in turn transmits the network description to the compiler 114 on the host computer 102. In some embodiments, the network description uses framework specific formats (e.g., caffe, TensorFlow).” Col 3, ln 17-27, "The compiler generates a network graph and a corresponding execution sequence vector based on the network description and optimally allocates buffer handles for each of the layers in the network graph. The hardware abstraction layer, then, takes the network graph, the corresponding execution sequence vector, and the handles allocated by the compiler, sets up the hardware runtime parameters, and schedules the commands in the network graph and corresponding execution sequence vector to respective hardware blocks on a programmable IC." Col 4-5, ln 64-5, "In one embodiment, the parser 202 provides an interface to various deep learning network frameworks 206 with an API, like an API exported by the compiler 114. The API takes inputs in the same format as the deep learning frameworks do. Accordingly, the parser 202 takes models trained using various deep learning network frameworks 206 like caffe or TensorFlow and converts them to a network graph structure. In one embodiment, the network graph structure is an XGraph. 
In one embodiment, the graph structure converted by the parser 202 is a directed acyclic graph with heterogeneous nodes which encode information about various network layers and their connectivity. An example of a directed acyclic graph is presented in FIG. 3. In one embodiment, the backend 210 of the compiler 114 works on the network graph structure (generated by the parser 202) and performs operations on the network graph structure to generate an execution sequence vector. The execution sequence vector comprises a sequential queue of the layers of the network graph structure. Details about the execution sequence vector are provided below. The backend 210 comprises a hardware independent optimizer 212, a hardware dependent optimizer 214, a job queue scheduler 216 and an IO memory optimizer 218. Each of these components in the backend 210 works to perform operations on the network graph structure and generate an execution sequence vector to pass onto the HAL 116." EN: Caffe and Tensorflow are two separate frameworks denoting a first and second framework; see also col. 18, line 64 – col. 19, line 17).
store the second machine code (Col 4, ln 54-62, "In one embodiment, the HAL 116 takes the hardware-dependent graph from the compiler 114 and sets up the hardware runtime parameters of the programmable IC 120, allocates the buffers needed by the programmable IC hardware for processing the network, and schedules the nodes in the hardware-dependent graph into respective hardware execution queues. The command scheduler 226 of the HAL 116 then invokes the programmable IC through the hardware-software interface 118." Col 7, ln 20-24 ,"In one embodiment, the HAL 116 also comprises a command scheduler 226 that efficiently dispatches commands in the execution sequence vector to the programmable IC for processing. The command scheduler is further detailed with regards to FIG. 18" and Col 18, ln 3-19 , "The command scheduler 226 uses the layer classifier 1804 to segregate the commands in the execution sequence vector 1802 based on the DPE to be used for processing the command. In some embodiments, the command scheduler 226 maintains a separate command queue 228 1-228 N for each DPE 132 1-132 N of the programmable IC 120. Once the commands of the execution sequence vector 1802 are separated based on layer type, the dispatcher 1806 then pops commands from the queues, checks for any dependencies on the command, and if the dependencies are cleared for a command, the scheduler dispatches the command to the respective DPEs 132 1-132 N asynchronously and receives a corresponding response from the respective DPE upon completion of the command. Because each DPE has its own command queue 228 1-228 N for dispatch, multiple DPEs can be active simultaneously."; see also col. 18, line 64 – col. 19, line 17).
send the second machine code to the NPU for execution (Col 16, ln 7-22, "As mentioned previously, the HAL 116 receives an execution sequence vector from the compiler 114, and the execution sequence vector passes to the programmable IC setup component 222, the buffer manager 224, and to the command scheduler 226. Of the components of the HAL 116, the buffer manager 224 handles both constant buffers and I/O buffers used for both hardware and software of the programmable IC 120. The buffer manager 224 allocates two kinds of buffers: constant buffers and I/O buffers. The constant buffers are read-only buffers for the programmable IC 120 and are used for trained parameters (e.g., weights for layers in the neural network to process input data). The I/O buffers are read-write buffers for the programmable IC 120 to store the intermediate outputs between layers/nodes and accordingly can be reused between layers/nodes of the neural network"; see also col. 18, line 64 – col. 19, line 17).
Regarding claim 3, Vemuri teaches wherein the configuration of the NPU further includes at least one of:
an internal memory size of the NPU (Col 16, ln 25-39, "For the constant buffers, each layer of the network graph has its own set of constants data (e.g., weights, biases) and the buffer manager 224 loads the constant data into the constant buffers before invoking the programmable IC for inference. The buffer manager 224 allocates a pool of constant buffers and generates the layer offsets into these constant buffers. The hardware-setup block, described in further detail below, uses these layer offsets to populate the constant buffers with the constants data. The buffer manager 224 pre-allocates a pool of fixed-size buffers (e.g., 64 MB) based on the memory footprint of the constants (e.g., parameters, biases) used by the network. Each buffer is a contiguous block of memory and can host constants of multiple layers, but the constant buffers do not permit the constants data to straddle across multiple buffers").
a bitwidth of read or write operations associated with the one or more memory circuits;
a type, structure or speed of the one or more memory circuit (Col 3-4, ln 43-2, "In one embodiment, the programmable IC includes a DPE array 130 having any number of DPEs, and each DPE comprises specialized circuitry to connect an array of neural network units (NNU) (not illustrated). In one embodiment, the NNUs of the DPEs comprise non-programmable logic i.e., are hardened specialized processing elements, and comprise hardware elements including, but not limited to, program memories, an instruction fetch/decode unit, fixed-point vector units, floating-point vector units, arithmetic logic units (ALUs), and multiply accumulators (MAC). The detailed circuitry within the memory 140 can include any type of volatile or nonvolatile memory. In one embodiment, the memory 140 includes an array of memory elements" and Col 16, ln 40-42, "In one embodiment of FIG. 15, the buffer manager 224 allocates constant buffers 1502 of equal sizes in memory (such as DDR memory)").
types of number formats supported by the NPU (Col 3-4, ln 43-2, "In one embodiment, the programmable IC includes a DPE array 130 having any number of DPEs, and each DPE comprises specialized circuitry to connect an array of neural network units (NNU) (not illustrated). In one embodiment, the NNUs of the DPEs comprise non-programmable logic i.e., are hardened specialized processing elements, and comprise hardware elements including, but not limited to, program memories, an instruction fetch/decode unit, fixed-point vector units, floating-point vector units, arithmetic logic units (ALUs), and multiply accumulators (MAC). The detailed circuitry within the memory 140 can include any type of volatile or nonvolatile memory. In one embodiment, the memory 140 includes an array of memory elements").
a range of bitwidth supported for integer operations or floating-point operations;
an operating frequency of the NPU;
a number of the plurality of PEs (Col 3-4, ln 43-2, "In one embodiment, the programmable IC 120 includes programmable logic 122, a DPE array 130 having multiple DPEs 1321-132N, memory 140, and control logic 150. In one embodiment, the control logic 150 configures the programmable logic 122, and the programmable logic uses run-time parameters from the control logic 150 to control the DPE array 130. For example, using a received bitstream that contains configuration data, control logic 150 can configure the programmable logic 122 (which can include a plurality of configurable logic blocks) with run-time parameters, and the programmable logic 122 controls the DPE array 130 that has any number of DPEs (132 1-132 N). For example, the programmable logic 122 can include look up tables, function generators, registers, multiplexers, and the like.").
capability of special function unit circuits in the NPU (Col 5, ln 60-67, "Below is a table providing a list of OpCodes supported by the compiler 114. These opcodes correspond to various operations performed by layers of the DNN. In some embodiments, the opcodes correspond to operations resulting from an optimization by the hardware independent optimizer 212 or the hardware dependent optimizer 214. In some embodiments, the opcodes correspond to software operations" Please see Table 1).
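EN: for illustration only, the configuration attributes enumerated in claim 3 can be collected into a single record that a compiler might consult during hardware-dependent optimization. All field names and values below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class NPUConfig:
    """Hypothetical NPU configuration record covering the claim-3 attributes."""
    internal_memory_bytes: int    # internal memory size of the NPU
    memory_bus_bitwidth: int      # bitwidth of read/write operations
    memory_type: str              # type/structure/speed of the memory circuit
    number_formats: tuple         # number formats supported by the NPU
    integer_bitwidths: range      # supported integer-operation bitwidths
    clock_mhz: int                # operating frequency of the NPU
    num_pes: int                  # number of processing elements
    special_functions: tuple      # capabilities of special function units

cfg = NPUConfig(
    internal_memory_bytes=64 * 2**20,
    memory_bus_bitwidth=128,
    memory_type="DDR",
    number_formats=("int8", "fp16"),
    integer_bitwidths=range(4, 33),
    clock_mhz=800,
    num_pes=256,
    special_functions=("relu", "pool"),
)
```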
Regarding claim 4, Vemuri teaches wherein the instructions causing the CPU to compile the first neural network model into the first machine code cause the CPU to: convert the first neural network model into a framework-independent model (Col 4, ln 48-52, "The front-end parser 202 takes the network description in framework specific formats and generates a framework independent network graph"; see also Fig. 2, and Fig. 4A at Col 7, ln 39-43, “FIG. 4A illustrates example operations performed by a compiler 114 and a HAL 116 to apply a DNN such as the network graph 300 of FIG. 3 to a programmable IC 120 for execution, according to embodiments of the present disclosure”).
convert the framework-independent model into a hardware-independent graph (Fig. 4B, Col 9, Ln 1-22, “After allocating buffer handles for the neural network, at block 414 the compiler 114 optimizes the network graph using hardware-independent optimizations and hardware dependent optimizations. Optimization of the network graph can improve the efficiency of data passing through the neural network. Table 1 provided some types of optimizations performed by the compiler 114 to the generated network graph. FIGS. 5-12 also illustrate various example optimizations performed by the compiler 114 on the generated network graph. In some embodiments, the compiler 114 performs hardware independent optimizations on the network graph before performing hardware dependent optimizations. In such embodiments, if the compiler 114 performs hardware dependent optimizations before hardware independent optimizations, the compiler 114 may have to replay some hardware dependent optimizations in order to achieve the same resulting network graph or the optimized network graph may produce different output data compared to output data from a network graph optimized using hardware independent optimizations first. In some embodiments, the compiler 114 can perform any number of optimizations on the network graph to increase efficiency.” and Col 4, ln 52-54, "The backend 210 refines this framework-independent and hardware-agnostic network graph into a hardware-dependent graph" Col 5, ln 17-19, “The backend 210 comprises a hardware independent optimizer 212, a hardware dependent optimizer 214”).
convert the hardware-independent model into a hardware-dependent code (Col 9, Ln 1-22, “After allocating buffer handles for the neural network, at block 414 the compiler 114 optimizes the network graph using hardware-independent optimizations and hardware dependent optimizations. Optimization of the network graph can improve the efficiency of data passing through the neural network. Table 1 provided some types of optimizations performed by the compiler 114 to the generated network graph. FIGS. 5-12 also illustrate various example optimizations performed by the compiler 114 on the generated network graph. In some embodiments, the compiler 114 performs hardware independent optimizations on the network graph before performing hardware dependent optimizations. In such embodiments, if the compiler 114 performs hardware dependent optimizations before hardware independent optimizations, the compiler 114 may have to replay some hardware dependent optimizations in order to achieve the same resulting network graph or the optimized network graph may produce different output data compared to output data from a network graph optimized using hardware independent optimizations first. In some embodiments, the compiler 114 can perform any number of optimizations on the network graph to increase efficiency.” Col 4, ln 52-54, "The backend 210 refines this framework-independent and hardware-agnostic network graph into a hardware-dependent graph" Col 5, ln 17-19, “The backend 210 comprises a hardware independent optimizer 212, a hardware dependent optimizer 214” Col 5, ln 9-25, "In one embodiment, the backend 210 of the compiler 114 works on the network graph structure (generated by the parser 202) and performs operations on the network graph structure to generate an execution sequence vector. The execution sequence vector comprises a sequential queue of the layers of the network graph structure. Details about the execution sequence vector are provided below. 
The backend 210 comprises a hardware independent optimizer 212, a hardware dependent optimizer 214, a job queue scheduler 216 and an IO memory optimizer 218. Each of these components in the backend 210 works to perform operations on the network graph structure and generate an execution sequence vector to pass onto the HAL 116" and Col 4, ln 33-47, “FIG. 2 is a block diagram 200 of the compiler 114 and the HAL 116 to be used with a hardware-software interface 118 to communicate with the programmable IC 120. As mentioned with FIG. 1, the host computer 102 includes a compiler 114 and a HAL 116 for use with a DNN inference accelerator (also referred herein as a programmable IC). In one embodiment, the compiler 114 exports an application program interface (API) to the host computer 102. This exported API takes in a network description of a DNN in various framework specific formats (e.g., deploy.prototxt of the caffe framework) and generates an intermediate hardware-dependent representation of the network. The HAL 116 takes this intermediate representation of the network and programs the hardware for execution using the hardware-software interface 118. In one embodiment, the compiler 114 has two components: the front-end parser 202 and the backend 210. The front-end parser 202 takes the network description in framework specific formats and generates a framework independent network graph. The backend 210 refines this framework-independent and hardware-agnostic network graph into a hardware-dependent graph. In one embodiment, the HAL 116 takes the hardware-dependent graph from the compiler 114 and sets up the hardware runtime parameters of the programmable IC 120, allocates the buffers needed by the programmable IC hardware for processing the network, and schedules the nodes in the hardware-dependent graph into respective hardware execution queues. The command scheduler 226 of the HAL 116 then invokes the programmable IC through the hardware-software interface 118”).
convert the hardware-dependent code into the first machine code (Col 8, ln 15-54, “At 408, operations 400 continue with the HAL 116 configuring the IC based on the execution sequence vector. In some embodiments, configuring the IC based on the execution sequence vector includes the HAL 116 calibrating a plurality of hardware runtime parameters of the programmable IC based on the execution sequence vector. Once the compiler 114 generates the execution sequence vector, the compiler 114 passes the execution sequence vector to the HAL 116 for further processing. In some embodiment, once the HAL 116 receives the execution sequence vector, the HAL 116 begins to setup the hardware components of the programmable IC 120, and in some embodiments, setup includes calibrating the hardware runtime parameters. In some embodiments, the HAL 116 allocates buffers on the programmable IC 120 required by both hardware components and software components based on the execution sequence vector. In such embodiments, the execution sequence vector also includes information about buffer nodes of the network graph. In one embodiment, the HAL 116 keeps track of a list of pointers for allocated buffers corresponding to the buffer nodes of the network graph. In some embodiments, configuring the IC based on the execution sequence vector includes the HAL 116 scheduling the plurality of commands of the execution sequence vector for a plurality of components of the programmable IC. Because the commands in the execution sequence vector correspond to the operations of the layer nodes of the network graph, the HAL 116 schedules when to transmit the commands of the execution sequence vector to the programmable IC 120. When the programmable IC 120 receives the commands from the HAL 116 via the hardware-software interface 118, the programmable IC begins executing the operation corresponding to the command. The operation is based on the layer nodes of the network graph. 
In one embodiment, the plurality of components of the programmable IC 120 include the programmable logic 122 with the plurality of controllers, the DPE array 130, the memory 140, and the control logic 150. Further details about the HAL 116 scheduling the commands of the execution sequence vector are provided with respect to FIG. 18-20").
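EN: for illustration only, the four-stage lowering recited in claim 4 (framework model, framework-independent model, hardware-independent graph, hardware-dependent code, machine code) can be sketched as a chain of small translation passes. The stage implementations, model structure, and opcode values are hypothetical:

```python
def to_framework_independent(model):
    """Strip framework-specific metadata; keep only op types and edges."""
    return [(op["type"], op.get("inputs", [])) for op in model["ops"]]

def to_hw_independent_graph(ops):
    """Represent the ops as a directed acyclic graph (adjacency map)."""
    return {i: {"op": op, "inputs": ins} for i, (op, ins) in enumerate(ops)}

def to_hw_dependent_code(graph, opcode_table):
    """Lower graph nodes to hardware-dependent opcodes (node order here
    stands in for a topological ordering)."""
    return [opcode_table[node["op"]] for node in graph.values()]

def to_machine_code(hw_code):
    """Pack opcodes into bytes as a stand-in for the final encoding."""
    return bytes(hw_code)

OPCODES = {"conv": 1, "relu": 2}
model = {"framework": "caffe",
         "ops": [{"type": "conv"}, {"type": "relu", "inputs": [0]}]}
mc = to_machine_code(
    to_hw_dependent_code(
        to_hw_independent_graph(to_framework_independent(model)), OPCODES))
```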
Regarding claim 5, Vemuri teaches wherein the instructions to compile the first neural network cause the CPU to perform at least one of optimization or verification of the machine code (Col 5, ln 26-59, "To improve the efficiency of the DNN, the compiler 114 can perform several layers of optimizations and layer fusion operations onto the network graph structure. Consequently, the network graph structure has updated layers and buffers and is structured with the HAL 116. In one embodiment, the hardware independent optimizer 212 performs optimizations (also referred herein as optimization rules) of the DNN that do not require or impact the hardware aspects of the DNN. Some of these optimizations performed by the hardware independent optimizer 212 include: parallel 1×1 convolutions fuse optimizations, software fuse optimizations, dropout optimizations, reshape optimizations, flatten optimizations, concatenation layer optimizations, custom layer optimizations, and prior box optimizations. Further, in one embodiment, the hardware dependent optimizer 214 performs optimizations of the DNN that do use or impact the hardware aspects of the DNN. Some of these optimizations performed by the hardware dependent optimizer 214 include: convolution+ReLU optimizations, hardware fusion optimization, CReLU optimizations, ElementWise (sometimes shortened to “Eltwise”) Addition optimizations, ReLU optimizations, 3D separable convolution optimizations, and deconvolution optimizations. In one embodiment, the optimizations performed by the hardware independent optimizer 212 include removal of layers used in the training phase of the DNN. 
With training layer removal optimization, the backend 210 of the compiler 114, specifically the hardware independent optimizer 212, identifies all the layers in the network graph which are not used during the interference phase and removes them" and Col 17, ln 32-45, "In one embodiment, the buffer handle is a string notation to represent input and output buffers of each layer and indicates blocks of memory dedicated to corresponding buffers. The buffer manager 224 allocates a continuous block of memory for each unique buffer handle, and maintains a dictionary of buffer handles and the corresponding pointers to the contiguous block of memory. The buffer manager 224 parses through the execution sequence vector, and for each layer, checks the input and output handle occurrence in the dictionary. If the dictionary returns a miss on the check, the buffer manager 224 allocates a contiguous block of memory for the handle and registers the address of the block allocated along the handle with the dictionary").
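EN: For the applicant's convenience, the buffer-handle mechanism quoted above (Col 17, ln 32-45) — check the dictionary for each layer's input/output handle, allocate a contiguous block on a miss, and register the block with the dictionary — can be sketched as follows. This is an illustrative sketch only; the function and variable names do not appear in Vemuri, and offsets stand in for the reference's pointers.

```python
# Illustrative sketch of the cited buffer-handle dictionary:
# each unique handle maps to one contiguous block; a dictionary
# miss triggers allocation and registration of the block.

def allocate_buffers(execution_sequence, block_size=1024):
    """Walk the layer sequence; allocate one contiguous block
    per unique input/output buffer handle."""
    handle_table = {}          # handle -> (offset, size); stands in for pointers
    next_offset = 0
    for layer in execution_sequence:
        for handle in (layer["input"], layer["output"]):
            if handle not in handle_table:      # dictionary miss
                handle_table[handle] = (next_offset, block_size)
                next_offset += block_size
    return handle_table

seq = [{"input": "data", "output": "conv1"},
       {"input": "conv1", "output": "relu1"}]
table = allocate_buffers(seq)
# "conv1" is reused as the second layer's input, so only the
# three unique handles receive blocks.
```

Note that reusing a registered handle (here "conv1") is what lets buffers be shared between layers rather than reallocated.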
Regarding claim 6, Vemuri teaches wherein the instructions to optimize the machine code cause the CPU to perform at least one of: perform pruning (Col 5, ln 53-59, "In one embodiment, the optimizations performed by the hardware independent optimizer 212 include removal of layers used in the training phase of the DNN. With training layer removal optimization, the backend 210 of the compiler 114, specifically the hardware independent optimizer 212, identifies all the layers in the network graph which are not used during the interference phase and removes them").
perform quantization (Col 7, ln 11-19, "The programmable IC setup component 222 converts the weights and parameters of the DNN to fixed point format and loads them into the constant buffers managed by the buffer manager 224 using the pointers and offsets in the execution sequence vector. In one embodiment, the programmable IC setup component 222 uses a prescribed layer, optimized for hardware performance, for the data in the constant buffers managed by the buffer manager 224").
perform retraining,
perform compression (Col 10, ln 45-60, "One type of optimization performed by the hardware independent optimizer 212 is a parallel [1×1] convolution fusion optimization, which is illustrated in FIGS. 5A and 5B. With a parallel convolution fusion optimization, the backend 210 of the compiler 114, specifically the hardware independent optimizer 212, identifies network topology regions of the network graphs where multiple convolution layers take the same input buffer and write to different output buffers and merge these convolution layers into one layer. The merged convolution layer attaches to an output buffer with a size enough to hold the output of all the convolution layers merged. Also, the hardware independent optimizer 212 of the backend 210 registers the offsets of each convolution layer's output into the new output buffer for processing of downstream layers in the network graph" and Col 11, ln 12-28, "FIGS. 6A-B depict another example optimization of a network graph performed by the compiler 114 to generate the execution sequence vector, according to embodiments of the present disclosure. In one embodiment, FIGS. 6A and 6B illustrate an example pre-execute fusion optimization. With a pre-execute fusion optimization, the backend 210 of the compiler 114, specifically the hardware independent optimizer 212, looks up for a pattern of convolution layers followed by batch-norm layers followed by scale layers, and fuses the three layers into one convolution layer, by merging the parameters and weights of the input convolution, batch-norm, and scale layers. This optimization gets rids of the buffers connecting the layers, and therefore reduces the buffer requirements to execute the network. In some embodiments, the pre-execute fusion optimization applies to convolution layers, batch-norm layers, and scale layers of any order, combination, or arrangement").
perform an artificial intelligence (AI)-based optimization algorithm,
or perform knowledge distillation.
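EN: For the applicant's convenience, the pre-execute fusion cited above (Col 11, ln 12-28) — merging the parameters and weights of convolution, batch-norm, and scale layers into one convolution — can be sketched as the standard parameter fold below. This is an illustrative sketch only; the names and shapes are assumptions and do not appear in Vemuri, and a 1×1 convolution on a single pixel is used so the fold reduces to per-channel arithmetic.

```python
import math

# Illustrative fold of batch-norm + scale parameters into a
# convolution's weights and bias. Because convolution is linear,
# conv -> batch-norm -> scale on any input equals the single
# fused convolution, removing the intermediate buffers.

def fuse_conv_bn_scale(W, b, mean, var, gamma, beta, eps=1e-5):
    """W: per-output-channel weight rows; b: per-channel bias.
    mean/var are batch-norm statistics; gamma/beta the scale
    layer's parameters, all per output channel."""
    W_fused, b_fused = [], []
    for row, bi, m, v, g, be in zip(W, b, mean, var, gamma, beta):
        factor = g / math.sqrt(v + eps)          # BN + scale, folded
        W_fused.append([w * factor for w in row])
        b_fused.append((bi - m) * factor + be)
    return W_fused, b_fused
```

The fold is exact: for each output channel, gamma*((W·x + b - mean)/sqrt(var + eps)) + beta equals (W*factor)·x + ((b - mean)*factor + beta).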
Regarding claim 7, Vemuri teaches wherein the instructions to compile the first neural network cause the CPU to analyze parameter information of each layer of the first neural network model (Col 7, ln 11-15, "The programmable IC setup component 222 converts the weights and parameters of the DNN to fixed point format and loads them into the constant buffers managed by the buffer manager 224 using the pointers and offsets in the execution sequence vector" and Col 11, ln 12-23, "FIGS. 6A and 6B illustrate an example pre-execute fusion optimization. With a pre-execute fusion optimization, the backend 210 of the compiler 114, specifically the hardware independent optimizer 212, looks up for a pattern of convolution layers followed by batch-norm layers followed by scale layers, and fuses the three layers into one convolution layer, by merging the parameters and weights of the input convolution, batch-norm, and scale layers." Col 16, ln 16-39, "The constant buffers are read-only buffers for the programmable IC 120 and are used for trained parameters (e.g., weights for layers in the neural network to process input data). The I/O buffers are read-write buffers for the programmable IC 120 to store the intermediate outputs between layers/nodes and accordingly can be reused between layers/nodes of the neural network … For the constant buffers, each layer of the network graph has its own set of constants data (e.g., weights, biases) and the buffer manager 224 loads the constant data into the constant buffers before invoking the programmable IC for inference. The buffer manager 224 allocates a pool of constant buffers and generates the layer offsets into these constant buffers. The hardware-setup block, described in further detail below, uses these layer offsets to populate the constant buffers with the constants data. 
The buffer manager 224 pre-allocates a pool of fixed-size buffers (e.g., 64 MB) based on the memory footprint of the constants (e.g., parameters, biases) used by the network. Each buffer is a contiguous block of memory and can host constants of multiple layers, but the constant buffers do not permit the constants data to straddle across multiple buffers").
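EN: For the applicant's convenience, the constant-buffer placement quoted above (Col 16) — a pool of fixed-size contiguous buffers in which each layer's constants may share a buffer but may not straddle two buffers — can be sketched as follows. This is an illustrative sketch only; the names are not from Vemuri, and small unit sizes stand in for the reference's 64 MB buffers.

```python
# Illustrative sketch of the cited fixed-size constant-buffer pool:
# each layer's constants occupy a contiguous span inside exactly one
# buffer; a span that would straddle starts the next buffer instead.

BUFFER_SIZE = 64  # stand-in for the 64 MB buffers in the reference

def place_constants(layer_sizes, buffer_size=BUFFER_SIZE):
    """Return per-layer (buffer_index, offset) placements."""
    placements, buf_idx, offset = [], 0, 0
    for size in layer_sizes:
        if size > buffer_size:
            raise ValueError("layer constants exceed one buffer")
        if offset + size > buffer_size:   # would straddle: new buffer
            buf_idx, offset = buf_idx + 1, 0
        placements.append((buf_idx, offset))
        offset += size
    return placements

# Layers of 40, 30, 20 units: the 30-unit constants would straddle
# the first 64-unit buffer, so they start buffer 1 instead.
```

These (buffer, offset) placements correspond to the layer offsets the hardware-setup block uses to populate the constant buffers.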
Regarding claim 8, Vemuri teaches wherein the instructions to compile the first neural network cause the CPU to analyze sizes of weight parameters (Col 7, ln 11-15, "The programmable IC setup component 222 converts the weights and parameters of the DNN to fixed point format and loads them into the constant buffers managed by the buffer manager 224 using the pointers and offsets in the execution sequence vector." and Col 16, ln 11-29, "Of the components of the HAL 116, the buffer manager 224 handles both constant buffers and I/O buffers used for both hardware and software of the programmable IC 120. The buffer manager 224 allocates two kinds of buffers: constant buffers and I/O buffers. The constant buffers are read-only buffers for the programmable IC 120 and are used for trained parameters (e.g., weights for layers in the neural network to process input data). The I/O buffers are read-write buffers for the programmable IC 120 to store the intermediate outputs between layers/nodes and accordingly can be reused between layers/nodes of the neural network. The following discussion further describes the differences between constant buffers and the I/O buffers, especially as to the data organization of each type of buffer. For the constant buffers, each layer of the network graph has its own set of constants data (e.g., weights, biases) and the buffer manager 224 loads the constant data into the constant buffers before invoking the programmable IC for inference").
and feature map parameters of each layer in the first neural network model (Col 15, ln 36-42, "In one embodiment, the backend 210 of the compiler 114 further comprises a IO memory optimizer 218, and this IO memory optimizer 218 allocates a set of buffer handles along with the sizes, which can be used for storing I/O (also referred herein as activations) between layers while reusing the buffers between layers of the network graph." Col 4, ln 26-32, "In one embodiment, the memory 106 includes an array of memory elements. In one embodiment, the memory 106 stores input image data, such as input feature maps, and activation outputs from various and/or previous layers of the DNN. Details about the compiler 114, the HAL 116, and the hardware-software interface 118 are provided below with regards with FIG. 2." Col 15, ln 42-54, "In one embodiment, a buffer handle is a string notation to represent input and output buffers of each layer and indicates blocks of memory dedicated to corresponding buffers. The backend 210 loads the buffer handles and corresponding sizes onto the execution sequence vector from the job queue scheduler 216. In one embodiment, the backend 210 may make design choices such as: (1) the backend 210 can initialize the buffer sizes of all the buffer handles to the size of the largest buffer for IO activations, and can attach all the buffer handles to the same size; and (2) the backend 210 cannot reuse buffer handles attached to layers optimized for software execution (e.g., layers that are not hardware-accelerated)." EN: The instant application’s specification defines a feature map as an activation parameter and as node data in paragraphs [0042] and [0068], respectively).
Regarding claim 9, Vemuri teaches wherein the instructions to compile the first neural network cause the CPU to analyze connectivity between layers in the first neural network model (Col 4, ln 63 - Col 5, ln 51, "In one embodiment, the parser 202 provides an interface to various deep learning network frameworks 206 with an API, like an API exported by the compiler 114. The API takes inputs in the same format as the deep learning frameworks do. Accordingly, the parser 202 takes models trained using various deep learning network frameworks 206 like caffe or TensorFlow and converts them to a network graph structure. In one embodiment, the network graph structure is an XGraph. In one embodiment, the graph structure converted by the parser 202 is a directed acyclic graph with heterogeneous nodes which encode information about various network layers and their connectivity. An example of a directed acyclic graph is presented in FIG. 3. In one embodiment, the backend 210 of the compiler 114 works on the network graph structure (generated by the parser 202) and performs operations on the network graph structure to generate an execution sequence vector. The execution sequence vector comprises a sequential queue of the layers of the network graph structure. Details about the execution sequence vector are provided below. The backend 210 comprises a hardware independent optimizer 212, a hardware dependent optimizer 214, a job queue scheduler 216 and an IO memory optimizer 218. Each of these components in the backend 210 works to perform operations on the network graph structure and generate an execution sequence vector to pass onto the HAL 116. To improve the efficiency of the DNN, the compiler 114 can perform several layers of optimizations and layer fusion operations onto the network graph structure. Consequently, the network graph structure has updated layers and buffers and is structured with the HAL 116. 
In one embodiment, the hardware independent optimizer 212 performs optimizations (also referred herein as optimization rules) of the DNN that do not require or impact the hardware aspects of the DNN. Some of these optimizations performed by the hardware independent optimizer 212 include: parallel 1×1 convolutions fuse optimizations, software fuse optimizations, dropout optimizations, reshape optimizations, flatten optimizations, concatenation layer optimizations, custom layer optimizations, and prior box optimizations. Further, in one embodiment, the hardware dependent optimizer 214 performs optimizations of the DNN that do use or impact the hardware aspects of the DNN. Some of these optimizations performed by the hardware dependent optimizer 214 include: convolution+ReLU optimizations, hardware fusion optimization, CReLU optimizations, ElementWise (sometimes shortened to “Eltwise”) Addition optimizations, ReLU optimizations, 3D separable convolution optimizations, and deconvolution optimizations").
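EN: For the applicant's convenience, the relationship quoted above between the directed acyclic network graph and the execution sequence vector (a sequential queue of the graph's layers) can be sketched with a topological ordering. This is an illustrative sketch only — a topological sort is one standard way to linearize a DAG so that each layer follows its inputs; Vemuri's job queue scheduler 216 may order layers differently, and the graph and names below are assumptions.

```python
from collections import deque

# Illustrative linearization of a directed acyclic network graph
# into a sequential execution vector: Kahn's algorithm emits each
# node only after all of its predecessors have been emitted.

def execution_sequence(graph):
    """graph: {node: [successor, ...]} for a DAG; returns one
    valid sequential ordering of the layer nodes."""
    indegree = {n: 0 for n in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] = indegree.get(s, 0) + 1
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in graph.get(n, []):
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

g = {"data": ["conv1"], "conv1": ["relu1", "conv2"],
     "relu1": ["concat"], "conv2": ["concat"], "concat": []}
seq = execution_sequence(g)
```

Any ordering this produces respects the layer connectivity encoded in the graph, which is the property the execution sequence vector relies on.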
Regarding claim 10, Vemuri teaches a non-transitory computer readable storage medium storing instructions thereon, the instructions when executed by a central processing unit (CPU) cause the CPU to: (Col 1, ln 49-51, “Aspects of the present disclosure also provide apparatus, methods, processing systems, and computer readable mediums for performing the operations described above”).
The remaining limitations are similar to claim 1 and are rejected under the same rationale.
Claims 11-18 are medium claims reciting limitations similar to claims 1, 4, 2, and 5-9, respectively, and are rejected under the same rationale.
Claims 19-22 are method claims reciting limitations similar to claims 1, 2, 5, and 4, respectively, and are rejected under the same rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chen et al. (TVM: An Automated End-to-End Optimizing Compiler for Deep Learning): discloses a similar framework of abstraction and layer-by-layer breakdown to the hardware level, providing a hardware-independent API for various machine learning frameworks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMIR DARWISH whose telephone number is (571)272-4779. The examiner can normally be reached Monday through Thursday, 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente can be reached on 571-272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/A.E.D./Examiner, Art Unit 2187
/LEWIS A BULLOCK JR/Supervisory Patent Examiner, Art Unit 2199