Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.
For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1, 5, 9, and 18 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 18/394,859 (see claims filed 3/4/2026) in view of "CHARM: A Composable Heterogeneous Accelerator-Rich Microprocessor" (2012-Cong). Claims 11 and 13-17 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 18/394,859 (see claims filed 3/4/2026).

With respect to claim 1, 18/394,859 teaches A system on a chip (SoC), comprising ([claim 1 line 1]): at least one central processing unit (CPU) ([claim 1 line 2]); an accelerator comprising an array of data processing engines (DPEs) ([claim 1 lines 3-4]); and an interface communicatively coupling the CPU to the controller and the accelerator. Claim 1 of 18/394,859 does not teach a controller comprising circuitry configured to: receive a task from the CPU; control data movement into and out of the array of DPEs in the accelerator to perform the task; and inform the CPU when the task is complete. However, 2012-Cong teaches a controller comprising circuitry configured to (in FIG. 2, this is the tile labeled "ABC", [page 380], which stands for accelerator block composer, see [Abstract] lines 4-5; FIG. 3 shows what the ABC is configured to perform, [page 381]): receive a task from the CPU (in FIG. 3(A), the arrow from the "core" to the "ABC", [page 381]; see also the caption "A core sends a request for an LCA to the ABC;", see also "The core sends a data flow graph (DFG) of the desired LCA to the ABC (Figure 3A)", [page 382 col 1 paragraph 1 lines 5-6], where the graph is a graph of tasks); and control data movement into and out of the array of DPEs in the accelerator to perform the task (shown in FIG. 3(B) and FIG. 3(C), where the ABC allocates ABBs, [page 381]; the actual algorithm is discussed in the 3.2.1 ABC design section, "The ABC uses a two-tiered allocation policy to decide which ABBs to compose into a given LCA. First, the ABC will attempt to balance the concentration of memory-accessing ABBs across the entire system. The purpose of this is to limit contention in the DMA associated with each node. Second, the ABC will employ a simple greedy approach to select ABBs that are local to other ABBs they communicate with. This is done in order to minimize the cost of communication between ABBs.", [page 381 col 1 paragraph 5 lines 7-15]); and inform the CPU when the task is complete (shown in FIG. 3(D), "The ABC signals completion to the core.", [page 381]). It would have been obvious to one skilled in the art before the effective filing date to combine 18/394,859 with 2012-Cong because a teaching, suggestion, or motivation in the prior art would have led one skilled in the art to combine the prior art teachings to arrive at the claimed invention. Claim 1 along with claim 7 of the '859 application discloses a system that teaches all of the claimed features except for how the controller is configured. 2012-Cong teaches: Running medical imaging benchmarks, our experimental results show an average speedup of 2.1X (best case 3.7X) compared to approaches that use LCAs together with a hardware resource manager. We also gain in terms of energy consumption (average 2.4X; best case 4.7X). (2012-Cong, [Abstract] lines 14-18). A person having skill in the art would have a reasonable expectation of successfully speeding up the system of 18/394,859 by modifying the '859 Application with the steps performed by the controller shown in FIG. 3 (2012-Cong, [page 381]). Therefore, it would have been obvious to combine 18/394,859 with 2012-Cong to a person having ordinary skill in the art.
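For clarity of the record, the two-tiered allocation policy quoted above from 2012-Cong ([page 381 col 1 paragraph 5]) can be illustrated by the following sketch. This is the examiner's own hypothetical pseudocode under assumed data structures (ABB, DFGNode, an integer island index standing in for NoC distance, and the assumption that a free ABB of each required kind exists); it is not code disclosed by the reference.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class ABB:                       # accelerator building block (mapped to a "DPE")
    kind: str                    # functional type, e.g. "dma", "multiplier"
    island: int                  # index of the ABB island holding this block
    free: bool = True

@dataclass(eq=False)
class DFGNode:                   # one node of the data flow graph sent by a core
    kind: str
    accesses_memory: bool
    preds: list = field(default_factory=list)

def compose_lca(dfg, abbs):
    """Sketch of the ABC's two-tiered policy: tier 1 balances memory-accessing
    ABBs across islands to limit per-node DMA contention; tier 2 greedily
    picks ABBs local to the ABBs they communicate with."""
    mem_load = {a.island: 0 for a in abbs}   # memory-accessing ABBs per island
    alloc = {}
    for node in dfg:
        candidates = [a for a in abbs if a.free and a.kind == node.kind]
        if node.accesses_memory:
            # Tier 1: place on the island with the fewest memory-accessing ABBs.
            chosen = min(candidates, key=lambda a: mem_load[a.island])
            mem_load[chosen.island] += 1
        else:
            # Tier 2: minimize distance to already-placed communicating nodes
            # (island-index difference is an assumed proxy for NoC hop count).
            placed = [alloc[p].island for p in node.preds if p in alloc]
            chosen = min(candidates, key=lambda a: sum(abs(a.island - i) for i in placed))
        chosen.free = False
        alloc[node] = chosen
    return alloc                 # the composed LCA

# Example: a memory-accessing load feeding a multiplier.
load = DFGNode("dma", accesses_memory=True)
mul = DFGNode("multiplier", accesses_memory=False, preds=[load])
pool = [ABB("dma", 0), ABB("dma", 1), ABB("multiplier", 0), ABB("multiplier", 1)]
print(compose_lca([load, mul], pool))
```

On this reading, FIG. 3(A)-(D) of 2012-Cong corresponds to the core supplying the data flow graph, the ABC running a policy of this shape to allocate ABBs, and the ABC signaling completion back to the core.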
With respect to claim 5, the '859 Application in view of 2012-Cong teaches all of the limitations of claim 1, as noted above. Claim 5 of the '859 Application further teaches: wherein the interface is a second NoC, wherein the second NoC is larger than the NoC in the AI accelerator ([see claim 5]).

With respect to claim 9, the '859 Application in view of 2012-Cong teaches all of the limitations of claim 1, as noted above. Claim 8 of the '859 Application further teaches: wherein each of the DPEs comprises a core, a memory module, and an interconnect, wherein the interconnects in the DPEs are interconnected so that the DPEs are able to transmit data between each other ([see claim 8]).

With respect to claim 11, 18/394,859 teaches A method, comprising ([claim 10 line 1]): receiving, from a CPU, an instruction at a controller to perform a hardware acceleration task using an accelerator, wherein the CPU, the controller, and accelerator are disposed on a same integrated circuit (IC) ([claim 10 lines 2-4]); controlling, using the controller, data movement into and out of an array of DPEs in the accelerator to perform the hardware acceleration task ([claim 10 lines 5-10]); and informing the CPU that the hardware acceleration task is complete using the controller ([claim 10 lines 11-12]).

With respect to claim 13, the '859 Application teaches all of the limitations of claim 11, as noted above. Claim 10 of the '859 Application further teaches: transmitting data generated by the DPEs when performing the hardware acceleration task to a NoC in the accelerator ([claim 10 ln 7-8]); performing, at an IOMMU in the accelerator, an address translation on the data received from the NoC ([claim 10 ln 9-10]); and transmitting the address translated data to the CPU ([claim 10 ln 11-12]).

With respect to claim 14, the '859 Application teaches all of the limitations of claim 13, as noted above. Claim 11 of the '859 Application further teaches: performing the address translation comprises: translating virtual addresses used by the accelerator to physical addresses used to store the address translated data ([claim 11]).

With respect to claim 15, the '859 Application teaches all of the limitations of claim 14, as noted above. Claim 12 of the '859 Application further teaches: wherein the virtual addresses are memory mapped virtual addresses, wherein the memory mapped virtual addresses are used to transmit the data from the DPEs, through the NoC, and to the IOMMU ([claim 12]).

With respect to claim 16, the '859 Application teaches all of the limitations of claim 13, as noted above. Claim 14 of the '859 Application further teaches: wherein the controller communicates with the CPU only through a second NoC, wherein the second NoC is larger than the NoC in the accelerator, wherein the second NoC also communicatively couples the CPU to the accelerator ([claim 14]).

With respect to claim 17, the '859 Application teaches all of the limitations of claim 16, as noted above.
Claim 17 of the '859 Application further teaches: wherein the controller communicates with the CPU only through a second NoC, wherein the second NoC is larger than the NoC in the accelerator, wherein the second NoC also communicatively couples the CPU to the accelerator ([claim 17]).

With respect to claim 18, the '859 Application teaches A system, comprising ([claim 18 line 1]): an IC, comprising ([claim 18 line 2]): at least one CPU ([claim 18 line 3]), an accelerator comprising DPEs ([claim 18 lines 4-5]), a memory controller ([claim 18 line 10]), and an interface communicatively coupling the CPU to the accelerator, the controller, and the memory controller ([claim 18 ln 11-12]); and at least one memory coupled to the memory controller in the IC ([claim 18 ln 13]). Claim 18 of the '859 Application does not teach a controller configured to: receive a task from the CPU; control data movement into and out of the DPEs in the accelerator to perform the task; and inform the CPU when the task is complete. However, 2012-Cong teaches a controller comprising circuitry configured to (in FIG. 2, this is the tile labeled "ABC", [page 380], which stands for accelerator block composer, see [Abstract] lines 4-5; FIG. 3 shows what the ABC is configured to perform, [page 381]): receive a task from the CPU (in FIG. 3(A), the arrow from the "core" to the "ABC", [page 381]; see also the caption "A core sends a request for an LCA to the ABC;", see also "The core sends a data flow graph (DFG) of the desired LCA to the ABC (Figure 3A)", [page 382 col 1 paragraph 1 lines 5-6], where the graph is a graph of tasks); control data movement into and out of the array of DPEs in the accelerator to perform the task (shown in FIG. 3(B) and FIG. 3(C), where the ABC allocates ABBs, [page 381]; the actual algorithm is discussed in the 3.2.1 ABC design section, "The ABC uses a two-tiered allocation policy to decide which ABBs to compose into a given LCA. First, the ABC will attempt to balance the concentration of memory-accessing ABBs across the entire system. The purpose of this is to limit contention in the DMA associated with each node. Second, the ABC will employ a simple greedy approach to select ABBs that are local to other ABBs they communicate with. This is done in order to minimize the cost of communication between ABBs.", [page 381 col 1 paragraph 5 lines 7-15]); and inform the CPU when the task is complete (shown in FIG. 3(D), "The ABC signals completion to the core.", [page 381]). It would have been obvious to one skilled in the art before the effective filing date to combine 18/394,859 with 2012-Cong because a teaching, suggestion, or motivation in the prior art would have led one skilled in the art to combine the prior art teachings to arrive at the claimed invention. Claim 18 along with claim 7 of the '859 application discloses a system that teaches all of the claimed features except for how the controller is configured. 2012-Cong teaches: Running medical imaging benchmarks, our experimental results show an average speedup of 2.1X (best case 3.7X) compared to approaches that use LCAs together with a hardware resource manager. We also gain in terms of energy consumption (average 2.4X; best case 4.7X). (2012-Cong, [Abstract] lines 14-18). A person having skill in the art would have a reasonable expectation of successfully speeding up the system of 18/394,859 by modifying the '859 Application with the steps performed by the controller shown in FIG. 3 (2012-Cong, [page 381]).
Therefore, it would have been obvious to combine 18/394,859 with 2012-Cong to a person having ordinary skill in the art. This is a provisional nonstatutory double patenting rejection.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 5 recites the limitation "the NoC in the accelerator" in line 3, but this limitation is introduced in claim 2, and claim 5 does not depend on claim 2. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 5-12, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by "CHARM: A Composable Heterogeneous Accelerator-Rich Microprocessor" (2012-Cong).

With respect to claim 1, 2012-Cong teaches A system on a chip (SoC), comprising (see FIG. 2, which shows the architecture of CHARM, [page 380], where CHARM stands for "Composable Heterogeneous Accelerator-Rich Microprocessor design", [Abstract] lines 1-2; and Table 5 shows the area for the main components of the chip, [page 382]): at least one central processing unit (CPU) (in FIG. 2, all of the tiles labeled "c" are cores, which are CPUs, [page 380]); an accelerator comprising an array of data processing engines (DPEs) (the accelerator is a "loosely coupled accelerator" (LCA), [page 379 col 1 paragraph 1 line 6], which is built from "accelerator building blocks" (ABBs), [page 379 col 2 paragraph 2 line 15]; and in FIG. 2, all of the blocks labeled "I" are "ABB islands" that together make up the LCA, [page 380]; so the LCA is the accelerator and the ABBs are the DPEs); a controller comprising circuitry configured to (in FIG. 2, this is the tile labeled "ABC", [page 380], which stands for accelerator block composer, see [Abstract] lines 4-5; FIG. 3 shows what the ABC is configured to perform, [page 381]):
receive a task from the CPU (in FIG. 3(A), the arrow from the "core" to the "ABC", [page 381]; see also the caption "A core sends a request for an LCA to the ABC;", see also "The core sends a data flow graph (DFG) of the desired LCA to the ABC (Figure 3A)", [page 382 col 1 paragraph 1 lines 5-6], where the graph is a graph of tasks); and control data movement into and out of the array of DPEs in the accelerator to perform the task (shown in FIG. 3(B) and FIG. 3(C), where the ABC allocates ABBs, [page 381]; the actual algorithm is discussed in the 3.2.1 ABC design section, "The ABC uses a two-tiered allocation policy to decide which ABBs to compose into a given LCA. First, the ABC will attempt to balance the concentration of memory-accessing ABBs across the entire system. The purpose of this is to limit contention in the DMA associated with each node. Second, the ABC will employ a simple greedy approach to select ABBs that are local to other ABBs they communicate with. This is done in order to minimize the cost of communication between ABBs.", [page 381 col 1 paragraph 5 lines 7-15]); and inform the CPU when the task is complete (shown in FIG. 3(D), "The ABC signals completion to the core.", [page 381]); and an interface communicatively coupling the CPU to the controller and the accelerator (Applicant uses the word "interface" to mean "bus" or "network" (see Specification Drawings reference character 125 in FIG. 1); the interface as in "network" here refers to the hardware infrastructure of the NoC or network on chip, which refers to the components of the chip in FIG. 2 being connected by a network, see [page 380 col 2 paragraph 4 line 12]; in FIG. 2, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380]; the network/NoC is further mentioned in passing in Table 2 and Table 5, [page 382]; FIG. 4 shows the accelerator with two NoC interfaces, where one is connected to the core, [page 382]; and last but not least indirectly through FIG. 3, which describes how data is passed between the components, [page 381], and this information, with the knowledge that the hardware is a NoC, shows how everything is "communicatively coupled").

With respect to claim 5, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein the controller communicates with the CPU only through the interface (Applicant uses the word "interface" to mean "bus" or "network" (see Specification Drawings reference character 125 in FIG. 2); the interface as in "network" here refers to the hardware infrastructure of the NoC or network on chip, which refers to the components of the chip in FIG. 2 being connected by a network, see [page 380 col 2 paragraph 4 line 12]; the network/NoC is further mentioned in passing in Table 2 and Table 5, [page 382]; and last but not least indirectly through FIG. 3, which describes how data is passed between the components, [page 381]), wherein the interface is a second NoC, wherein the second NoC is larger than the NoC in the accelerator (the details of the NoC in the accelerator are shown in FIG. 4, which specifically refers to all of the lines showing the interconnections between the component parts inside the ABB, [page 382];
in FIG. 2 and FIG. 4, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380], [page 382]; the NoC in the accelerator is a subcomponent of the ABB island component, which makes it much smaller than the NoC of the entire chip).

With respect to claim 6, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein the controller does not contain any programmable logic (the ABC in FIG. 2 is a hardware component, [page 380], [page 382 col 2 paragraph 4 lines 1-3]; the authors specifically contrast the ABC with software-based resource management: "We contrast with these software-based methodologies by advocating the use of hardware based accelerator management", [page 384 col 1 paragraph 2 lines 13-15]).

With respect to claim 7, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein the controller comprises circuitry that is separate from the CPU (the ABC in FIG. 2 is a hardware component separate from the cores shown in FIG. 2, [page 380], see also [page 382 col 2 paragraph 4 lines 1-3]), wherein the controller is configured to execute software code or firmware for orchestrating the DPEs to perform the task ("The ABC uses five components to manage its collection of ABBs: a Resource Table, a Composed LCA Table, a collection of Task Lists, a TLB, and a Data Flow Graph Interpreter", [page 381 col 1 paragraph 4 lines 4-7]; in this case the interpreter is software/instructions that orchestrate the ABBs/DPEs to perform a task: "Our software framework provides composition instructions in the form of a data flow graph. These graphs are fed as resource instantiation templates from the cores to the ABC. Each node in the data flow graph needs to be allocated to a particular ABB, and each ABB is only assigned to a single graph node, and a single LCA, at a time", [page 381 col 2 paragraph 3]).

With respect to claim 8, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein the array comprises memory tiles (the array is the array of tiles labeled "I", and in the expanded view see "SPM" in FIG. 2, which stands for "scratchpad memory", [page 380]) and interface tiles (the array is the array of tiles labeled "I", and in the expanded view see "DMA" in FIG. 2, which stands for "direct memory access", [page 380]), wherein the controller is configured to control data movement into the memory tiles and interface tiles such that data flows from the interface tiles into the memory tiles, and then from the memory tiles into the DPEs (this lower level is taught in FIG. 4, where the NoC interface points to the DMA-C, then the DMA-C to the SPM bank, and finally from the SPM bank to the ASM (adder/subtractor/multiplier) tiles, which are the DPEs, [page 382]). A sketch of this data path appears below.
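For the record, the FIG. 4 data path relied upon for claim 8 can be summarized in the following sketch. This is the examiner's illustrative pseudocode under assumed names (run_abb_island, dma_copy, a dict standing in for the SPM banks); 2012-Cong discloses the path only in its figure and accompanying prose.

```python
# Illustrative sketch (assumed names) of the FIG. 4 path in 2012-Cong
# [page 382]: NoC interface -> DMA-C -> SPM bank -> ASM tiles -> SPM bank
# -> DMA-C -> NoC interface.

def dma_copy(buf):
    """Stand-in for the island's DMA engine (the 'interface tile'); the
    real DMA-C moves data between the NoC and the SPM without the CPU."""
    return list(buf)

def run_abb_island(payload, compute_tiles):
    spm = {}                          # scratchpad memory (the 'memory tiles')
    spm["in"] = dma_copy(payload)     # interface tile -> memory tile
    data = spm["in"]
    for tile in compute_tiles:        # memory tile -> DPEs (the ASM tiles)
        data = tile(data)
    spm["out"] = data                 # result staged back into the SPM
    return dma_copy(spm["out"])       # memory tile -> interface tile -> NoC

# Example: two chained tiles operating on a small vector.
result = run_abb_island([1, 2, 3], [lambda d: [x + 1 for x in d],
                                    lambda d: [x * 2 for x in d]])
```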
With respect to claim 9, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein each of the DPEs (in FIG. 2, ABB islands labeled "I", [page 380]) comprises a core (actual ABBs in the expanded view in FIG. 2, [page 380]), a memory module (SPM in FIG. 2, [page 380]), and an interconnect, wherein the interconnects in the DPEs are interconnected so that the DPEs are able to transmit data between each other (see the NoC interface and DMA in FIG. 2, [page 380]; "The dedicated DMA engine in each ABB island is responsible for transferring data between the SPM and the L2 cache, and also between SPMs in different ABB islands", [page 381 col 1 paragraph 3 lines 1-3]).

With respect to claim 10, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein the accelerator is at least one of an artificial intelligence (AI) accelerator, a cryptography accelerator, or a compression accelerator ("An example of this could be a dedicated encryption unit that runs a particular encryption algorithm on data it receives from different cores that share access to the accelerator", [page 379 col 1 paragraph 1 lines 7-10]).

With respect to claim 11, 2012-Cong teaches A method, comprising (FIG. 3 shows what the ABC is configured to perform, [page 381]; to see the components, see FIG. 2, where the main component performing the method is labeled "ABC", [page 380], which stands for accelerator block composer, see [Abstract] lines 4-5): receiving, from a CPU, an instruction at a controller to perform a hardware acceleration task using an accelerator (in FIG. 3(A), the arrow from the "core" to the "ABC", [page 381]; see also the caption "A core sends a request for an LCA to the ABC;", see also "The core sends a data flow graph (DFG) of the desired LCA to the ABC (Figure 3A)", [page 382 col 1 paragraph 1 lines 5-6], where the graph is a graph of tasks), wherein the CPU, the controller, and accelerator are disposed on a same integrated circuit (IC) (see FIG. 2, where the CPU is a core, the controller is labeled ABC, and the accelerators are labeled "I" for ABB island, [page 380]; note this is an accelerator-rich CMP or chip multiprocessor, [Abstract] line 3); controlling, using the controller, data movement into and out of an array of DPEs in the accelerator to perform the hardware acceleration task (shown in FIG. 3(B) and FIG. 3(C), where the ABC allocates ABBs, [page 381]; the actual algorithm is discussed in the 3.2.1 ABC design section, "The ABC uses a two-tiered allocation policy to decide which ABBs to compose into a given LCA. First, the ABC will attempt to balance the concentration of memory-accessing ABBs across the entire system. The purpose of this is to limit contention in the DMA associated with each node. Second, the ABC will employ a simple greedy approach to select ABBs that are local to other ABBs they communicate with. This is done in order to minimize the cost of communication between ABBs.", [page 381 col 1 paragraph 5 lines 7-15]; note the ABBs are the DPEs that perform hardware acceleration); and informing the CPU that the hardware acceleration task is complete using the controller (shown in FIG. 3(D), "The ABC signals completion to the core.", [page 381]).

With respect to claim 12, 2012-Cong teaches all of the limitations of claim 11, as noted above. 2012-Cong further teaches wherein controlling the array of DPEs comprises: configuring, using the controller, direct memory access (DMA) circuitry in the DPEs to complete the hardware acceleration task received from the CPU (specifically shown in FIG. 3(C), "C) An LCA instance is allocated with consideration for balancing DMA utilization", [page 381]; see also FIG. 4, showing that the NoC interface is connected to the DMA-C first for completing hardware acceleration tasks, [page 382]).
With respect to claim 17, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein each of the DPEs (in FIG. 2, ABB islands labeled "I", [page 380]) comprises a core (actual ABBs in the expanded view in FIG. 2, [page 380]), a memory module (SPM in FIG. 2, [page 380]), and an interconnect, wherein the interconnects in the DPEs are interconnected so that the DPEs are able to transmit data between each other (see the NoC interface and DMA in FIG. 2, [page 380]; "The dedicated DMA engine in each ABB island is responsible for transferring data between the SPM and the L2 cache, and also between SPMs in different ABB islands", [page 381 col 1 paragraph 3 lines 1-3]) when performing the hardware acceleration task (transferring between ABBs when performing acceleration tasks is called accelerator chaining, [page 381 col 1 paragraph 3 line 3]).

With respect to claim 18, 2012-Cong teaches A system, comprising (see the processor and operating system in Table 2, [page 382], running on the chip in FIG. 2, which shows the architecture of CHARM, [page 380], where CHARM stands for "Composable Heterogeneous Accelerator-Rich Microprocessor design", [Abstract] lines 1-2; and Table 5 shows the area for the main components of the chip, [page 382]): an IC, comprising (FIG. 2, which shows the architecture of CHARM, [page 380], where CHARM stands for "Composable Heterogeneous Accelerator-Rich Microprocessor design", [Abstract] lines 1-2; and Table 5 shows the area for the main components of the chip, [page 382]): at least one CPU (in FIG. 2, all of the tiles labeled "c" are cores, which are CPUs, [page 380]), an accelerator comprising DPEs (the accelerator is a "loosely coupled accelerator" (LCA), [page 379 col 1 paragraph 1 line 6], which is built from "accelerator building blocks" (ABBs), [page 379 col 2 paragraph 2 line 15]; and in FIG. 2, all of the blocks labeled "I" are "ABB islands" that together make up the LCA, [page 380]; so the LCA is the accelerator and the ABBs are the DPEs), a controller configured to (in FIG. 2, this is the tile labeled "ABC", [page 380], which stands for accelerator block composer, see [Abstract] lines 4-5; FIG. 3 shows what the ABC is configured to perform, [page 381]): receive a task from the CPU (in FIG. 3(A), the arrow from the "core" to the "ABC", [page 381]; see also the caption "A core sends a request for an LCA to the ABC;", see also "The core sends a data flow graph (DFG) of the desired LCA to the ABC (Figure 3A)", [page 382 col 1 paragraph 1 lines 5-6], where the graph is a graph of tasks); control data movement into and out of the DPEs in the accelerator to perform the task (shown in FIG. 3(B) and FIG. 3(C), where the ABC allocates ABBs, [page 381]; the actual algorithm is discussed in the 3.2.1 ABC design section, "The ABC uses a two-tiered allocation policy to decide which ABBs to compose into a given LCA. First, the ABC will attempt to balance the concentration of memory-accessing ABBs across the entire system. The purpose of this is to limit contention in the DMA associated with each node. Second, the ABC will employ a simple greedy approach to select ABBs that are local to other ABBs they communicate with. This is done in order to minimize the cost of communication between ABBs.", [page 381 col 1 paragraph 5 lines 7-15]); and inform the CPU when the task is complete (shown in FIG. 3(D), "The ABC signals completion to the core.", [page 381]);
a memory controller (see FIG. 2, tiles labeled "M", [page 380]), and an interface communicatively coupling the CPU to the accelerator, the controller, and the memory controller (Applicant uses the word "interface" to mean "bus" or "network", which is slightly different than the interfaces to that network; the hardware infrastructure is a NoC or network on chip, which refers to the components of the chip in FIG. 2 being connected by a network, see [page 380 col 2 paragraph 4 line 12]; in FIG. 2, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380]; the network/NoC is further mentioned in passing in Table 2 and Table 5, [page 382]; and last but not least, FIG. 3 describes how data is passed between the components, and this information, with the knowledge that the hardware is a NoC, shows how everything is "communicatively coupled"); and at least one memory coupled to the memory controller in the IC (see FIG. 2, tiles labeled "B", [page 380]).

With respect to claim 19, 2012-Cong teaches all of the limitations of claim 18, as noted above. 2012-Cong further teaches wherein the controller communicates with the DPEs through a NoC in the accelerator (the details of the NoC in the accelerator are shown in FIG. 4, which specifically refers to all of the lines showing the interconnections between the component parts inside the ABB, [page 382]).

With respect to claim 20, 2012-Cong teaches all of the limitations of claim 19, as noted above. 2012-Cong further teaches wherein the controller communicates with the CPU only through the interface (NoC here refers to the hardware infrastructure of the NoC or network on chip, which refers to the components of the chip in FIG. 2 being connected by a network, see [page 380 col 2 paragraph 4 line 12]; in FIG. 2, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380]; the network/NoC is further mentioned in passing in Table 2 and Table 5, [page 382]; FIG. 4 shows the accelerator with two NoC interfaces, where one is connected to the core, [page 382]; and last but not least indirectly through FIG. 3, which describes how data is passed between the components, [page 381], and this information, with the knowledge that the hardware is a NoC, shows how everything is "communicatively coupled"), wherein the interface is a second NoC, wherein the second NoC is larger than the NoC in the accelerator (the details of the NoC in the accelerator are shown in FIG. 4, which specifically refers to all of the lines showing the interconnections between the component parts inside the ABB, [page 382]; in FIG. 2 and FIG. 4, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380], [page 382]; the NoC in the accelerator is a subcomponent of the ABB island component, which makes it much smaller than the NoC of the entire chip).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-4 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over "CHARM: A Composable Heterogeneous Accelerator-Rich Microprocessor" (2012-Cong) in view of "Supporting Address Translation for Accelerator-Centric Architectures" (2017-Cong).

With respect to claim 2, 2012-Cong teaches all of the limitations of claim 1, as noted above. 2012-Cong further teaches wherein the accelerator further comprises: a network on chip (NoC) (the details of the NoC in the accelerator are shown in FIG. 4, which specifically refers to all of the lines showing the interconnections between the component parts inside the ABB, [page 382]). 2012-Cong does not teach and an Input-Output Memory Management Unit (IOMMU) comprising circuitry configured to perform a physical to virtual address translation, wherein the IOMMU is coupled to the array of DPEs via the NoC. However, 2017-Cong teaches and an Input-Output Memory Management Unit (IOMMU) comprising circuitry configured to (see IOMMU in FIG. 3, [page 39], but originally described in the context of FIG. 1, [page 37]) perform a physical to virtual address translation ("Commercial CPUs and SoCs have introduced I/O memory management units (IOMMUs)... to allow loosely coupled devices to handle virtual addresses, as shown in Figure 1", [page 37 col 2 paragraph 3 line 4]-[page 38 col 1 paragraph 1 line 2]), wherein the IOMMU is coupled to the array of DPEs via the NoC (in FIG. 3, [page 39], the array of DPEs is depicted as two "Accels"; the NoC is the interface to the DMA, "A memory interface such as a DMA (direct memory access) is often used to transfer data between the SPM and the memory system", [page 39 col 1 paragraph 2 lines 8-10]; note "the NoC" references the NoC interface previously taught by FIG. 2 in 2012-Cong, [page 380], which is the same architecture but with the IOMMU added). It would have been obvious to one skilled in the art before the effective filing date to combine 2012-Cong with 2017-Cong because a teaching, suggestion, or motivation in the prior art would have led one skilled in the art to combine the prior art teachings to arrive at the claimed invention. 2012-Cong discloses a system and method that teaches all of the claimed features except for the IOMMU. 2017-Cong teaches why you need the IOMMU: A unified virtual address space between the host CPU cores and customized accelerators can largely improve the programmability, which necessitates hardware support for address translation. (2017-Cong, [Abstract]). 2017-Cong then goes on to teach specialized mechanisms to make an IOMMU with an IOTLB inside it faster. A person having skill in the art would have a reasonable expectation of providing a unified virtual address space in the system and method of 2012-Cong by modifying 2012-Cong with the IOMMU hardware of 2017-Cong. Therefore, it would have been obvious to combine 2012-Cong with 2017-Cong to a person having ordinary skill in the art, and this claim is rejected under 35 U.S.C. 103.
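For clarity of the record, the translation support relied upon from 2017-Cong (an IOMMU whose IOTLB caches recent translations and whose walker consults the page table on a miss, FIG. 3, [page 39]) follows the conventional pattern sketched below. This is the examiner's generic illustration under assumed names and an assumed 4 KiB page size; it is not code from the reference.

```python
# Generic sketch of virtual-to-physical translation by an IOMMU, as relied
# upon from 2017-Cong. Names and the 4 KiB page size are assumptions.

PAGE_SHIFT = 12  # 4 KiB pages assumed

class IOMMU:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.iotlb = {}               # small cache of recent translations

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.iotlb:                    # IOTLB miss:
            self.iotlb[vpn] = self.page_table[vpn]   # walk the page table, cache result
        return (self.iotlb[vpn] << PAGE_SHIFT) | offset

# Example: an accelerator-issued virtual address on virtual page 1, which
# the page table maps to physical page 5.
mmu = IOMMU({1: 5})
assert mmu.translate(0x1234) == (5 << PAGE_SHIFT) | 0x234
```

On this pattern, the accelerator issues virtual addresses over the interconnect and the IOMMU supplies the physical addresses used on the memory interface, which is the unified-virtual-address-space behavior 2017-Cong advocates.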
With respect to claim 3, 2012-Cong in view of 2017-Cong teaches all of the limitations of claim 2, as noted above. 2012-Cong does not teach wherein the IOMMU is configured to translate virtual addresses used by the accelerator to physical addresses used to store data before transmitting the data from the accelerator to the interface. However, 2017-Cong teaches wherein the IOMMU is configured to translate virtual addresses used by the accelerator to physical addresses used to store data before transmitting the data from the accelerator to the interface (see IOMMU in FIG. 3, [page 39], but originally described in the context of FIG. 1, [page 37]; "A key requirement of virtually addressed accelerators is the hardware support for virtual-to-physical address translation. Commercial CPUs and SoCs have introduced I/O memory management units (IOMMUs)... to allow loosely coupled devices to handle virtual addresses, as shown in Figure 1", [page 37 col 2 paragraph 3 line 2]-[page 38 col 1 paragraph 1 line 2]). It would have been obvious to one skilled in the art before the effective filing date to combine 2012-Cong with 2017-Cong because a teaching, suggestion, or motivation in the prior art would have led one skilled in the art to combine the prior art teachings to arrive at the claimed invention. 2012-Cong discloses a system and method that teaches all of the claimed features except for the IOMMU. 2017-Cong teaches why you need the IOMMU: A unified virtual address space between the host CPU cores and customized accelerators can largely improve the programmability, which necessitates hardware support for address translation. (2017-Cong, [Abstract]). 2017-Cong then goes on to teach specialized mechanisms to make an IOMMU with an IOTLB inside it faster. A person having skill in the art would have a reasonable expectation of providing a unified virtual address space in the system and method of 2012-Cong by modifying 2012-Cong with the IOMMU hardware of 2017-Cong. Therefore, it would have been obvious to combine 2012-Cong with 2017-Cong to a person having ordinary skill in the art, and this claim is rejected under 35 U.S.C. 103.

With respect to claim 4, 2012-Cong in view of 2017-Cong teaches all of the limitations of claim 2, as noted above. 2012-Cong further teaches wherein the controller communicates with the array of DPEs through the NoC (in FIG. 2, this is the tile labeled "ABC", [page 380], which stands for accelerator block composer, see [Abstract] lines 4-5; FIG. 3 shows what the ABC is configured to perform, [page 381]; in FIG. 2, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380]; the network/NoC is further mentioned in passing in Table 2 and Table 5, [page 382]; and last but not least indirectly through FIG. 3, which describes how data is passed between the components, [page 381], and this information, with the knowledge that the hardware is a NoC, shows how everything is "communicatively coupled").

With respect to claim 13, 2012-Cong teaches all of the limitations of claim 11, as noted above. 2012-Cong further teaches transmitting data generated by the DPEs when performing the hardware acceleration task to a NoC in the accelerator
(in FIG. 4, see the arrow going from the last ASM back to the SPM bank, and then being transferred from the SPM bank to the DMA-C, which is connected to the NoC interface, [page 382]; use this knowledge with the knowledge that "The dedicated DMA engine in each ABB island is responsible for transferring data between the SPM and the L2 cache, and also between SPMs in different ABB islands", [page 381 col 1 paragraph 3 lines 1-3]; the actual latencies of transferring data via the "network topology" are shown in the last row of Table 2, [page 382]). 2012-Cong does not teach performing, at an IOMMU in the accelerator, an address translation on the data received from the NoC; and transmitting the address translated data to the CPU. However, 2017-Cong teaches performing, at an IOMMU in the accelerator, an address translation on the data received from the NoC (see IOMMU in FIG. 3, [page 39], but originally described in the context of FIG. 1, [page 37]; "A key requirement of virtually addressed accelerators is the hardware support for virtual-to-physical address translation. Commercial CPUs and SoCs have introduced I/O memory management units (IOMMUs)... to allow loosely coupled devices to handle virtual addresses, as shown in Figure 1", [page 37 col 2 paragraph 3 line 2]-[page 38 col 1 paragraph 1 line 2]); and transmitting the address translated data to the CPU (unified virtual address space between the host CPU and the accelerator, [page 37 col 2 paragraph 2 lines 4-5]; the "data" that is transmitted is a pointer to the result: "Consequently, an offload process simply requires passing the virtual pointer to the shared data to/from the accelerator", [page 37 col 2 paragraph 2 lines 8-9]). It would have been obvious to one skilled in the art before the effective filing date to combine 2012-Cong with 2017-Cong because a teaching, suggestion, or motivation in the prior art would have led one skilled in the art to combine the prior art teachings to arrive at the claimed invention. 2012-Cong discloses a system and method that teaches all of the claimed features except for the IOMMU. 2017-Cong teaches why you need the IOMMU: A unified virtual address space between the host CPU cores and customized accelerators can largely improve the programmability, which necessitates hardware support for address translation. (2017-Cong, [Abstract]). 2017-Cong then goes on to teach specialized mechanisms to make an IOMMU with an IOTLB inside it faster. A person having skill in the art would have a reasonable expectation of providing a unified virtual address space in the system and method of 2012-Cong by modifying 2012-Cong with the IOMMU hardware of 2017-Cong. Therefore, it would have been obvious to combine 2012-Cong with 2017-Cong to a person having ordinary skill in the art, and this claim is rejected under 35 U.S.C. 103.

With respect to claim 14, 2012-Cong in view of 2017-Cong teaches all of the limitations of claim 13, as noted above. 2012-Cong does not teach wherein performing the address translation comprises: translating virtual addresses used by the accelerator to physical addresses used to store the address translated data. However, 2017-Cong teaches this limitation (see IOMMU in FIG. 3, [page 39], but originally described in the context of FIG. 1, [page 37]; "A key requirement of virtually addressed accelerators is the hardware support for virtual-to-physical address translation. Commercial CPUs and SoCs have introduced I/O memory management units (IOMMUs)...
to allow loosely coupled devices to handle virtual addresses, as shown in Figure 1. These IOMMUs have I/O translation lookaside buffers (IOTLBs) and logic to walk the page table, which can provide address translation support for customized accelerators", [page 37 col 2 paragraph 3 line 2]-[page 38 col 1 paragraph 1 line 5]; see also the two-level TLB in FIG. 10, [page 43], discussed in section IV.C, A Shared Level-Two TLB, [page 42 col 2 paragraph 4]-[page 43 col 1 paragraph 1]).

With respect to claim 15, 2012-Cong in view of 2017-Cong teaches all of the limitations of claim 14, as noted above. 2012-Cong does not teach wherein the virtual addresses are memory mapped virtual addresses, wherein the memory mapped virtual addresses are used to transmit the data from the DPEs, through the NoC, and to the IOMMU. However, 2017-Cong teaches wherein the virtual addresses are memory mapped virtual addresses, wherein the memory mapped virtual addresses are used to transmit the data from the DPEs (see FIG. 3, each labeled Accel, [page 39]), through the NoC (see FIG. 3, labeled Interconnect, [page 39]), and to the IOMMU (see FIG. 3, labeled IOMMU, [page 39]; where the unified virtual address space between the host CPU and the accelerator means the virtual addresses are "memory mapped", [page 37 col 2 paragraph 2 lines 4-5]; the "data" that is transmitted is a pointer to the address that is located in a specific device: "Consequently, an offload process simply requires passing the virtual pointer to the shared data to/from the accelerator", [page 37 col 2 paragraph 2 lines 8-9]). It would have been obvious to one skilled in the art before the effective filing date to combine 2012-Cong with 2017-Cong because a teaching, suggestion, or motivation in the prior art would have led one skilled in the art to combine the prior art teachings to arrive at the claimed invention. 2012-Cong discloses a system and method that teaches all of the claimed features except for the IOMMU. 2017-Cong teaches why you need the IOMMU: A unified virtual address space between the host CPU cores and customized accelerators can largely improve the programmability, which necessitates hardware support for address translation. (2017-Cong, [Abstract]). 2017-Cong then goes on to teach specialized mechanisms to make an IOMMU with an IOTLB inside it faster. A person having skill in the art would have a reasonable expectation of providing a unified virtual address space in the system and method of 2012-Cong by modifying 2012-Cong with the IOMMU hardware of 2017-Cong. Therefore, it would have been obvious to combine 2012-Cong with 2017-Cong to a person having ordinary skill in the art, and this claim is rejected under 35 U.S.C. 103.

With respect to claim 16, 2012-Cong in view of 2017-Cong teaches all of the limitations of claim 13, as noted above. 2012-Cong further teaches wherein the controller communicates with the CPU only through a second NoC (NoC here refers to the hardware infrastructure of the NoC or network on chip, which refers to the components of the chip in FIG. 2 being connected by a network, see [page 380 col 2 paragraph 4 line 12]; in FIG. 2, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380]; the network/NoC is further mentioned in passing in Table 2 and Table 5, [page 382];
FIG. 4 shows the accelerator with two NoC interfaces, where one is connected to the core, [page 382]; and last but not least indirectly through FIG. 3, which describes how data is passed between the components, [page 381], and this information, with the knowledge that the hardware is a NoC, shows how everything is "communicatively coupled"), wherein the second NoC is larger than the NoC in the accelerator (in FIG. 2 and FIG. 4, you can see the term "NoC interface" on the expanded view of the ABB island, which is in reference to this chip having an interface to the rest of the "NoC" or network on chip, [page 380], [page 382]; the NoC in the accelerator is a subcomponent of the ABB island component, which makes it much smaller than the NoC of the entire chip), wherein the second NoC also communicatively couples the CPU to the accelerator (and last but not least indirectly through FIG. 3, which describes how data is passed between the components, [page 381], and this information, with the knowledge that the hardware is a NoC, shows how everything is "communicatively coupled").

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

US 20190347125 A1 (SANKARAN) - Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements, [Abstract].

US 20090216958 A1 (Biles) - A data processing system in the form of an integrated circuit 2 includes a general purpose programmable processor 4 and a hardware accelerator 6. A shared memory management unit 10 provides memory management operations on behalf of both of the processor core 4 and the hardware accelerator 6. The processor 4 and the hard