DETAILED ACTION
This Office Action is sent in response to Applicant’s Communication received on 12/17/2025 for application number 17/469,644. The Office hereby acknowledges receipt of the following, which has been placed of record in the file: Applicant’s Remarks and Amendments to the claims and specification.
The Examiner notes the following: claims 4-5, 9-10, 14, and 17 have been amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Sanders et al. (NPL: "Two-tree algorithms for full bandwidth broadcast, reduction and scan"), hereinafter Sanders2, in view of Matthews et al. (US 11,425,195 B1), hereinafter Matthews.
Regarding claim 9, Sanders2 discloses:
A method for performing an in-network prefix scan computation, comprising:
a dual binary tree topology to compute prefix scan aggregation operations for an array of input values as data packets traverse the network and outputting an array of prefix scan output values [Figure 1 discloses a dual binary tree topology; see Section 3, "Two pipelined binary trees instead of one"; "The algorithms concurrently communicate over two binary trees which both span the entire network," Abstract; "A scan computes the prefix sums…" Section 4.3, Scan, discloses the prefix scan operation for an array of input values and the outputting of an array of prefix scan output values];
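For illustration only (this sketch is not part of the cited reference, and the function name is hypothetical), the inclusive prefix scan (prefix sums) operation described in Sanders2, Section 4.3 computes, for each input element, the accumulation of all elements up to and including it:

```python
# Illustrative sketch of an inclusive prefix scan (prefix sums):
# given inputs x_0..x_{n-1}, output element i is x_0 op x_1 op ... op x_i.
# Sanders2 performs this operation collectively over two pipelined
# binary trees; this sequential version only shows the result computed.
def prefix_scan(values, op=lambda a, b: a + b):
    results = []
    acc = None
    for v in values:
        acc = v if acc is None else op(acc, v)
        results.append(acc)
    return results

print(prefix_scan([1, 2, 3, 4]))  # prints [1, 3, 6, 10]
```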
However, Sanders2 does not explicitly disclose embedding a dual binary tree topology in a network to compute prefix scan aggregation operations for an array of input values within the network as data packets traverse the network, the prefix scan aggregation operations being performed by collective engines in the network; and outputting an array of prefix scan output values.
In the analogous art of parallel in-network computing, Matthews teaches embedding a topology in a physical network comprising a plurality of switches ["FIG. 6 illustrates but one example arrangement of compute planes... Any other suitable topology may be utilized for inter-node communication mechanisms 695, including more complex hierarchical topologies. Moreover, the topology within each compute plane 655 may vary-for instance, a ring topology might be used in one compute plane, while a full mesh topology might be used in another." Col. 19, Lines 50-62, teaching a network topology where 695 is the connection between network compute nodes (compute-enabled switches)]; and further performing prefix scan calculations using collective engines ["A compute entity, orchestrator node, or other network entity may send compute instructions to a compute-enabled switch to specify reduction operations or other collective operations to perform on various vector data sets, chunk data sets, transactions, collections of containers, or other data. Specified collective operations may include, without limitation, aggregation, summation, product, maximum, minimum, broadcast, scatter, gather, scan, reduce-and-scan, barrier, and combinations thereof." Col. 32, Lines 25-34].
It would have been obvious to one of ordinary skill in the art, having the teachings of Sanders2 and Matthews before the effective filing date of the claimed invention, to implement the topology disclosed by Sanders2 in the network of switches taught by Matthews, to allow operations to scale efficiently across a network of switches [Matthews, Col. 5, Lines 3-21].
Regarding claim 10, Sanders2 discloses the invention substantially as claimed. See the discussion of claim 9 above. However, Sanders2 does not explicitly disclose wherein the network comprises a plurality of switches, further comprising performing prefix scan calculations using compute engines in the plurality of switches.
In the analogous art of parallel in-network computing, Matthews teaches embedding a topology in a physical network comprising a plurality of switches ["FIG. 6 illustrates but one example arrangement of compute planes... Any other suitable topology may be utilized for inter-node communication mechanisms 695, including more complex hierarchical topologies. Moreover, the topology within each compute plane 655 may vary-for instance, a ring topology might be used in one compute plane, while a full mesh topology might be used in another." Col. 19, Lines 50-62, teaching a network topology where 695 is the connection between network compute nodes (compute-enabled switches)]; and further comprising performing prefix scan calculations using compute engines in the plurality of switches ["A compute entity, orchestrator node, or other network entity may send compute instructions to a compute-enabled switch to specify reduction operations or other collective operations to perform on various vector data sets, chunk data sets, transactions, collections of containers, or other data. Specified collective operations may include, without limitation, aggregation, summation, product, maximum, minimum, broadcast, scatter, gather, scan, reduce-and-scan, barrier, and combinations thereof." Col. 32, Lines 25-34].
It would have been obvious to one of ordinary skill in the art, having the teachings of Sanders2 and Matthews before the effective filing date of the claimed invention, to incorporate the physical network of switches taught by Matthews into the method disclosed by Sanders2, to allow operations to scale efficiently across a network of switches [Matthews, Col. 5, Lines 3-21].
Regarding claim 11, Sanders2 discloses the invention substantially as claimed. See the discussion of claim 9 above.
Sanders2 discloses wherein an entirety of operations for computing the prefix scan are performed within the network ["We present a new, simple algorithmic idea for the collective communication operations… scan (prefix sums). The algorithms concurrently communicate over two binary trees which both span the entire network," Abstract].
Matthews further teaches wherein an entirety of operations for computing the prefix scan are performed within the network ["Each network compute node is assigned to perform the collective operation(s), based on the local vectors, for a different a subset of the vector elements. Each network compute node returns a result chunk for the elements it processed back to each of the compute nodes, whereby each compute node receives the full result vector." Col. 5, Lines 16-21, where a compute node receives the full result vector from the network comprising of network compute nodes (Routers/compute-enabled switches)].
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Zaib (NPL: “Network on Chip Interface for Scalable Distributed Shared Memory Architectures”), in view of Matthews, and further in view of Sanders2.
Regarding claim 14, Zaib discloses:
A system [Figure 1.4, Page 7 "Network on Chip based DSM architecture"; Where Zaib discloses a tile/chiplet based Many-Core and Many-Router System, Page 43] comprising:
a network comprising a plurality of interconnected switches [NoC routers; “Network on Chip is deployed as a distributed interconnect which connects different nodes. The network interface is a component which joins the System on Chip blocks within a tile to the NoC router.” Figure 1.4, "Routers", Section 1.1.3, Page 6];
a plurality of cores, coupled to the network [Compute Tiles/Chiplets, Figure 1.4, Consists of Cores 1 to Core N and is connected to a router]; and
memory [Tile Local Memory, Figure 1.4], operatively coupled to the plurality of cores ["The on-chip memory is physically distributed among different nodes in the architecture as tile local memory." Section 1.1.3, Page 6].
Zaib further discloses a topology comprising a plurality of nodes ["However, researchers have proposed irregular Network on Chip topologies which bring benefit for applications-specific architectures [105]." Page 18].
However, Zaib does not explicitly disclose: switches having collective engines, wherein the system is configured to: insert, via a portion of the plurality of cores, an array of input values for which a prefix scan is to be performed; perform the prefix scan for the array of input values using collective engines within the network to generate a prefix scan result; and output values in the prefix scan result to a portion of the plurality of cores; wherein a dual binary tree topology including a plurality of nodes comprising collective engines is embedded in the network to compute prefix scan operations at the plurality of nodes using the collective engines.
In the analogous art of parallel in-network computing, Matthews teaches wherein the system is configured to insert, via a portion of the plurality of cores, an array of input values for which a prefix scan is to be performed ["the network compute process 670a in plane 655a may generate an intermediate result chunk Aa, also referred to as a plane chunk Aa, from the vector data it receives from the compute nodes 610 in plane 655a" Col. 18, Lines 47-50; a plane contains compute nodes (compute tiles/chiplets) that produce vector data to be inserted into the network compute process, which is the process within a compute-enabled switch that generates a result],
switches having collective engines and perform the prefix scan for the array of input values using collective engines within the network to generate a prefix scan result ["A compute entity, orchestrator node, or other network entity may send compute instructions to a compute-enabled switch to specify reduction operations or other collective operations to perform on various vector data sets, chunk data sets, transactions, collections of containers, or other data. Specified collective operations may include, without limitation, aggregation, summation, product, maximum, minimum, broadcast, scatter, gather, scan, reduce-and-scan, barrier, and combinations thereof." Col. 32, Lines 25-34];
wherein an entirety of operations for computing the prefix scan are performed within the network ["Each network compute node is assigned to perform the collective operation(s), based on the local vectors, for a different a subset of the vector elements. Each network compute node returns a result chunk for the elements it processed back to each of the compute nodes, whereby each compute node receives the full result vector." Col. 5, Lines 16-21, where a compute node receives the full result vector from the network comprising of network compute nodes (Routers/compute-enabled switches)]; and
output values in the prefix scan result to a portion of the plurality of cores ["The network compute process 670a in compute plane 655b might generate a plane chunk Ab, and the network compute process 670a in compute plane 655c might generate a plane chunk Ac. The network compute processes 670a may then utilize inter-plane communication mechanism 695a to share plane chunks Aa, Ab, and Ac, so as to enable calculation of the result chunk A that is to be returned to each compute node 610 in their respective planes 655." Col. 18, Lines 50-58, where the switches (network compute nodes) compute the results and return them to the compute nodes (compute tiles/chiplets)].
Matthews further teaches wherein a topology comprising a plurality of nodes is embedded in the network ["FIG. 6 illustrates but one example arrangement of compute planes. Other systems may include additional or fewer elements in varying arrangements. For instance, there may be additional compute planes, or additional compute nodes and/or network compute nodes per compute plane. Any other suitable topology may be utilized for inter-node communication mechanisms 695, including more complex hierarchical topologies. Moreover, the topology within each compute plane 655 may vary-for instance, a ring topology might be used in one compute plane, while a full mesh topology might be used in another." Col. 19, Lines 50-62]
to compute prefix scan operations at the plurality of nodes ["A first network node 650 may pass its plane chunk on to the corresponding network node 650 in the next plane 655. That network compute node 650 may reduce the plane chunk it receives with its own plane chunk. That network compute node 650 may then pass this intermediate result on to the next plane 655, which reduces it with its plane chunk, and so forth. A final result chunk will eventually be generated by the last network compute node 650 in the ring to process the vector data, and the final result chunk may then be propagated back through the ring." Col 19, Lines 37-47, Shows each plane calculating, sharing and working on a collective operation].
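For illustration only, the ring-style accumulation quoted from Matthews (Col. 19, Lines 37-47) can be sketched as follows; the function and variable names are hypothetical and do not appear in the reference:

```python
# Hypothetical sketch of the ring reduction described in Matthews:
# each plane reduces the intermediate chunk it receives element-wise
# with its own plane chunk, and the last plane in the ring holds the
# final result chunk, which would then be propagated back through
# the ring to the other planes.
def ring_reduce(plane_chunks):
    intermediate = list(plane_chunks[0])
    for chunk in plane_chunks[1:]:
        # each subsequent plane combines the received intermediate
        # result with its own plane chunk
        intermediate = [a + b for a, b in zip(intermediate, chunk)]
    return intermediate

print(ring_reduce([[1, 2], [3, 4], [5, 6]]))  # prints [9, 12]
```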
It would have been obvious to one of ordinary skill in the art, having the teachings of Zaib and Matthews before the effective filing date of the claimed invention, to incorporate the compute-enabled switches taught by Matthews into the system disclosed by Zaib, to allow operations to scale efficiently across a network of switches [Matthews, Col. 5, Lines 3-21].
However, Zaib and Matthews do not explicitly disclose a dual binary tree topology.
In the analogous art of parallel network-based architectures, Sanders2 teaches a dual binary tree topology [Fig. 1 shows a two-binary-tree system; see the discussion of claim 9 above].
It would have been obvious to one of ordinary skill in the art, having the teachings of Zaib, Matthews, and Sanders2 before the effective filing date of the claimed invention, to incorporate the network architecture taught by Sanders2 into the system disclosed by Zaib, to allow for improved bandwidth and runtimes [Sanders2, Abstract, Sections 1, 3, 4.3, 6.3-6.4, and 7].
Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Zaib, Matthews, and Sanders2, and further in view of Zaman et al. (US 11,488,935 B1), hereinafter Zaman.
Regarding claim 15, Zaib, Matthews and Sanders2 disclose the invention substantially as claimed. See the discussion of claim 14 above.
Zaib further discloses:
a plurality of dies or sockets ["the single FPGA prototype has limited capabilities with respect to the size of the architecture. Therefore, a scalable prototyping approach is required to enable prototyping and evaluation of many-core architectures [6], [41], [42]. A professional multi-FPGA prototyping solution manufactured by the company Synopsys, the CHIPit Platinum Edition, is used to prototype relatively larger tiled architecture [130]." Page 98, where a single FPGA contains core chiplets and switches], including,
a plurality of core tiles, a core tile including multiple cores; and a plurality of switches, wherein a core is interconnected with at least one switch ["The tiled architecture prototype is shown in the figure 5.1. The architecture consists of an incarnation of the Network on Chip. Each of the NoC routers connects to one of the tiles, developed for the InvasIC architecture. In our case, 4 tiles can be realized on the single FPGA prototype. Where each compute tile contains two LEON3 processing cores." Page 96, teaching an FPGA with a plurality of chiplets connected to NoC routers (switches)], and
wherein at least one switch in a die or socket is interconnected with at least one switch in another die or socket [Figure 5.2, Shows the interconnect of routers between FPGAs].
However, Zaib, Matthews, and Sanders2 do not explicitly disclose a plurality of switch tiles, a switch tile including multiple switches.
In the analogous art of scalable modular architectures, Zaman teaches a plurality of switch tiles, a switch tile including multiple switches [network-on-package NoPK 200; “The NoPK 200 may include an internal routing network that includes a plurality of internal routers,” Col. 6, Lines 4-6; “FIG. 2A illustrates a simplified block diagram of a NoPK 200, according to some embodiments. The NoPK 200 may be a digital architecture that can be implemented to fit any combination of chiplets. This digital architecture may be implemented on a dedicated die and may be considered its own chiplet or package.” Col. 4, Lines 62-67, where the NoPK is a switch chiplet containing multiple internal routers/switches; as such, the NoPK can be scaled based on the many-core/many-router system requirements].
It would have been obvious to one of ordinary skill in the art, having the teachings of Zaib, Matthews, Sanders2, and Zaman before the effective filing date of the claimed invention, to incorporate the switch chiplet architecture taught by Zaman into the system disclosed by Zaib, to allow for improved yield and scalability of complex processing systems [Zaman, Col. 1, Lines 16-36].
Regarding claim 16, Zaib, Matthews, Sanders2 and Zaman disclose the invention substantially as claimed. See the discussion of claim 15 above.
Matthews further teaches wherein the plurality of dies or sockets are implemented in a node or subnode, and wherein the system comprises a plurality of nodes or subnodes ["The compute nodes and network compute nodes may form a compute plane, as in system 600. There may a plurality of other compute planes that separately perform flow 800 with respect to their own compute nodes and network compute nodes. In each iteration, the result chunk of block 840 may be treated as an intermediate result chunk, or plane chunk. Flow 800 may be expanded to include steps for sharing plane chunks between the network compute nodes of each plane prior to computing a final result chunk at each network compute node." Col. 23, Lines 14-25, showing a plane consisting of compute nodes (compute tiles/chiplets) and network compute nodes (compute-enabled switches), where each plane acts as a node in its own process and generates partial results to be shared between planes].
Regarding claim 17, Zaib, Matthews, Sanders2 and Zaman disclose the invention substantially as claimed. See the discussion of claim 15 above.
Zaib further discloses: wherein a switch [Figure 2.4, Page 16] comprises:
a plurality of input ports [Input Ports 1-N];
a plurality of output ports [Output Ports 1-N];
Matthews further teaches:
a collective engine ["Network compute processes within a network compute node may be implemented by one or more compute subsystems." Col. 34, Lines 60-62; "Compute subsystem 1100 further comprises a compute engine 1170 configured to perform collective operations.” Col. 36, Lines 48-50],
configured to perform one or more prefix scan calculations on data received at an input port and output a result of a prefix scan calculation to an output port ["The compute controller 1110 binds inputs to the compute engine 1170 for each compute operation that the compute engine 1170 is instructed to perform." Col. 38, Lines 56-58, "Once the vector data and the associated computation instruction have been processed, the compute controller 1170 stores the result in a suitable memory (e.g., in a local staging memory or data buffer 1140) prior to being scheduled for transmission to a network interface." Col. 39, Lines 12-16].
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Zaib, Matthews, and Sanders2, and further in view of Barsness et al. (US 8,171,047 B2), hereinafter Barsness.
Regarding claim 21, Zaib, Matthews and Sanders2 disclose the invention substantially as claimed. See the discussion of claim 14 above.
However, Zaib, Matthews, and Sanders2 do not explicitly disclose wherein outputting values in the prefix scan result to a portion of the plurality of cores comprises switches directly writing prefix scan result output values into memory operatively coupled to the portion of the plurality of cores.
In the analogous art of massively parallel computer systems, Barsness teaches wherein outputting values in the prefix scan result to a portion of the plurality of cores comprises switches directly writing prefix scan result output values into memory operatively coupled to the portion of the plurality of cores ["The Global Combining Network Adapter 432 allows the parallel computer to perform collective operations on the compute nodes of a parallel computer system arranged in a binary tree. The collective operations use the contribution register 453, the ALU 446 and the results register 455. The contribution register 453 and the results register 455 can be used to hold a portion of a larger operand held in the RAM 412...For example, to perform an all reduce OR operation on the compute node shown in FIG. 4, the contents of a contribution buffer 452 in the RAM 412 is compared with inputs from the children nodes on the links 442 and the result is loaded into the results buffer 454." Col. 8, Lines 9-30, teaching that the switch (network adapter) writes the result from its own ALU (compute engine) to the RAM connected to the processors].
It would have been obvious to one of ordinary skill in the art, having the teachings of Zaib, Matthews, Sanders2, and Barsness before the effective filing date of the claimed invention, to incorporate the switch system taught by Barsness into the system disclosed by Zaib, to allow for efficient use of resources by offloading operations to the network [Barsness, Col. 2, Lines 29-36].
Allowable Subject Matter
Claims 1-8 are allowed.
Claims 12-13 and 19-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: for claims 12-13 and 19-20, see the prior Office Action mailed 10/01/2025.
Response to Arguments
Applicant’s arguments, see pages 8-15, filed 12/17/2025, with respect to the rejections under 35 U.S.C. 112 have been fully considered and are persuasive. The rejection under 35 U.S.C. 112 of claims 14-17 and 19-21 in the Office Action mailed 10/01/2025 has been withdrawn.
Applicant’s arguments, see pages 15-17, filed 09/02/2025, with respect to the rejections under 35 U.S.C. 102 have been fully considered and are persuasive. The rejection under 35 U.S.C. 102 of claims 14-17 and 19-21 in the Office Action mailed 10/01/2025 has been withdrawn. However, upon further consideration, a new ground of rejection under 35 U.S.C. 103 is made. See the rejections above.
Applicant's arguments, see pages 17-20, filed 09/02/2025, with respect to the rejections under 35 U.S.C. 103 have been fully considered but they are not persuasive. Applicant asserts that Sanders2 and Matthews do not teach or suggest all of the claim limitations of claim 9 (and claim 14). However, this argument is not directed to the rejection or the combination as made; as set forth in the rejection of claim 10, Matthews is relied upon as teaching the compute/collective engines. See the rejections above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kenny K. Bui whose telephone number is (571)270-0604. The examiner can normally be reached 8:00 am to 3:00 pm on Monday, 8:00 am to 4:00 pm on Tuesday to Friday ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew T Caldwell can be reached at (571)272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KENNY K. BUI/Patent Examiner, Art Unit 2182 (571)270-0604
/ANDREW CALDWELL/Supervisory Patent Examiner, Art Unit 2182