Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 23 December 2025 has been entered.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1-7, 11-17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US 20210320820 A1 to Ruan et al. (“Ruan”).
Regarding claim 1, Ruan taught a system (“data center”; consider generally paragraphs 0262-0264 and Fig. 23 regarding the structure of the system), comprising:
a first board comprising a printed circuit board; and a second board comprising a printed circuit board, the first board further comprising: a first switch on the first board; a second switch on the first board; a memory; and a compute element, the second board further comprising: a first switch on the second board; a second switch on the second board; a memory; and a compute element, (consider paragraph 0156, “DPU 130 includes a plurality of cores 140 coupled to an on-chip memory unit 134. In some examples, memory unit 134 may include a cache memory.”) (consider further paragraph 0082, “In some example implementations, each DPU 17 implements at least four different operational networking components or functions: (1) a source component operable to receive traffic from server 12, (2) a source switching component operable to switch source traffic to other source switching components of different DPUs 17 (possibly of different DPU groups) or to core switches 22, (3) a destination switching component operable to switch inbound traffic received from other source switching components or from cores switches 22 and (4) a destination component operable to reorder packet flows and provide the packet flows to destination servers 12.”)
the first switch of the first board being connected to the first switch of the second board and to the compute element of the first board; the first switch of the second board being connected to the first switch of the first board and to the compute element of the second board; the second switch of the first board being connected to the second switch of the second board and to the compute element of the first board; the second switch of the second board being connected to the second switch of the first board and to the compute element of the second board; (consider paragraph 0067, “In example implementations, DPUs 17 are configurable to operate in a standalone network appliance having one or DPUs. For example, DPUs 17 may be arranged into multiple different DPU groups 19, each including any number of DPUs up to, for example, x DPUs 17.sub.1-17.sub.x. For example, DPUs 17 may be arranged into multiple different DPU groups 19, each including any number of DPUs. In other examples, each DPU may be implemented as a component (e.g., electronic chip) within a device, such as a compute node, storage node, or application server, and may be deployed on a motherboard of the device or within a removable card, such as a storage and/or network interface card. As such, multiple DPUs 17 may be grouped (e.g., within a single electronic device or network appliance), referred to herein as an DPU group 17, for providing services to a group of servers supported by the set of DPUs internal to the device.”) (consider further paragraph 0085, “DPUs 17 may need to connect to a fair number of core switches 22 in order to communicate packet data to any other of DPUs 17 and the servers 12 accessible through those DPUs. In some cases, to provide a link multiplier effect, DPUs 17 may connect to core switches 22 via top of rack (TOR) Ethernet switches, electrical permutation devices, or optical permutation (OP) devices (not shown in FIG. 2). To provide an additional link multiplier effect, source components of the DPUs 17 may be configured to spray packets of individual packet flows of the traffic received from server 12 across a set of the other DPUs 17 included in one or more DPU groups 19. In one example, DPU 17 may achieve an 8× multiplier effect from inter-DPU spraying, and an additional 8× multiplier effect from OP devices to connect to up to sixty-four core switches 22.”)
the memory of the second board being accessible by the compute element of the first board using a load instruction executed by the compute element of the first board or a store instruction executed by the compute element of the first board. (consider paragraph 0068, “In addition, DPUs 17 described herein may provide additional services, such as storage (e.g., integration of solid-state storage devices), security (e.g., encryption), acceleration (e.g., compression), I/O offloading, and the like. In some examples, one or more of DPUs 17 may include storage devices, such as high-speed solid-state drives or rotating hard drives, configured to provide network accessible storage for use by applications executing on the servers.”) (consider further paragraphs 0077-0078, “In some example implementations, each DPU 17 may, therefore, have multiple parallel data paths for reaching any given other DPU 17 and the servers 12 reachable through those DPUs. In some examples, rather than being limited to sending all of the packets of a given flow along a single path in the switch fabric, switch fabric 14 may be configured such that DPUs 17 may, for any given packet flow between servers 12, spray the packets of the packet flow across all or a subset of the M parallel data paths of switch fabric 14 by which a given destination DPU 17 for a destination server 12 can be reached. According to the disclosed techniques, DPUs 17 may spray the packets of individual packet flows across the M paths end-to-end forming a virtual tunnel between a source DPU and a destination DPU.”) (consider further paragraph 0162, “DPU 130 may also include one or more high bandwidth interfaces for connectivity to off-chip external memory 150”) (consider further paragraph 0165, “FIG. 10 is a block diagram illustrating an example networking unit 142 of DPU 130 from FIG. 9, in more detail. Networking unit (NU) 142 exposes Ethernet ports, also referred to herein as fabric ports, to connect DPU 130 to the switch fabric. NU 142 connects to processing cores 140 and external servers and/or storage devices, such as SSD devices, via endpoint ports. NU 142 supports switching packets from one fabric port to another fabric port without storing the complete packet (i.e., transit switching), which helps to achieve low latency for transit traffic. In this way, NU 142 enables creation of a fabric of DPUs with or without external switching elements. NU 142 may fulfill the following roles: (1) transmit packets from PCIe devices (servers and/or SSDs) to the switch fabric, and receive packets from the switch fabric and send them to the PCIe devices; (2) support switching packets from one fabric port to another fabric port; (3) support sending network control packets to an DPU controller; and (4) implement FCP tunneling.”) (consider further paragraph 0184, “As indicated above, the FCP messages include request, grant, and data messages. The request message is generated when source DPU 196 wishes to transfer a certain amount of data to destination DPU 198. The request message carries a destination tunnel ID, queue ID, request block number (RBN) of the queue, and metadata. The request message is sent over high priority channel 204 on the network fabric 200 and the message is sprayed over all available paths. The metadata may be used to indicate a request retry among other things. The grant message is generated when destination DPU 198 responds to a request from source DPU 196 to transfer a certain amount of data. 
The grant message carries the source tunnel ID, queue ID, grant block number (GBN) of the queue, metadata (scale factor, etc.), and timestamp. The grant message is sent over control channel 202 on network fabric 200 and the message is sprayed over all available paths. The control packet structure of request and grant messages is described below with respect to FIG. 18”) (consider further paragraph 0177, specifically “NU 142 includes source agent control block 180 and destination agent control block 182 that, collectively, are responsible for FCP control packets. In other examples, source agent control block 180 and destination control block 182 may comprise a single control block. Source agent control block 180 generates FCP request messages for every tunnel. In response to FCP grant messages received in response to the FCP request messages, source agent block 180 instructs packet buffer 174 to send FCP data packets based on the amount of bandwidth allocated by the FCP grant messages”) (consider further paragraph 0178, specifically “Destination agent control block 182 generates FCP grant messages for every tunnel. In response to received FCP request messages, destination agent block 182 updates a state of the tunnel and sends FCP grant messages allocating bandwidth on the tunnel, as appropriate. In response to FCP data packets received in response to the FCP grant messages, packet buffer 174 sends the received data packets to packet reorder engine 176 for reordering and reassembly before storage in memory 178. Memory 178 may comprise an on-chip memory or an external, off-chip memory. Memory 178 may comprise RAM or DRAM.”)
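For illustration, the request/grant/data exchange cited above may be sketched as follows; the names, structures, and buffer accounting in this sketch are illustrative assumptions only and do not appear as code in Ruan (Python):

    from dataclasses import dataclass, field

    @dataclass
    class FcpRequest:
        tunnel_id: int
        queue_id: int
        rbn: int          # request block number of the queue (para. 0184)
        metadata: dict = field(default_factory=dict)

    @dataclass
    class FcpGrant:
        tunnel_id: int
        queue_id: int
        gbn: int          # grant block number of the queue (para. 0184)
        timestamp: float = 0.0

    class DestinationAgent:
        # Simplified destination agent control block (para. 0178):
        # grants bandwidth against available buffer resources.
        def __init__(self, buffer_blocks: int):
            self.free_blocks = buffer_blocks

        def on_request(self, req: FcpRequest) -> FcpGrant:
            granted = min(req.rbn, self.free_blocks)
            self.free_blocks -= granted
            return FcpGrant(req.tunnel_id, req.queue_id, granted)

    agent = DestinationAgent(buffer_blocks=64)
    grant = agent.on_request(FcpRequest(tunnel_id=1, queue_id=0, rbn=16))
    assert grant.gbn == 16  # grant bounded by requested blocks and buffer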
Regarding claim 2, Ruan taught the system of claim 1, comprising:
a first network plane comprising the first switch of the first board and the first switch of the second board; and a second network plane comprising the second switch of the first board and the second switch of the second board. (consider paragraph 0076, “FIG. 2 is a block diagram illustrating in further detail the logical interconnectivity provided by DPUs 17 and switch fabric 14 within the data center. As shown in this example, DPUs 17 and switch fabric 14 may be configured to provide full mesh interconnectivity such that DPUs 17 may communicate packet data for any of servers 12 to any other of the servers 12 using any of a number of M parallel data paths to any of core switches 22A-22M (collectively “core switches 22”). Moreover, according to the techniques described herein, DPUs 17 and switch fabric 14 may be configured and arranged in a way such that the M parallel data paths in switch fabric 14 provide reduced L2/L3 hops and full mesh interconnections (e.g., bipartite graph) between servers 12, even in massive data centers having tens of thousands of servers.”) (consider further paragraph 0079, “In one example, the flattened topology of switch fabric 14 may result in a core layer that includes only one level of spine switches, e.g., core switches 22, that may not communicate directly with one another but form a single hop along the M parallel data paths. In this example, any DPU 17 sourcing traffic into switch fabric 14 may reach any other DPU 17 by a single, one-hop L3 lookup by one of core switches 22.”) (consider further paragraphs 0077-0078, “In some example implementations, each DPU 17 may, therefore, have multiple parallel data paths for reaching any given other DPU 17 and the servers 12 reachable through those DPUs. In some examples, rather than being limited to sending all of the packets of a given flow along a single path in the switch fabric, switch fabric 14 may be configured such that DPUs 17 may, for any given packet flow between servers 12, spray the packets of the packet flow across all or a subset of the M parallel data paths of switch fabric 14 by which a given destination DPU 17 for a destination server 12 can be reached. According to the disclosed techniques, DPUs 17 may spray the packets of individual packet flows across the M paths end-to-end forming a virtual tunnel between a source DPU and a destination DPU.”)
Regarding claim 3, Ruan taught the system of claim 2, comprising:
a plurality of compute elements including the compute element of the first board and the compute element of the second board; a plurality of memories including the memory of the first board and the memory of the second board, (again, consider paragraph 0156, “DPU 130 includes a plurality of cores 140 coupled to an on-chip memory unit 134. In some examples, memory unit 134 may include a cache memory.”) (again, consider further paragraph 0082, “In some example implementations, each DPU 17 implements at least four different operational networking components or functions: (1) a source component operable to receive traffic from server 12, (2) a source switching component operable to switch source traffic to other source switching components of different DPUs 17 (possibly of different DPU groups) or to core switches 22, (3) a destination switching component operable to switch inbound traffic received from other source switching components or from cores switches 22 and (4) a destination component operable to reorder packet flows and provide the packet flows to destination servers 12.”)
the plurality of memories storing instructions that, when executed by the plurality of compute elements, cause the plurality of compute elements: to route traffic of a first traffic class between the first board and the second board using the first network plane; and to route traffic of a second traffic class between the first board and the second board using the second network plane. (consider paragraph 0056, “In some examples, FCP supports improved end-to-end QoS. The FCP provides improved end-to-end QoS through the grant scheduler at the destination. The destination views the incoming requests from multiple sources grouped based on priority and schedules the grants based on the desired QoS behavior across the priority groups.”) (consider further paragraph 0231, “The FCP queues are split as tunnels and priorities. The FCP grant scheduler groups the queues based on their priority (e.g., up to 8 priorities) for scheduling purposes. The grant scheduler may select one of the priority groups through strict priority or a hierarchical deficit weighted round-robin (DWRR) scheme. On top of each priority group scheduling, a flow aware algorithm may be used to arbitrate among FCP queues that are part of the priority group. Incoming flow weights from FCP queues may be normalized and used by the DWRR grant scheduler for updating credits to the arbitrating FCP queues.”)
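For illustration, the claimed class-to-plane routing, as read onto Ruan's priority-grouped FCP queues, may be sketched as follows; the class and plane names are illustrative assumptions only (Python):

    # Two planes, keyed by traffic class; the instructions stored in the
    # memories select the plane per packet.
    PLANE_BY_CLASS = {
        "first_class": "first_plane",    # first switches of both boards
        "second_class": "second_plane",  # second switches of both boards
    }

    def route(traffic_class: str, payload: bytes) -> tuple[str, bytes]:
        # Select the network plane for a packet based on its traffic class.
        return PLANE_BY_CLASS[traffic_class], payload

    assert route("first_class", b"pkt")[0] == "first_plane"
    assert route("second_class", b"pkt")[0] == "second_plane"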
Regarding claim 4, Ruan taught the system of claim 3, wherein:
the instructions further cause the plurality of compute elements to execute a first application and a second application; the first traffic class comprises traffic generated by the first application; and the second traffic class comprises traffic generated by the second application. (again, consider paragraph 0056, “In some examples, FCP supports improved end-to-end QoS. The FCP provides improved end-to-end QoS through the grant scheduler at the destination. The destination views the incoming requests from multiple sources grouped based on priority and schedules the grants based on the desired QoS behavior across the priority groups.”) (consider further paragraph 0068, “In addition, DPUs 17 described herein may provide additional services, such as storage (e.g., integration of solid-state storage devices), security (e.g., encryption), acceleration (e.g., compression), I/O offloading, and the like. In some examples, one or more of DPUs 17 may include storage devices, such as high-speed solid-state drives or rotating hard drives, configured to provide network accessible storage for use by applications executing on the servers.”) (again, consider further paragraph 0231, “The FCP queues are split as tunnels and priorities. The FCP grant scheduler groups the queues based on their priority (e.g., up to 8 priorities) for scheduling purposes. The grant scheduler may select one of the priority groups through strict priority or a hierarchical deficit weighted round-robin (DWRR) scheme. On top of each priority group scheduling, a flow aware algorithm may be used to arbitrate among FCP queues that are part of the priority group. Incoming flow weights from FCP queues may be normalized and used by the DWRR grant scheduler for updating credits to the arbitrating FCP queues.”)
Regarding claim 5, Ruan taught the system of claim 3, wherein:
the first traffic class comprises traffic having a first service requirement; and the second traffic class comprises traffic having a second service requirement, different from the first service requirement. (consider paragraph 0005, “FCP is a data transmission protocol that may provide certain advantages in environments in which a network fabric provides full mesh interconnectivity between at least a set of servers such that any of the plurality of servers may communicate packet data for a given packet flow to any other of the plurality of servers using any of a number of parallel data paths within the network fabric. Example implementations of the FCP establish an FCP tunnel between a source data processing unit (DPU) and a destination DPU, where the source DPU sprays individual packets for a given packet flow across some or all of the multiple parallel data paths in the network fabric while tunneling the packets to the destination DPU. In some examples, the FCP may provide end-to-end admission control mechanisms in which a sender node explicitly requests a receiver node with the intention to transfer a certain number of bytes of payload data, and in response, the receiver node issues a grant based on its buffer resources, quality of service (QoS), and/or a measure of fabric congestion.”) (again, consider paragraph 0056, “In some examples, FCP supports improved end-to-end QoS. The FCP provides improved end-to-end QoS through the grant scheduler at the destination. The destination views the incoming requests from multiple sources grouped based on priority and schedules the grants based on the desired QoS behavior across the priority groups.”) (again, consider further paragraph 0231, “The FCP queues are split as tunnels and priorities. The FCP grant scheduler groups the queues based on their priority (e.g., up to 8 priorities) for scheduling purposes. The grant scheduler may select one of the priority groups through strict priority or a hierarchical deficit weighted round-robin (DWRR) scheme. On top of each priority group scheduling, a flow aware algorithm may be used to arbitrate among FCP queues that are part of the priority group. Incoming flow weights from FCP queues may be normalized and used by the DWRR grant scheduler for updating credits to the arbitrating FCP queues.”)
Regarding claim 6, Ruan taught the system of claim 5, wherein the first service requirement comprises a requirement for a maximum latency. (consider paragraph 0086, “In the case of routing and switching over ECMP paths, the source DPU may select the same path for two of the large bandwidth flows leading to large latencies over that path. In order to avoid this issue and keep latency low across the network, an administrator may be forced to keep the utilization of the network below 25-30%, for example. The techniques described in this disclosure of configuring DPUs 17 to spray packets of individual packet flows across all available paths enables higher network utilization, e.g., 85-90%, while maintaining bounded or limited latencies.”) (consider paragraph 0213, “By default, the FCP limits the “request window” size up to a maximum request block size (MRBS) based on the maximum queue drain rate and round-trip time (FCP request to FCP grant) from the destination queue. The value of MRBS is software programmed based on the estimated maximum queue drain rate and RTT, also known as BDP or bandwidth delay product.”)
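For illustration, the bandwidth delay product computation that paragraph 0213 describes as bounding the “request window” may be worked as follows; the numeric values are illustrative assumptions only, not figures from Ruan (Python):

    def bdp_bytes(drain_rate_gbps: float, rtt_us: float) -> float:
        # BDP = maximum queue drain rate x round-trip time (para. 0213).
        return (drain_rate_gbps * 1e9 / 8.0) * (rtt_us * 1e-6)

    # e.g., a 100 Gbps drain rate and a 10 microsecond request-to-grant
    # RTT bound the outstanding request window at 125,000 bytes:
    print(bdp_bytes(100.0, 10.0))  # 125000.0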
Regarding claim 7, Ruan taught the system of claim 5, wherein the second service requirement comprises a requirement for a minimum bandwidth. (consider paragraph 0053, “Further, FCP provides flow-aware fair bandwidth distribution. The traffic is governed through a flow-aware admission control scheduler at the destination node. The request/grant mechanism uses a “pull” model (via grants), and it ensures flow-aware fair bandwidth distribution among incast flows.”) (consider further paragraph 0254, “Once the logical tunnel is established, one of the DPUs (referred to as the ‘source DPU’ in FIG. 21) may receive outbound packets associated with the same packet flow, e.g., from an application or storage source server 12 (512). In response, the source DPU sends an FCP request message for an amount of data to be transferred in the packet flow (514). In response to receipt of the FCP request message, another one of the DPUs (referred to as the ‘destination DPU’ in FIG. 21) performs grant scheduling (522) and sends an FCP grant message indicating an amount of bandwidth reserved for the packet flow (524).”)
Claims 11-17 recite a method that contains substantially the same limitations as recited in claims 1-7, respectively, and are also rejected under 35 U.S.C. 102(a)(1) as being anticipated by the same teachings of Ruan.
Claim 20 recites a system that contains substantially the same limitations as recited in claim 1 and is also rejected under 35 U.S.C. 102(a)(1) as being anticipated by the same teachings of Ruan.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 8-9 and 18-19 are rejected under 35 U.S.C. 102(a)(1) as anticipated by or, in the alternative, under 35 U.S.C. 103 as obvious over Ruan.
Regarding claim 8, Ruan taught the system of claim 1, comprising a plurality of compute elements including the compute element of the first board and the compute element of the second board, wherein the plurality of compute elements comprises 1,000 compute elements. (consider paragraph 0004, “In most data centers, clusters of storage systems and application servers are interconnected via a high-speed switch fabric provided by one or more tiers of physical network switches and routers. Data centers vary greatly in size, with some public data centers containing hundreds of thousands of servers”) (consider paragraph 0067, “In example implementations, DPUs 17 are configurable to operate in a standalone network appliance having one or DPUs. For example, DPUs 17 may be arranged into multiple different DPU groups 19, each including any number of DPUs up to, for example, x DPUs 17.sub.1-17.sub.x. For example, DPUs 17 may be arranged into multiple different DPU groups 19, each including any number of DPUs. In other examples, each DPU may be implemented as a component (e.g., electronic chip) within a device, such as a compute node, storage node, or application server, and may be deployed on a motherboard of the device or within a removable card, such as a storage and/or network interface card. As such, multiple DPUs 17 may be grouped (e.g., within a single electronic device or network appliance), referred to herein as an DPU group 17, for providing services to a group of servers supported by the set of DPUs internal to the device.”) (consider further paragraph 0087, “As shown in the example of FIG. 2, in some example implementations, DPUs 17 may be arranged into multiple different DPU groups 19.sub.1-19.sub.Y (ANGs in FIG. 2), each including any number of DPUs 17 up to, for example, x DPUs 17.sub.1-17.sub.x. As such, multiple DPUs 17 may be grouped and arranged (e.g., within a single electronic device or network appliance), referred to herein as an DPU group (ANG) 19, for providing services to a group of servers supported by the set of DPUs internal to the device.”) (consider also paragraph 0260, “The Fabric Control Protocol (FCP) described herein is a transport protocol that delivers data packets reliably, securely, and efficiently between end-points in a data center containing as many as several hundred thousand end-points.”)
However, alternatively, Ruan may be interpreted as not expressly teaching wherein the plurality of compute elements comprises exactly 1,000 compute elements. Under such an interpretation, it is unclear whether Ruan discloses such a specific number with sufficient specificity to anticipate. As noted above, however, Ruan does teach a plurality of compute elements that may encompass such an exact number.
As such, Examiner finds that it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to simply have 1,000 compute elements included in the plurality of compute elements taught in Ruan. Ruan teaches that any number of compute elements may be used (again, consider paragraphs 0004, 0067, and 0260). In view of this teaching, one skilled in the art could have simply substituted the known plurality of compute elements including the compute element of the first board and the compute element of the second board for a more specific quantity of exactly 1,000 such compute elements, and the substitution would have yielded nothing more than predictable results to one of ordinary skill in the art.
Regarding claim 9, Ruan taught or rendered unpatentable the system of claim 8.
Ruan further taught the system comprising:
a plurality of network planes including: a first network plane comprising the first switch of the first board and the first switch of the second board; and a second network plane comprising the second switch of the first board and the second switch of the second board, wherein each of the plurality of compute elements is capable of communicating with each of the other compute elements of the plurality of compute elements through a single-hop network connection in the first network plane. (consider paragraph 0079, “In one example, the flattened topology of switch fabric 14 may result in a core layer that includes only one level of spine switches, e.g., core switches 22, that may not communicate directly with one another but form a single hop along the M parallel data paths. In this example, any DPU 17 sourcing traffic into switch fabric 14 may reach any other DPU 17 by a single, one-hop L3 lookup by one of core switches 22.”)
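For illustration, the one-hop reachability property cited from paragraph 0079 may be checked as follows; the topology below is an illustrative assumption of full bipartite wiring between DPUs and core switches, not Ruan's disclosed configuration (Python):

    dpus = ["dpu_1", "dpu_2", "dpu_3", "dpu_4"]
    cores = ["core_1", "core_2"]
    # Full bipartite wiring: every DPU links to every core switch.
    links = {(d, c) for d in dpus for c in cores}

    def one_hop(src: str, dst: str) -> bool:
        # A single L3 lookup at one core switch joins any pair of DPUs.
        return any((src, c) in links and (dst, c) in links for c in cores)

    assert all(one_hop(a, b) for a in dpus for b in dpus if a != b)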
Claims 18 and 19 recite a method that contains substantially the same limitations as recited in claims 8 and 9, respectively, and are also rejected under 35 U.S.C. 102(a)(1) as anticipated by or, in the alternative, under 35 U.S.C. 103 as unpatentable over the same teachings of Ruan, with the same rationale supporting the conclusion of obviousness in the alternative.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Ruan in view of US 10742513 B2 to Wong.
Regarding claim 10, Ruan taught the system of claim 8.
Ruan may be interpreted as not expressly teaching wherein each of the plurality of compute elements is capable of communicating with each of the other compute elements of the plurality of compute elements through a network connection having a latency of less than 100 nanoseconds. Ruan does, however, teach that each of the plurality of compute elements is capable of communicating with each of the other compute elements of the plurality of compute elements through a network connection (consider paragraph 0048, “As described, FCP enables an adaptive and low latency fabric implementation. The source/destination nodes use adaptive bandwidth control techniques through outgoing request and grant messages that react to long term fabric congestion caused by fabric failures. By adaptively controlling the request and grant rates, the amount of data entering/leaving the fabric is controlled. By operating the destination node throughput slightly below the maximum supported throughput via grant rate limiting, the FCP maintains a congestion free fabric operation and thereby achieves a predictable latency for packets traversing through the fabric.”)
In an analogous art relating to large scale mesh network communication (consider Fig. 2 and column 8, lines 3-65), Wong taught that network communication having a latency of less than 100 nanoseconds is necessary within such environments for data analysis and artificial intelligence applications. (consider column 2, lines 15-22, “Some of these applications needs relate to the increasing use of data analytic tools (“big data”) and artificial intelligence (“AI”), for example. As discussed above, big data and AI have become very significant distributed applications. Servicing these applications require handling large amounts of data (e.g., petabytes), using great computation power (e.g., petaflops), and achieving very low latency (e.g., responses that become available within 100 ns).”)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify the teachings of Ruan to include the taught features of Wong such that the modification includes every element as claimed. Given Ruan’s disclosure of the desirability of low latency network connections between nodes on a network and specific methods and systems to achieve the same, Wong specifically taught that having less than 100 nanoseconds of latency in network connections in an analogous environment is desirable for specific applications (again, consider column 2, lines 15-22). Given this specific advantage in Wong, one skilled in the art would have been motivated to modify the teachings of Ruan with the teachings of Wong such that each of the plurality of compute elements communicating with each of the other compute elements of the plurality of compute elements through a network connection as taught in Ruan is capable of having latency values less than 100 nanoseconds as taught in Wong so that each of the plurality of compute elements is capable of communicating with each of the other compute elements of the plurality of compute elements through a network connection having a latency of less than 100 nanoseconds as claimed. Therefore, such a modification of the teachings of Ruan with the teachings of Wong would have yielded nothing more than predictable results to one of ordinary skill in the art.
Response to Arguments
Applicant's arguments filed in the instant response have been fully considered but they are not persuasive.
Applicant argues that Ruan fails to teach or suggest the amended limitations of “the memory of the second board being accessible by the compute element of the first board using a load instruction executed by the compute element of the first board or a store instruction executed by the compute element of the first board”.
Applicant argues that “the Office action does not appear to cite to any specific disclosure or component of Ruan that allegedly equates to a ‘load instruction executed by the compute element of the first board’ or to a ‘store instruction executed by the compute element’. Instead, the cited portions of Ruan appear to only generally disclose that the DPUs provide storage and that packets may be transmitted between the different DPUs through the switch fabric 14. The cited sections of Ruan do not disclose ‘a load instruction’ or ‘a store instruction’ executed by a compute element as recited in amended claim 1 and similarly recited in amended claims 11 and 20.”
However, Examiner respectfully disagrees.
As previously explained, Ruan taught that (Examiner’s emphasis added):
“In addition, DPUs 17 described herein may provide additional services, such as storage (e.g., integration of solid-state storage devices), security (e.g., encryption), acceleration (e.g., compression), I/O offloading, and the like. In some examples, one or more of DPUs 17 may include storage devices, such as high-speed solid-state drives or rotating hard drives, configured to provide network accessible storage for use by applications executing on the servers.” (paragraph 0068)
“FIG. 10 is a block diagram illustrating an example networking unit 142 of DPU 130 from FIG. 9, in more detail. Networking unit (NU) 142 exposes Ethernet ports, also referred to herein as fabric ports, to connect DPU 130 to the switch fabric. NU 142 connects to processing cores 140 and external servers and/or storage devices, such as SSD devices, via endpoint ports. NU 142 supports switching packets from one fabric port to another fabric port without storing the complete packet (i.e., transit switching), which helps to achieve low latency for transit traffic. In this way, NU 142 enables creation of a fabric of DPUs with or without external switching elements. NU 142 may fulfill the following roles: (1) transmit packets from PCIe devices (servers and/or SSDs) to the switch fabric, and receive packets from the switch fabric and send them to the PCIe devices; (2) support switching packets from one fabric port to another fabric port; (3) support sending network control packets to an DPU controller; and (4) implement FCP tunneling.” (paragraph 0165)
“As indicated above, the FCP messages include request, grant, and data messages. The request message is generated when source DPU 196 wishes to transfer a certain amount of data to destination DPU 198. The request message carries a destination tunnel ID, queue ID, request block number (RBN) of the queue, and metadata. The request message is sent over high priority channel 204 on the network fabric 200 and the message is sprayed over all available paths. The metadata may be used to indicate a request retry among other things. The grant message is generated when destination DPU 198 responds to a request from source DPU 196 to transfer a certain amount of data. The grant message carries the source tunnel ID, queue ID, grant block number (GBN) of the queue, metadata (scale factor, etc.), and timestamp. The grant message is sent over control channel 202 on network fabric 200 and the message is sprayed over all available paths. The control packet structure of request and grant messages is described below with respect to FIG. 18” (paragraph 0184)
Ruan further enhances these teachings by also teaching that (Examiner’s emphasis added):
“DPU 130 may also include one or more high bandwidth interfaces for connectivity to off-chip external memory 150” (paragraph 0162)
“NU 142 includes source agent control block 180 and destination agent control block 182 that, collectively, are responsible for FCP control packets. In other examples, source agent control block 180 and destination control block 182 may comprise a single control block. Source agent control block 180 generates FCP request messages for every tunnel. In response to FCP grant messages received in response to the FCP request messages, source agent block 180 instructs packet buffer 174 to send FCP data packets based on the amount of bandwidth allocated by the FCP grant messages” (paragraph 0177)
“Destination agent control block 182 generates FCP grant messages for every tunnel. In response to received FCP request messages, destination agent block 182 updates a state of the tunnel and sends FCP grant messages allocating bandwidth on the tunnel, as appropriate. In response to FCP data packets received in response to the FCP grant messages, packet buffer 174 sends the received data packets to packet reorder engine 176 for reordering and reassembly before storage in memory 178. Memory 178 may comprise an on-chip memory or an external, off-chip memory. Memory 178 may comprise RAM or DRAM.” (paragraph 0178)
As can be seen, there are at least “load” (i.e., an FCP request message) and “store” (i.e., an FCP data message) “instructions” that are “executed” by the source DPU (i.e., the “compute element of the first board”), wherein the “memory” having at least one “high bandwidth interface”, consistent with the disclosure of the “second board” (i.e., one of the “one or more” DPUs, the “destination DPU”), is “accessible” “by the compute element of the first board” by way of the FCP messages described in Ruan, such that data is transferred to the memory of the destination DPU based upon the reception of said “load” and “store” “instructions”.
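For illustration, the mapping set forth above may be sketched as follows; the class name, method names, and memory model are illustrative assumptions only and are not disclosed as code in Ruan (Python):

    class DestinationDpuMemory:
        # Stand-in for memory 178 on the second board (para. 0178).
        def __init__(self):
            self._cells = {}

        def handle_request(self, addr):
            # The "load": an FCP request/grant exchange pulls data back.
            return self._cells.get(addr, b"")

        def handle_data(self, addr, value):
            # The "store": FCP data packets are reassembled into memory 178.
            self._cells[addr] = value

    remote = DestinationDpuMemory()
    remote.handle_data(0x10, b"payload")              # "store instruction"
    assert remote.handle_request(0x10) == b"payload"  # "load instruction"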
Therefore, Examiner finds that Ruan continues to teach the amended limitations added to the instant claims and Applicant’s arguments to the contrary are unpersuasive.
Conclusion
An updated search revealed additional prior art that is considered pertinent to the claimed invention and/or to the broader disclosure. The cited prior art further enhances the teachings of Ruan, as it is incorporated by reference by Ruan.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to G. C. Neurauter, Jr. whose telephone number is (571)272-3918. The examiner can normally be reached Monday-Friday 9am-5pm Eastern Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tonia Dollinger, can be reached at 571-272-4170. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/G. C. Neurauter, Jr./Primary Examiner, Art Unit 2459