Switching Enzian Network Traffic

How do we connect our Enzian machines so that they can use the entire bandwidth they are provisioned with? Using big switches!

An Enzian has about half a terabit per second of network bandwidth: 400Gbps on the FPGA side, divided among four 100Gbps QSFP+ ports, and 80Gbps on the CPU side, divided among two 40Gbps QSFP ports.

Here is where our three Edge-Core Wedge 100BF-x65 switches come in. They each feature a whopping 12.8Tbps of switching capacity, which is enough to switch each of the 65 100Gbps QSFP+ ports at full-duplex speed.

One of our Enzian switches

Network Topology

We need multiple switches because a single switch doesn’t have enough capacity to connect all fourteen of our Enzians at full bandwidth: 14 Enzians x 480Gbps x 2 (duplex) = 13.44Tbps, which is slightly more than the 12.8Tbps that a single switch can handle. We are also over our port budget of 65 ports, since each Enzian requires six ports, for a grand total of 84 ports.
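The arithmetic above can be reproduced with a short script. This is only a sketch: the numbers are the ones quoted in this post, while the variable names are made up for illustration.

```python
# Figures quoted in this post; the variable names are illustrative only.
FPGA_PORTS, FPGA_PORT_GBPS = 4, 100   # four 100Gbps ports on the FPGA side
CPU_PORTS, CPU_PORT_GBPS = 2, 40      # two 40Gbps ports on the CPU side
NUM_ENZIANS = 14
SWITCH_CAPACITY_GBPS = 12_800         # 12.8Tbps per switch
SWITCH_PORTS = 65

# Bandwidth of one Enzian in one direction.
per_enzian_gbps = FPGA_PORTS * FPGA_PORT_GBPS + CPU_PORTS * CPU_PORT_GBPS

# Full duplex counts both directions, hence the factor of two.
required_gbps = NUM_ENZIANS * per_enzian_gbps * 2
required_ports = NUM_ENZIANS * (FPGA_PORTS + CPU_PORTS)

print(f"per Enzian:     {per_enzian_gbps} Gbps")
print(f"required:       {required_gbps / 1000} Tbps "
      f"(switch capacity: {SWITCH_CAPACITY_GBPS / 1000} Tbps)")
print(f"required ports: {required_ports} (switch has {SWITCH_PORTS})")
print("fits on one switch:",
      required_gbps <= SWITCH_CAPACITY_GBPS and required_ports <= SWITCH_PORTS)
```

Both checks fail for a single switch, which is why the network is split up.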

Instead, we split our network into two halves, each consisting of seven Enzians connected to a leaf switch, and use a 16-port LACP interconnect to connect the two leaf switches via a third spine switch. Strictly speaking, this third switch is not necessary to connect all Enzians, but it opens up possibilities for interfacing with other clusters in the future.
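As a rough sanity check of this layout, the sketch below works out the port usage on one leaf switch and the resulting oversubscription towards the spine. It rests on two assumptions that are not stated above: that each leaf dedicates 16 of its own ports to the interconnect, and that every interconnect port runs at 100Gbps. The variable names are illustrative.

```python
# Figures quoted in this post.
ENZIANS_PER_LEAF = 7
PORTS_PER_ENZIAN = 6          # four FPGA ports + two CPU ports
PER_ENZIAN_GBPS = 480         # 4 x 100Gbps + 2 x 40Gbps
SWITCH_PORTS = 65

# Assumptions (not stated in the post): 16 uplink ports per leaf,
# each running at 100Gbps.
UPLINK_PORTS_PER_LEAF = 16
UPLINK_PORT_GBPS = 100

# Port usage on one leaf switch.
host_ports = ENZIANS_PER_LEAF * PORTS_PER_ENZIAN
leaf_ports_used = host_ports + UPLINK_PORTS_PER_LEAF

# Host-facing bandwidth versus bandwidth towards the spine (one direction).
host_gbps = ENZIANS_PER_LEAF * PER_ENZIAN_GBPS
uplink_gbps = UPLINK_PORTS_PER_LEAF * UPLINK_PORT_GBPS
oversubscription = host_gbps / uplink_gbps

print(f"leaf ports used:  {leaf_ports_used} of {SWITCH_PORTS}")
print(f"host bandwidth:   {host_gbps} Gbps")
print(f"uplink bandwidth: {uplink_gbps} Gbps")
print(f"oversubscription: {oversubscription:.1f}:1")
```

Under these assumptions each leaf stays within its port budget, while traffic crossing between the two halves shares less bandwidth than the hosts can generate in total.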

The CPUs and FPGAs are put in separate VLANs, with our spine switch also acting as a router between these VLANs.

Fortunately for us, our datacenter operator has spare QSFP+ uplinks, which could allow us to get a whopping 100Gbps outbound connection in the future. Whether we are able to saturate this connection is a different matter.

Deploying the Switches

We have successfully deployed the switches in our impromptu server room[1], physically connecting all ports on all of our machines at once for the first time.

Our three 100G switches connected in our temporary server room

The next step is to get them all racked up!

  1. Named this way because it is a room full of servers, not a dedicated server room