Network design

How we set up our network

               +-------+
        +------|fi-hel1|
        |      +-+-----+
        |        |       +-------+
  +-----+-+      | +-----|pl-klc1|
  |us-pno1|----+ | |     +-------+
  +-------+    | | |
             +-+-+-+-+   
             |at-vie1|   
             +-----+-+   
                   |
                 +-+------+
                 |famfo-pi|
                 +--------+

eBGP

Most of our eBGP sessions use extended next hop and extended messages. To select the best routes from our peers, we compare the advertised communities and adjust the local_pref accordingly. You can find more about the BGP communities used in dn42 on the dn42 wiki.

We grouped the region communities into larger geographic regions, which helps us make better routing decisions.

Region communities        Geographic region
41, 54                    1 (Europe & Antarctica)
42, 43, 44, 45, 46, 47    2 (America)
50, 51, 52, 55, 56, 57    3 (Asia)
53                        4 (Pacific & Oceania)
48, 49                    5 (Africa)
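A minimal Python sketch of this grouping, assuming the region of a route is read from the N in its dn42 region community (64511, N). GEO_REGIONS, geo_region() and same_geo_region() are hypothetical names used only for illustration; in our setup the grouping is part of the BIRD filters.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical mapping: geographic region id -> (name, region community values N).
GEO_REGIONS: Dict[int, Tuple[str, Set[int]]] = {
    1: ("Europe & Antarctica", {41, 54}),
    2: ("America", {42, 43, 44, 45, 46, 47}),
    3: ("Asia", {50, 51, 52, 55, 56, 57}),
    4: ("Pacific & Oceania", {53}),
    5: ("Africa", {48, 49}),
}


def geo_region(region_community: int) -> Optional[int]:
    """Return the geographic region id for a region community, or None if unknown."""
    for region_id, (_name, members) in GEO_REGIONS.items():
        if region_community in members:
            return region_id
    return None


def same_geo_region(a: int, b: int) -> bool:
    """True if both region communities fall into the same (known) geographic region."""
    region_a = geo_region(a)
    return region_a is not None and region_a == geo_region(b)
```

For example, a route tagged with 54 (Antarctica) received on a node in region 41 (Europe) counts as the same geographic region, while 41 and 42 do not.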

Base local_pref: 250

The geographic region, region and country bonuses only apply if the route carries the same community as the node that receives it.

Matching community        local_pref change
Geographic region         +50
Region (community)        +100
Country                   +200
Ping < 2.7 ms             +50
Ping < 7.3 ms             +40
Ping < 20 ms              +30
Ping < 55 ms              +10
Bandwidth < 0.1 Mbit/s    -100
Bandwidth < 1 Mbit/s      -80
Bandwidth < 10 Mbit/s     -60
Bandwidth < 100 Mbit/s    -40
Bandwidth < 1000 Mbit/s   -20

For each hop in the AS path of a route (its path length), we subtract an additional 50 from the local_pref.
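The scoring above can be sketched in Python as follows. This is a hypothetical illustration, not our actual BIRD filter, and it rests on a few assumptions: the location bonuses are cumulative (a same-country route also gets the region and geographic region bonuses), only the tightest matching ping and bandwidth tier applies, and the result is clamped at 0. The function and parameter names are made up for this example.

```python
from typing import List, Optional, Tuple

# Tiers copied from the table above: (exclusive upper bound, local_pref change).
PING_TIERS_MS: List[Tuple[float, int]] = [(2.7, 50), (7.3, 40), (20.0, 30), (55.0, 10)]
BANDWIDTH_TIERS_MBIT: List[Tuple[float, int]] = [
    (0.1, -100), (1.0, -80), (10.0, -60), (100.0, -40), (1000.0, -20),
]

BASE_LOCAL_PREF = 250
PER_HOP_PENALTY = 50


def tier_bonus(value: float, tiers: List[Tuple[float, int]]) -> int:
    """Return the change of the first tier whose bound is above value, else 0."""
    for upper, change in tiers:
        if value < upper:
            return change
    return 0


def local_pref(
    path_len: int,
    ping_ms: Optional[float] = None,
    bandwidth_mbit: Optional[float] = None,
    same_geo_region: bool = False,
    same_region: bool = False,
    same_country: bool = False,
) -> int:
    """Hypothetical local_pref calculation following the table above."""
    pref = BASE_LOCAL_PREF
    # Assumption: location bonuses add up.
    if same_geo_region:
        pref += 50
    if same_region:
        pref += 100
    if same_country:
        pref += 200
    if ping_ms is not None:
        pref += tier_bonus(ping_ms, PING_TIERS_MS)
    if bandwidth_mbit is not None:
        pref += tier_bonus(bandwidth_mbit, BANDWIDTH_TIERS_MBIT)
    # Subtract 50 for every hop in the AS path.
    pref -= PER_HOP_PENALTY * path_len
    # Assumption: never return a negative local_pref.
    return max(pref, 0)
```

For a route from the same country with a 5 ms ping, 500 Mbit/s and a path length of 2, this gives 250 + 350 + 40 - 20 - 100 = 520.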

IGP and iBGP

For iBGP, we use the same configuration and route selection method as for eBGP, except that each iBGP session counts as one additional hop (-50 local_pref).

We route our internal network using babel. We chose babel because it can route IPv4 and IPv6 over a single session. The rxcost of each interface is based on the ping to the respective node: for example, the ping from at-vie1 to us-pno1 is about 143 ms, so the rxcost for that interface is 143.

Important: if you want to use babel in your network, make sure to only export routes that don’t originate from BGP. Otherwise you will cause route hijacking, because the AS path information is stripped when routes are sent over babel.

If you want to know more about route hijacking, read this blog post by lantian.

To announce routes originating from the node itself, we use a dummy interface (provided by the dummy kernel module; we spent way too much time trying to find it).

To learn more about babel and IGPs, read jlu5’s excellent blog post, which also helped us massively in setting up our internal routing.

Configuration examples

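To separate internal and external routes, babel only imports and exports routes that did not originate from BGP and that belong to our own networks (is_self_net() is not a BIRD built-in, but a helper function that checks whether a prefix is one of ours):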

ipv4 {
    import where source != RTS_BGP && is_self_net(); 
    export where source != RTS_BGP && is_self_net();
};

ipv6 {
    import where source != RTS_BGP && is_self_net();
    export where source != RTS_BGP && is_self_net();
};

An example of how we derive the rxcost from the ping:

famfo@frog :: ~ » ping karx.xyz                        
PING karx.xyz (47.187.160.240) 56(84) bytes of data.
64 bytes from 47.187.160.240 (47.187.160.240): icmp_seq=1 ttl=45 time=142 ms
64 bytes from 47.187.160.240 (47.187.160.240): icmp_seq=2 ttl=45 time=144 ms
64 bytes from 47.187.160.240 (47.187.160.240): icmp_seq=3 ttl=45 time=143 ms
64 bytes from 47.187.160.240 (47.187.160.240): icmp_seq=4 ttl=45 time=144 ms
^C
--- karx.xyz ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
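rtt min/avg/max/mdev = 141.549/142.846/143.765/0.882 ms

The average of about 143 ms becomes the rxcost of the corresponding interface in the babel protocol:

interface "igp_bagpipe" {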
    rxcost 143;
};
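If you want to automate this step, a small Python sketch can turn the summary line of Linux (iputils) ping into an rxcost by rounding the average RTT. The function name rxcost_from_ping() is hypothetical, and the lower bound of 1 is an assumption to keep the cost positive.

```python
import re

# Matches the iputils summary line, e.g.
# "rtt min/avg/max/mdev = 141.549/142.846/143.765/0.882 ms"
RTT_SUMMARY = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def rxcost_from_ping(output: str) -> int:
    """Return the rounded average RTT in ms as a babel rxcost (hypothetical helper)."""
    match = RTT_SUMMARY.search(output)
    if match is None:
        raise ValueError("no rtt summary line found in ping output")
    avg_ms = float(match.group(2))
    # Assumption: keep the cost at least 1.
    return max(1, round(avg_ms))
```

Fed the output above, it returns 143, the value used in the igp_bagpipe interface block.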

The (potentially outdated) configuration of all our nodes can be found on our git. But please do us a favor: understand the basic concepts of BGP and Bird, and don’t just copy someone else’s config.