Thomas Mangin edited this page Nov 15, 2025 · 1 revision

Data Center Interconnect (DCI)

ExaBGP enables Data Center Interconnect using EVPN, VPLS, and Layer 2 extension for multi-site deployments, disaster recovery, and workload mobility.


DCI Requirements

Data Center Interconnect solutions need:

  • Layer 2 extension: Stretch VLANs across sites for VM mobility
  • Layer 3 routing: Efficient inter-site routing and traffic engineering
  • Multi-tenancy: Isolated networks per customer/application
  • Disaster recovery: Seamless failover between sites
  • Workload mobility: Live migration of VMs/containers across sites
  • Geographic distribution: Active-active or active-standby deployments

DCI Technologies

ExaBGP supports multiple DCI approaches:

Technology          Layer   Use Case                        Complexity
EVPN                L2/L3   Modern DCI, VXLAN-based         Medium
VPLS                L2      Legacy DCI, MPLS-based          Medium
L3VPN               L3      IP-only DCI, no L2 extension    Low
Stretched Anycast   L3      Service availability, no state  Low

ExaBGP Role

ExaBGP enables DCI by:

  1. Route exchange: Advertise endpoints between data centers via BGP
  2. Topology discovery: Collect site information via BGP-LS
  3. Traffic engineering: Control inter-site traffic flow
  4. Automation: Dynamic provisioning via API

Important: ExaBGP exchanges routes but does NOT create tunnels or manipulate forwarding tables. Your application must configure VXLAN/GRE tunnels and install routes.
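For example, an application consuming ExaBGP's output could translate a received MAC/IP route into Linux forwarding state. The sketch below only builds the `bridge`/`ip` command strings; the field names, the VXLAN device name, and the idea of pre-populating the FDB and neighbor tables are illustrative assumptions, not ExaBGP behavior:

```python
#!/usr/bin/env python3
"""Sketch: turn one received MAC/IP route into Linux FDB/neighbor commands.

The route dict layout and the exact `bridge`/`ip` invocations are
illustrative assumptions -- adapt to your ExaBGP version and devices.
"""

def commands_for_mac_route(route, vxlan_dev="vxlan10000"):
    """Build (but do not run) the commands installing one remote endpoint."""
    mac = route["mac"]
    ip = route["ip"]
    vtep = route["vtep"]  # remote VTEP learned from the EVPN route
    return [
        # Point the MAC at the remote VTEP over the VXLAN device
        f"bridge fdb replace {mac} dev {vxlan_dev} dst {vtep}",
        # Pre-populate the neighbor table to avoid flood-and-learn
        f"ip neigh replace {ip} lladdr {mac} dev {vxlan_dev}",
    ]

if __name__ == "__main__":
    route = {"mac": "00:11:22:33:44:55", "ip": "192.168.1.10", "vtep": "10.2.0.1"}
    for cmd in commands_for_mac_route(route):
        print(cmd)
```

In production the commands would be executed (or replaced with netlink calls) and mirrored by a withdraw path that removes the entries.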

DCI Architectures

Pattern 1: Active-Active DCI

Both data centers serve traffic simultaneously:

[DC1: VTEP 10.1.0.1]  <--EVPN-->  [DC2: VTEP 10.2.0.1]
        |                                  |
    [VMs: VNI 10000]                  [VMs: VNI 10000]
        |                                  |
    [IP: 192.168.1.0/24]              [IP: 192.168.1.0/24]

Benefits:

  • Load distribution across sites
  • No idle capacity
  • Optimal resource utilization

Challenges:

  • Requires stretched Layer 2
  • Potential for MAC flapping
  • Need for loop prevention

Pattern 2: Active-Standby DCI

One site active, other standby for DR:

[DC1: Active]  <--EVPN-->  [DC2: Standby]
     |                           |
  [Services]              [Services (inactive)]

Benefits:

  • Simple failover
  • No split-brain issues
  • Clear ownership

Challenges:

  • Wasted standby capacity
  • Slower failover than active-active

Pattern 3: Multi-Site Fabric

Multiple data centers in full mesh:

    [DC1]
    /    \
[DC2]----[DC3]
    \    /
    [DC4]

Benefits:

  • Geographic distribution
  • Resilience to site failures
  • Flexible workload placement

Challenges:

  • Complex routing
  • Higher WAN bandwidth needs
  • Increased latency
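A full mesh of N sites needs N(N-1)/2 BGP sessions, so generating the per-site neighbor stanzas from a site inventory quickly pays off. A sketch, where the site names, VTEP addresses, and AS numbers are illustrative:

```python
#!/usr/bin/env python3
"""Sketch: emit ExaBGP neighbor stanzas for a full-mesh DCI fabric.
Site addressing and AS numbers are illustrative placeholders."""

SITES = {
    "dc1": {"vtep": "10.1.0.1", "asn": 65001},
    "dc2": {"vtep": "10.2.0.1", "asn": 65002},
    "dc3": {"vtep": "10.3.0.1", "asn": 65003},
    "dc4": {"vtep": "10.4.0.1", "asn": 65004},
}

def mesh_neighbors(local_site):
    """Return one neighbor block per remote site (full mesh)."""
    local = SITES[local_site]
    blocks = []
    for name, remote in SITES.items():
        if name == local_site:
            continue  # no session to ourselves
        blocks.append(
            f"neighbor {remote['vtep']} {{\n"
            f"    router-id {local['vtep']};\n"
            f"    local-address {local['vtep']};\n"
            f"    local-as {local['asn']};\n"
            f"    peer-as {remote['asn']};\n"
            f"    family {{\n        l2vpn evpn;\n    }}\n"
            f"}}\n"
        )
    return blocks

if __name__ == "__main__":
    print("\n".join(mesh_neighbors("dc1")))
```

Running the same generator at each site with its own `local_site` keeps the mesh consistent as sites are added or removed.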

EVPN for DCI

EVPN Type 2: MAC/IP Advertisement

Advertise endpoints across data centers:

#!/usr/bin/env python3
import sys

# Data center configuration
DC_ID = 1
VTEP_IP = f"10.{DC_ID}.0.1"
VNI = 10000

def announce_endpoint(mac, ip):
    """Announce endpoint to remote DC"""
    rd = f"{VTEP_IP}:1"
    rt = f"65000:{VNI}"

    print(f"announce route-distinguisher {rd} "
          f"mac {mac} ip {ip} "
          f"label {VNI} "
          f"route-target {rt}", flush=True)

# Announce local endpoints
announce_endpoint("00:11:22:33:44:55", "192.168.1.10")
announce_endpoint("00:11:22:33:44:56", "192.168.1.11")

while True:
    line = sys.stdin.readline()
    if not line:  # EOF: ExaBGP closed the pipe
        break

EVPN Type 5: IP Prefix Advertisement

For Layer 3 DCI without L2 extension:

#!/usr/bin/env python3
import sys

DC_ID = 1
VTEP_IP = f"10.{DC_ID}.0.1"
VNI = 10000

def announce_prefix(prefix):
    """Announce IP prefix to remote DC (EVPN Type 5)"""
    rd = f"{VTEP_IP}:1"
    rt = f"65000:{VNI}"

    # Announced here via the generic route syntax of the text API;
    # ExaBGP's native EVPN Type 5 (ip-prefix) syntax varies between
    # versions, so check the syntax your release accepts
    print(f"announce route {prefix} "
          f"next-hop {VTEP_IP} "
          f"route-target {rt} "
          f"label {VNI}", flush=True)

# Announce DC1 prefixes to DC2
announce_prefix("192.168.1.0/24")
announce_prefix("192.168.2.0/24")

while True:
    line = sys.stdin.readline()
    if not line:  # EOF: ExaBGP closed the pipe
        break

Gateway Redundancy

Use EVPN for gateway failover:

#!/usr/bin/env python3
import sys
import time
import requests

DC_ID = 1
VTEP_IP = f"10.{DC_ID}.0.1"
GATEWAY_IP = "192.168.1.1"
VNI = 10000

def check_gateway_health():
    """Check if local gateway is healthy"""
    try:
        # Check gateway reachability
        response = requests.get(f"http://{GATEWAY_IP}/health", timeout=2)
        return response.status_code == 200
    except requests.RequestException:
        return False

def announce_gateway(mac, ip):
    """Announce gateway IP to take over"""
    rd = f"{VTEP_IP}:1"
    rt = f"65000:{VNI}"

    print(f"announce route-distinguisher {rd} "
          f"mac {mac} ip {ip} "
          f"label {VNI} "
          f"route-target {rt}", flush=True)

def withdraw_gateway(mac, ip):
    """Withdraw gateway announcement"""
    rd = f"{VTEP_IP}:1"

    print(f"withdraw route-distinguisher {rd} "
          f"mac {mac} ip {ip}", flush=True)

# Monitor gateway and announce/withdraw
announced = False
GATEWAY_MAC = "00:11:22:33:44:01"

while True:
    healthy = check_gateway_health()

    if healthy and not announced:
        announce_gateway(GATEWAY_MAC, GATEWAY_IP)
        announced = True
    elif not healthy and announced:
        withdraw_gateway(GATEWAY_MAC, GATEWAY_IP)
        announced = False

    time.sleep(10)

VPLS for DCI

VPLS Overview

VPLS (Virtual Private LAN Service) provides Layer 2 VPN using MPLS:

#!/usr/bin/env python3
import sys

# VPLS configuration
VPLS_ID = 100
RD = "10.1.0.1:100"
RT = "65000:100"
PE_ADDRESS = "10.1.0.1"

def announce_vpls():
    """Announce VPLS endpoint"""
    # VPLS via BGP (RFC 4761)
    print(f"announce route-distinguisher {RD} "
          f"vpls {VPLS_ID} "
          f"endpoint {PE_ADDRESS} "
          f"route-target {RT}", flush=True)

announce_vpls()

while True:
    line = sys.stdin.readline()
    if not line:  # EOF: ExaBGP closed the pipe
        break

Multi-Site VPLS

Connect multiple data centers via VPLS:

#!/usr/bin/env python3
import sys

# Multi-site VPLS
SITES = {
    'dc1': {'pe': '10.1.0.1', 'vpls_id': 100},
    'dc2': {'pe': '10.2.0.1', 'vpls_id': 100},
    'dc3': {'pe': '10.3.0.1', 'vpls_id': 100}
}

SITE_NAME = 'dc1'  # Current site
RD_BASE = SITES[SITE_NAME]['pe']
VPLS_ID = SITES[SITE_NAME]['vpls_id']
RT = f"65000:{VPLS_ID}"

def announce_vpls_endpoint():
    """Announce VPLS endpoint for this site"""
    rd = f"{RD_BASE}:{VPLS_ID}"
    pe = SITES[SITE_NAME]['pe']

    print(f"announce route-distinguisher {rd} "
          f"vpls {VPLS_ID} "
          f"endpoint {pe} "
          f"route-target {RT}", flush=True)

announce_vpls_endpoint()

while True:
    line = sys.stdin.readline()
    if not line:  # EOF: ExaBGP closed the pipe
        break

Configuration Examples

EVPN DCI Configuration

Configuration (/etc/exabgp/dci-evpn.conf):

process dci-controller {
    run python3 /etc/exabgp/dci-evpn.py;
    encoder json;
}

# Connection to local spine switches
neighbor 10.1.0.254 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65001;

    family {
        l2vpn evpn;
        ipv4 unicast;
    }

    api {
        processes [ dci-controller ];
    }
}

# Connection to remote DC (via DCI circuit)
neighbor 10.2.0.1 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65002;

    family {
        l2vpn evpn;
    }

    api {
        processes [ dci-controller ];
    }
}

VPLS DCI Configuration

Configuration (/etc/exabgp/dci-vpls.conf):

process vpls-controller {
    run python3 /etc/exabgp/vpls-announce.py;
    encoder text;
}

neighbor 10.1.0.254 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65001;

    family {
        l2vpn vpls;
    }

    api {
        processes [ vpls-controller ];
    }
}

neighbor 10.2.0.1 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65002;

    family {
        l2vpn vpls;
    }

    api {
        processes [ vpls-controller ];
    }
}

Stretched Anycast

Simple anycast without L2 extension:

#!/usr/bin/env python3
import sys
import time
import requests

# Anycast service IP
ANYCAST_IP = "1.2.3.4/32"
NEXT_HOP = "10.1.0.1"

def check_service_health():
    """Check local service health"""
    try:
        response = requests.get("http://127.0.0.1:8080/health", timeout=2)
        return response.status_code == 200
    except requests.RequestException:
        return False

announced = False

while True:
    healthy = check_service_health()

    if healthy and not announced:
        print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = True
    elif not healthy and announced:
        print(f"withdraw route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = False

    time.sleep(10)

Troubleshooting

Issue: MAC Flapping

Problem: Same MAC appears in multiple locations

Debugging:

# Check for duplicate MAC
show bgp l2vpn evpn route-type 2 | grep "00:11:22:33:44:55"

# Verify Route Distinguishers are unique
exabgpcli show neighbor advertised-routes

Solution:

  • Ensure unique RD per VTEP
  • Check for VM migration loops
  • Verify no duplicate MAC addresses assigned

Issue: High Latency Between Sites

Problem: Slow inter-site communication

Debugging:

# Check underlay latency
ping 10.2.0.1

# Verify tunnel MTU
ip link show vxlan10

# Check for packet fragmentation
tcpdump -i eth0 -vv udp port 4789

Solution:

  • Optimize underlay routing
  • Reduce the overlay MTU for VXLAN overhead (50 bytes over an IPv4 underlay)
  • Use jumbo frames if supported
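The 50-byte figure is simply the sum of the headers VXLAN adds around each inner frame, so the safe overlay MTU is direct arithmetic:

```python
# VXLAN encapsulation overhead over an IPv4 underlay
# (an IPv6 outer header adds 20 more bytes)
OUTER_IPV4 = 20   # outer IPv4 header
OUTER_UDP = 8     # outer UDP header
VXLAN_HDR = 8     # VXLAN header
INNER_ETH = 14    # inner Ethernet header carried inside the tunnel
OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH  # 50 bytes

def overlay_mtu(underlay_mtu):
    """Largest inner MTU that fits without fragmenting the underlay."""
    return underlay_mtu - OVERHEAD

print(overlay_mtu(1500))  # standard underlay -> 1450
print(overlay_mtu(9000))  # jumbo-frame underlay -> 8950
```

With a jumbo-frame underlay you can keep the overlay at 1500 (or higher) and avoid touching guest MTUs at all.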

Issue: Split-Brain

Problem: Both sites think they're primary

Debugging:

# Check gateway announcements from both sites
show bgp l2vpn evpn | grep "192.168.1.1"

Solution:

  • Implement proper health checking
  • Use consensus mechanism (e.g., Raft, etcd)
  • Add manual override capability
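Short of a full consensus store, a deterministic tie-break already prevents dual announcement: when several sites report healthy, only the one with the best configured priority announces the gateway. A sketch, with site names and priorities as illustrative placeholders:

```python
def should_announce(local_site, healthy_sites, priority):
    """Announce only if this site has the best (lowest) priority
    among the sites currently reporting healthy.

    priority: mapping of site name -> integer (lower wins),
    configured identically at every site.
    """
    if local_site not in healthy_sites:
        return False
    best = min(healthy_sites, key=lambda s: priority[s])
    return best == local_site

PRIORITY = {"dc1": 10, "dc2": 20}  # illustrative; lower wins

# Both healthy: only dc1 announces, so no dual announcement
print(should_announce("dc1", {"dc1", "dc2"}, PRIORITY))
print(should_announce("dc2", {"dc1", "dc2"}, PRIORITY))
# dc1 down: dc2 takes over
print(should_announce("dc2", {"dc2"}, PRIORITY))
```

Note the caveat: this only avoids split-brain if both sites share a consistent view of who is healthy (e.g. via a third witness site); when the health views themselves can diverge, a real consensus mechanism is still required.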

Production Best Practices

Monitoring

Monitor DCI health:

#!/usr/bin/env python3
from prometheus_client import start_http_server, Gauge
import sys
import json

# Metrics
dci_link_status = Gauge('dci_link_status', 'DCI link status', ['site'])
dci_endpoints = Gauge('dci_endpoints_total', 'Total endpoints', ['site'])
dci_latency = Gauge('dci_latency_ms', 'Inter-site latency', ['site'])

start_http_server(9100)

# Update metrics from ExaBGP JSON on stdin
while True:
    line = sys.stdin.readline()
    if not line:  # EOF: ExaBGP closed the pipe
        break

    try:
        data = json.loads(line)
        # Count received updates per peer; the exact JSON layout varies
        # by ExaBGP version, so treat this parsing as a sketch
        if data.get('type') == 'update':
            peer = data.get('neighbor', {}).get('address', {}).get('peer', 'unknown')
            dci_endpoints.labels(site=peer).inc()
    except json.JSONDecodeError:
        pass

Capacity Planning

Consider DCI bandwidth requirements:

  • Control plane: BGP updates (typically < 1 Mbps)
  • Data plane: Actual workload traffic (depends on applications)
  • Overhead: VXLAN/GRE adds ~50 bytes per packet
  • Burst handling: Account for VM migration traffic

Security

Secure DCI links:

  1. Encryption: Use IPsec or MACsec for DCI links
  2. Authentication: Use BGP MD5 or TCP-AO
  3. Route filtering: Accept only expected routes
  4. Rate limiting: Protect against BGP route storms
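For point 2, ExaBGP can apply TCP MD5 per neighbor. A fragment for the remote-DC session (the password is a placeholder, and MD5 needs OS support for TCP_MD5SIG, e.g. on Linux):

```
neighbor 10.2.0.1 {
    router-id 10.1.0.1;
    local-address 10.1.0.1;
    local-as 65001;
    peer-as 65002;
    md5-password "dci-secret";   # placeholder -- use a real secret

    family {
        l2vpn evpn;
    }
}
```

The same password must be configured on the remote peer, or the TCP session will not establish.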

See Also

Address Families

Related Use Cases

Configuration

API


👻 Ghost written by Claude (Anthropic AI)
