
high availability patterns

Thomas Mangin edited this page Nov 10, 2025 · 1 revision

High Availability Patterns with ExaBGP

Overview

ExaBGP enables dynamic service advertisement and failover through BGP, providing high availability without traditional failover mechanisms like VRRP or Pacemaker.

Core Concept

Dynamic Service Advertisement: Nodes advertise their availability via BGP, announcing service IP addresses they can serve. The network automatically routes traffic to available nodes based on BGP path selection.

Architecture Patterns

Pattern 1: Direct Announcement (Simple)

[Web Servers] ----BGP----> [Edge Routers]
   (ExaBGP)                (BGP Speakers)

Characteristics:

  • Web servers run ExaBGP
  • Direct BGP peering with edge routers
  • Service IPs on loopback interfaces
  • Health checks control announcements

Pros: Simple and direct.
Cons: N×M BGP sessions; router configuration changes needed for new services.

Pattern 2: Route Server (Recommended)

[Web Servers] ----BGP----> [Route Servers] ----BGP----> [Edge Routers]
   (ExaBGP)                (BIRD/Quagga)                (BGP Speakers)

Characteristics:

  • Intermediate route servers (BIRD/Quagga)
  • Star topology instead of full mesh
  • Route servers select best paths
  • Separates routing decisions from processes

Pros: Scalable, clean separation, easier management.
Cons: One additional infrastructure component.

Service IP Allocation

Virtual IP Strategy

Allocate multiple virtual IPs per service:

Service A: 2001:db8:30::1, ::2, ::3
Service B: 2001:db8:40::1, ::2, ::3

Each node announces all IPs with different metrics for load distribution.

Loopback Configuration

Configure service IPs on loopback interface:

# Linux
ip addr add 2001:db8:30::1/128 dev lo
ip addr add 2001:db8:30::2/128 dev lo
ip addr add 2001:db8:30::3/128 dev lo

This prevents IP movement issues and enables anycast-style operation.

Metric-Based Load Distribution

Metric Strategy

Each node advertises routes with calculated metrics:

Node 1 (web1):

2001:db8:30::1 - metric 100 (primary)
2001:db8:30::2 - metric 101 (backup)
2001:db8:30::3 - metric 102 (backup)

Node 2 (web2):

2001:db8:30::1 - metric 101 (backup)
2001:db8:30::2 - metric 100 (primary)
2001:db8:30::3 - metric 102 (backup)

Node 3 (web3):

2001:db8:30::1 - metric 102 (backup)
2001:db8:30::2 - metric 101 (backup)
2001:db8:30::3 - metric 100 (primary)

Result: Each IP is primarily served by a different node, achieving load distribution.
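One simple way to generate such an assignment is to rotate the primary across the node index. A minimal sketch (the exact backup ordering differs slightly from the table above, but the key property holds: every IP has exactly one primary and no metric ties):

```python
def metrics_for_node(node_index, num_ips, healthy_base=100):
    """Rotate primaries across nodes: each IP gets a unique
    primary node and a strict backup ordering with no ties."""
    return [healthy_base + ((ip_index - node_index) % num_ips)
            for ip_index in range(num_ips)]

# Node 0 is primary for IP 0, node 1 for IP 1, node 2 for IP 2
print(metrics_for_node(0, 3))  # [100, 101, 102]
print(metrics_for_node(1, 3))  # [102, 100, 101]
print(metrics_for_node(2, 3))  # [101, 102, 100]
```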

Failure Handling

When a service fails, increase metrics:

Healthy: metric 100-102
Failed: metric 1000-1002

BGP automatically converges to healthy nodes within seconds.

Health Check Implementation

Basic Health Check Script

#!/usr/bin/env python3
import sys
import subprocess
from time import sleep

def check_service():
    """Check if service is healthy"""
    try:
        result = subprocess.run(
            ['curl', '--fail', '--max-time', '2', 'http://localhost'],
            capture_output=True,
            timeout=3
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False

# Configuration
service_ips = [
    ('2001:db8:30::1', 100),  # (IP, healthy_metric)
    ('2001:db8:30::2', 101),
    ('2001:db8:30::3', 102),
]
failed_metric = 1000

while True:
    healthy = check_service()

    for ip, base_metric in service_ips:
        metric = base_metric if healthy else (failed_metric + base_metric - 100)
        sys.stdout.write(
            f'announce route {ip}/128 next-hop self med {metric}\n'
        )

    sys.stdout.flush()
    sleep(10)

Advanced Health Checks

Multiple Service Checks:

def check_url(url):
    """Reuse the curl-based probe from above for an arbitrary URL"""
    result = subprocess.run(
        ['curl', '--fail', '--max-time', '2', url],
        capture_output=True
    )
    return result.returncode == 0

checks = [
    ('http', 'http://localhost:80'),
    ('https', 'https://localhost:443'),
    ('app', 'http://localhost:8080/health'),
]

def check_all_services():
    return all(check_url(url) for name, url in checks)

Gradual Degradation:

def calculate_metric(base_metric, health_score):
    """
    health_score: 0.0 (dead) to 1.0 (perfect)
    Returns adjusted metric
    """
    if health_score < 0.3:
        return 1000  # Remove from service
    elif health_score < 0.7:
        return base_metric + 50  # Reduced preference
    else:
        return base_metric  # Normal operation

Retry Logic:

def check_with_retry(check_func, retries=3):
    for i in range(retries):
        if check_func():
            return True
        sleep(1)
    return False

ExaBGP Configuration

Basic Configuration

neighbor 192.0.2.1 {
    router-id 10.0.0.1;
    local-address 192.0.2.2;
    local-as 65000;
    peer-as 65000;

    family {
        ipv4 unicast;
        ipv6 unicast;
    }

    api {
        processes [healthcheck];
    }
}

process healthcheck {
    run /usr/local/bin/healthcheck.py;
    encoder text;
}

Route Server Configuration (BIRD)

protocol bgp web1 {
    local as 65000;
    neighbor 10.0.0.1 as 65000;

    ipv6 {
        import filter {
            # Accept service announcements
            if net ~ [ 2001:db8:30::/48+ ] then accept;
            reject;
        };
        export none;
    };
}

protocol bgp edge_router {
    local as 65000;
    neighbor 10.0.1.1 as 65000;

    ipv6 {
        import none;
        export filter {
            # Send best paths to edge routers
            if net ~ [ 2001:db8:30::/48+ ] then accept;
            reject;
        };
    };
}

Use Case Examples

1. Anycast DNS

Multiple DNS servers announce same service IP:

service_ip = '198.51.100.1'

while True:
    if dns_server_healthy():
        sys.stdout.write(f'announce route {service_ip}/32 next-hop self\n')
    else:
        sys.stdout.write(f'withdraw route {service_ip}/32\n')
    sys.stdout.flush()
    sleep(5)

Benefits:

  • Automatic failover
  • Geographic load distribution
  • Query latency reduction

2. Web Service High Availability

Multiple web servers with health checks:

service_ips = ['203.0.113.10', '203.0.113.11', '203.0.113.12']

for ip in service_ips:
    if check_web_service():
        # Announce with metric based on load
        current_load = get_system_load()
        metric = int(100 + current_load * 10)
        sys.stdout.write(
            f'announce route {ip}/32 next-hop self med {metric}\n'
        )
    else:
        sys.stdout.write(f'withdraw route {ip}/32\n')

sys.stdout.flush()

3. Database Read Replicas

Announce read replica availability:

replica_ip = '198.51.100.50'

while True:
    # Check replication lag
    lag = get_replication_lag()

    if lag < 1.0:  # Less than 1 second lag
        metric = int(100 + lag * 100)
        sys.stdout.write(
            f'announce route {replica_ip}/32 next-hop self med {metric}\n'
        )
    else:
        # Too far behind, remove from pool
        sys.stdout.write(f'withdraw route {replica_ip}/32\n')

    sys.stdout.flush()
    sleep(5)

4. CDN Edge Node

Content delivery nodes announce based on capacity:

edge_ip = '203.0.113.100'

while True:
    available_bandwidth = get_available_bandwidth()
    cpu_usage = get_cpu_usage()

    if cpu_usage < 80 and available_bandwidth > 100:  # Mbps
        # Metric based on utilization
        metric = int(100 + cpu_usage)
        sys.stdout.write(
            f'announce route {edge_ip}/32 next-hop self med {metric}\n'
        )
    else:
        # Over capacity
        sys.stdout.write(f'withdraw route {edge_ip}/32\n')

    sys.stdout.flush()
    sleep(10)

Integration with Load Balancers

HAProxy + ExaBGP

[HAProxy] ----monitor----> [ExaBGP]
    |
    |---> Announces VIP based on backend health

HAProxy health check:

import socket

def check_haproxy_backends():
    """Query HAProxy stats socket and count backends in state UP"""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect('/var/run/haproxy.sock')
    s.send(b'show stat\n')
    stats = s.recv(65536).decode()
    s.close()

    # CSV output, one line per proxy/server; field 17 is 'status'
    healthy_backends = [line for line in stats.splitlines()
                       if len(line.split(',')) > 17
                       and line.split(',')[17] == 'UP']
    return len(healthy_backends) > 0

NGINX + ExaBGP

[NGINX] ----health_check----> [ExaBGP]
    |
    |---> Announces based on upstream health
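Unlike the HAProxy example, NGINX has no stats socket by default; a common approach is to probe a status or health endpoint over HTTP. A minimal standard-library sketch, where the `stub_status` URL and path are assumptions, not NGINX defaults:

```python
import urllib.request
import urllib.error

def check_nginx(url='http://localhost/nginx_status'):
    """Probe a hypothetical NGINX stub_status / health endpoint;
    any 2xx response counts as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False
```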

Maintenance Windows

Graceful Withdrawal

Use maintenance file trigger:

import os

maintenance_file = '/etc/exabgp/maintenance'

def in_maintenance():
    return os.path.exists(maintenance_file)

while True:
    if in_maintenance():
        # Withdraw routes for maintenance
        for ip in service_ips:
            sys.stdout.write(f'withdraw route {ip}/32\n')
    else:
        # Normal operation
        if service_healthy():
            announce_routes()

    sys.stdout.flush()
    sleep(10)

Controlled Drain

Gradually increase metrics to drain traffic:

def drain_traffic(duration=300):  # 5 minutes
    steps = 10
    step_duration = duration / steps

    for i in range(steps):
        metric = 100 + (i * 100)  # 100 -> 1000
        announce_with_metric(metric)
        sleep(step_duration)

    # Final withdrawal
    withdraw_all_routes()

Monitoring and Alerting

Key Metrics to Monitor

  1. BGP Session State: Ensure sessions stay established
  2. Route Announcements: Track active announcements per node
  3. Failover Events: Count and time failovers
  4. Health Check Results: Success/failure rates
  5. Convergence Time: Time to failover completion

Example Prometheus Metrics

from prometheus_client import Counter, Gauge, Histogram

bgp_session_up = Gauge('bgp_session_up', 'BGP session status')
routes_announced = Gauge('routes_announced', 'Number of announced routes')
health_checks_total = Counter('health_checks_total', 'Health checks', ['status'])
failover_time = Histogram('failover_time_seconds', 'Failover duration')
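A sketch of wiring two of these metrics into the health-check loop, assuming `prometheus_client` is installed; the port and the `record_check` helper are illustrative, not part of ExaBGP:

```python
from prometheus_client import Counter, Gauge, start_http_server

health_checks_total = Counter('health_checks_total', 'Health checks', ['status'])
routes_announced = Gauge('routes_announced', 'Number of announced routes')

def record_check(healthy, num_routes):
    """Update metrics after each health-check cycle."""
    health_checks_total.labels(status='success' if healthy else 'failure').inc()
    routes_announced.set(num_routes if healthy else 0)

# Expose /metrics for Prometheus to scrape (port is illustrative)
start_http_server(9101)
record_check(True, 3)
```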

Troubleshooting

Common Issues

1. Routes Not Appearing

  • Check BGP session state
  • Verify health check is passing
  • Confirm service IPs on loopback
  • Check routing policy filters

2. Slow Failover

  • Reduce health check interval
  • Tune BGP timers
  • Verify route server configuration
  • Check for delayed withdrawals

3. Flapping

  • Implement health check dampening
  • Add retry logic
  • Increase check intervals during instability
  • Use hysteresis (different thresholds for up/down)
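The hysteresis idea above can be sketched as a small state machine: require several consecutive successes before announcing, and several consecutive failures before withdrawing. An illustrative helper (thresholds are arbitrary), not part of ExaBGP:

```python
class HysteresisCheck:
    """Dampen flapping: only change reported state after several
    consecutive check results in the opposite direction."""
    def __init__(self, up_threshold=3, down_threshold=2):
        self.up_threshold = up_threshold      # successes needed to go healthy
        self.down_threshold = down_threshold  # failures needed to go unhealthy
        self.healthy = False
        self.streak = 0

    def update(self, check_passed):
        if check_passed == self.healthy:
            self.streak = 0  # result agrees with current state
        else:
            self.streak += 1
            threshold = (self.up_threshold if not self.healthy
                         else self.down_threshold)
            if self.streak >= threshold:
                self.healthy = check_passed
                self.streak = 0
        return self.healthy
```

Feed each raw health-check result through `update()` and announce or withdraw based on the returned state instead of the raw result.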

Debug Commands

# Check ExaBGP status
exabgpcli show neighbor summary

# View announced routes
exabgpcli show adj-rib out

# Check BGP sessions on router
show bgp summary
show bgp ipv4 unicast neighbors

# Monitor health check script
tail -f /var/log/exabgp/healthcheck.log

Best Practices

  1. Use Route Servers: Simplifies management at scale
  2. Metric Strategy: Plan metrics carefully for load distribution
  3. Health Check Robustness: Multiple retries before failing
  4. Loopback IPs: Always configure service IPs on loopback
  5. Monitoring: Comprehensive monitoring of BGP and health checks
  6. Graceful Degradation: Use metrics for gradual failure, not binary
  7. Documentation: Document metric assignments and IP allocations
  8. Testing: Regularly test failover scenarios
  9. Logging: Log all health check state changes
  10. Automation: Automate deployment and configuration

Performance Considerations

  • BGP Convergence: Typically 5-15 seconds
  • Health Check Frequency: 5-10 seconds recommended
  • Resource Usage: ExaBGP is lightweight (<50MB RAM typical)
  • Scale: Can handle 100+ service IPs per node

Alternative Approaches

Why Not OSPF?

  • Doesn't scale well
  • Impacts entire network on misconfiguration
  • Limited route filtering
  • Restricts network topologies
  • Better to keep OSPF for network devices only

Why Not VRRP/Keepalived?

  • Active/passive only (no load distribution)
  • Limited to L2 domains
  • Manual metric configuration
  • No application-level health checks
  • Harder to integrate with modern orchestration

Why Not DNS Round-Robin?

  • No health checking
  • Client-side caching issues
  • Long TTLs delay failover
  • No real-time failover

References

  • Vincent Bernat's "High availability with ExaBGP" blog post
  • RIPE Labs articles on ExaBGP
  • ExaBGP GitHub wiki
  • RFC 4271 (BGP-4)
  • RFC 4451 (BGP MED Considerations)