
Log Analysis

Thomas Mangin edited this page Nov 13, 2025 · 4 revisions

ExaBGP Log Analysis

Analyzing, troubleshooting, and monitoring ExaBGP logs

πŸ“ Logs tell the story - master log analysis for effective troubleshooting



Overview

What ExaBGP Logs

ExaBGP logs these events:

Event Type              Example
-----------------------------------------
BGP session state       neighbor up/down
Route announcements     announce route X
Route withdrawals       withdraw route X
Configuration changes   reload config
Process events          API process started/exited
Errors                  connection refused
Debug information       BGP packet details

Log Locations

Common log destinations (stdout is the default):

# Stdout (default)
exabgp /etc/exabgp/exabgp.conf

# Redirect to file
exabgp /etc/exabgp/exabgp.conf > /var/log/exabgp.log 2>&1

# Systemd journal
journalctl -u exabgp -f

# Syslog (write a test entry tagged "exabgp")
logger -t exabgp "message"

Log Levels

Available Log Levels

ExaBGP log levels (from most to least verbose):

Level       Purpose
-----------------------------------------
DEBUG       Detailed debugging (BGP packets, internal state)
INFO        Normal operations (session up, routes announced)
WARNING     Potential issues (retries, delays)
ERROR       Errors (connection failures, invalid config)
CRITICAL    Critical failures (process exit)

Configure Log Level

Environment variables:

# Set log level
export exabgp_log_level=INFO        # Default
export exabgp_log_level=DEBUG       # Verbose
export exabgp_log_level=WARNING     # Quiet

exabgp /etc/exabgp/exabgp.conf

Enable specific subsystems:

# Log BGP packets
export exabgp_log_packets=true

# Log BGP messages (OPEN, UPDATE, KEEPALIVE, NOTIFICATION)
export exabgp_log_message=true

# Log configuration parsing
export exabgp_log_configuration=true

# Log process communication
export exabgp_log_processes=true

# Log network events (TCP connections)
export exabgp_log_network=true

# Log routes
export exabgp_log_routes=true

# Log timers
export exabgp_log_timers=true

# Enable ALL logging
export exabgp_log_all=true

exabgp /etc/exabgp/exabgp.conf
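
When ExaBGP is launched from a supervisor script rather than a shell, the same variables can be assembled programmatically. The sketch below is a minimal illustration, assuming the variable names listed above; `build_log_env` and `SUBSYSTEMS` are hypothetical helper names, not part of ExaBGP.

```python
import os

# Subsystem names taken from the exabgp_log_* variables listed above.
SUBSYSTEMS = {"packets", "message", "configuration", "processes",
              "network", "routes", "timers", "all"}


def build_log_env(level="INFO", subsystems=(), base=None):
    """Return a copy of the environment with ExaBGP logging variables set."""
    env = dict(os.environ if base is None else base)
    env["exabgp_log_level"] = level
    for name in subsystems:
        if name not in SUBSYSTEMS:
            raise ValueError(f"unknown subsystem: {name}")
        env[f"exabgp_log_{name}"] = "true"
    return env
```

The result can be passed as the `env` argument of `subprocess.run(["exabgp", "/etc/exabgp/exabgp.conf"], env=...)`. Rejecting unknown names catches typos that the shell would otherwise accept silently.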

Log Configuration

Systemd Logging

Systemd service with logging:

# /etc/systemd/system/exabgp.service
[Service]
ExecStart=/usr/local/bin/exabgp /etc/exabgp/exabgp.conf

# Log to file
StandardOutput=append:/var/log/exabgp/exabgp.log
StandardError=append:/var/log/exabgp/exabgp.log

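# Identifier used for journal/syslog entries; with the file redirection above,
# output goes to the file only, not the journal
SyslogIdentifier=exabgp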

View logs:

# Tail log file
tail -f /var/log/exabgp/exabgp.log

# View systemd journal
journalctl -u exabgp -f

# View last 100 lines
journalctl -u exabgp -n 100

Syslog Configuration

Send logs to syslog:

#!/bin/bash
# /usr/local/bin/exabgp-wrapper.sh
# ExaBGP wrapper script: pipe all output to syslog

exabgp /etc/exabgp/exabgp.conf 2>&1 | logger -t exabgp -p daemon.info

View syslog:

# Tail syslog
tail -f /var/log/syslog | grep exabgp

# Search syslog
grep exabgp /var/log/syslog

Log Rotation

Configure log rotation:

# /etc/logrotate.d/exabgp
/var/log/exabgp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    # The service keeps its log file descriptor open and the unit above defines
    # no reload action, so truncate in place rather than rename and recreate.
    copytruncate
}

Test rotation:

logrotate -f /etc/logrotate.d/exabgp
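
Once rotation is active, an analysis that spans several days has to read the rotated files as well as the live one. The following sketch assumes logrotate's default numeric naming (`exabgp.log`, `exabgp.log.1`, `exabgp.log.2.gz`), which is what `rotate`, `compress` and `delaycompress` produce without `dateext`. The helper names `rotated_logs` and `read_lines` are illustrative.

```python
import glob
import gzip
import re


def rotated_logs(base_path):
    """Return base_path and its rotated siblings, oldest first."""
    def age(path):
        # exabgp.log -> 0, exabgp.log.1 -> 1, exabgp.log.2.gz -> 2
        m = re.search(r'\.(\d+)(?:\.gz)?$', path[len(base_path):])
        return int(m.group(1)) if m else 0

    # Keep only the live file and files with a ".N" or ".N.gz" suffix.
    paths = [p for p in glob.glob(glob.escape(base_path) + '*')
             if re.fullmatch(r'(\.\d+(\.gz)?)?', p[len(base_path):])]
    return sorted(paths, key=age, reverse=True)


def read_lines(base_path):
    """Yield log lines in chronological order across rotated files."""
    for path in rotated_logs(base_path):
        opener = gzip.open if path.endswith('.gz') else open
        with opener(path, 'rt') as f:
            for line in f:
                yield line.rstrip('\n')
```

For example, `read_lines('/var/log/exabgp/exabgp.log')` can feed any of the line-based parsers on this page.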

Log Format

Log Line Anatomy

Typical log line format:

TIMESTAMP | LEVEL | PID | COMPONENT | MESSAGE

Examples:

2025-11-10 10:30:15 | INFO     | 12345 | network    | connected to peer 192.168.1.1
2025-11-10 10:30:16 | INFO     | 12345 | neighbor   | neighbor 192.168.1.1 up
2025-11-10 10:30:17 | INFO     | 12345 | routes     | announce route 100.10.0.100/32
2025-11-10 10:30:18 | WARNING  | 12345 | process    | API process exited with code 1
2025-11-10 10:30:19 | ERROR    | 12345 | network    | connection refused by 192.168.1.2

Fields:

  • Timestamp: When event occurred
  • Level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  • PID: Process ID
  • Component: Which part of ExaBGP (network, neighbor, routes, process)
  • Message: Event description

Debug Log Format

Debug logs are verbose:

2025-11-10 10:30:15 | DEBUG    | 12345 | wire       | >> sent 19 bytes to 192.168.1.1
2025-11-10 10:30:15 | DEBUG    | 12345 | wire       | FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:0013:04
2025-11-10 10:30:15 | DEBUG    | 12345 | message    | >> KEEPALIVE
2025-11-10 10:30:16 | DEBUG    | 12345 | wire       | << received 19 bytes from 192.168.1.1
2025-11-10 10:30:16 | DEBUG    | 12345 | wire       | FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:0013:04
2025-11-10 10:30:16 | DEBUG    | 12345 | message    | << KEEPALIVE
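
The hex line in the wire output is the 19-byte BGP message header: a 16-byte marker of all ones, a 2-byte length, and a 1-byte type. The sketch below decodes it, assuming the colon-separated `MARKER:LENGTH:TYPE` layout shown in the example lines; the actual dump layout may differ between ExaBGP versions, and `decode_header` is a hypothetical helper.

```python
# Message type numbers from the BGP specification (RFC 4271, RFC 2918).
BGP_TYPES = {
    1: "OPEN",
    2: "UPDATE",
    3: "NOTIFICATION",
    4: "KEEPALIVE",
    5: "ROUTE-REFRESH",
}


def decode_header(dump):
    """Decode a 'MARKER:LENGTH:TYPE' hex dump into its header fields."""
    raw = bytes.fromhex(dump.replace(":", ""))
    if len(raw) < 19:
        raise ValueError("a BGP header is 19 bytes")
    marker = raw[:16]
    length = int.from_bytes(raw[16:18], "big")
    msg_type = raw[18]
    return {
        "marker_ok": marker == b"\xff" * 16,
        "length": length,
        "type": BGP_TYPES.get(msg_type, f"unknown({msg_type})"),
    }
```

Applied to the dump above, the length field `0013` is 19 in decimal and the type `04` is KEEPALIVE, which matches the `sent 19 bytes` and `>> KEEPALIVE` lines around it.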

Parsing ExaBGP Logs

Using Grep

Search for specific events:

# BGP session state changes
grep -E "neighbor.*(up|down)" /var/log/exabgp.log

# Route announcements
grep "announce route" /var/log/exabgp.log

# Route withdrawals
grep "withdraw route" /var/log/exabgp.log

# Errors
grep -E "ERROR|CRITICAL" /var/log/exabgp.log

# Specific neighbor
grep "192.168.1.1" /var/log/exabgp.log

# Today's events
grep "$(date +%Y-%m-%d)" /var/log/exabgp.log

Using Awk

Extract specific fields:

# Count log levels
awk '{print $4}' /var/log/exabgp.log | sort | uniq -c

# Extract timestamps and messages
awk -F'|' '{print $1, $NF}' /var/log/exabgp.log

# Filter by log level
awk '$4 ~ /ERROR/ {print}' /var/log/exabgp.log

# Count events per hour (date plus the hour part of the time field)
awk '{print $1, substr($2,1,2)}' /var/log/exabgp.log | sort | uniq -c

Parsing with Python

Advanced log parsing:

#!/usr/bin/env python3
"""
Parse and analyze ExaBGP logs
"""
import re
from datetime import datetime
from collections import defaultdict

# Log line pattern
LOG_PATTERN = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\|\s+'
    r'(?P<level>\w+)\s+\|\s+'
    r'(?P<pid>\d+)\s+\|\s+'
    r'(?P<component>\w+)\s+\|\s+'
    r'(?P<message>.+)'
)

def parse_log_line(line):
    """Parse a log line"""
    match = LOG_PATTERN.match(line)
    if match:
        return match.groupdict()
    return None

def analyze_logs(log_file):
    """Analyze log file"""
    stats = {
        'total_lines': 0,
        'by_level': defaultdict(int),
        'by_component': defaultdict(int),
        'bgp_sessions_up': 0,
        'bgp_sessions_down': 0,
        'routes_announced': 0,
        'routes_withdrawn': 0,
        'errors': [],
    }

    with open(log_file, 'r') as f:
        for line in f:
            stats['total_lines'] += 1

            parsed = parse_log_line(line.strip())
            if not parsed:
                continue

            # Count by level
            stats['by_level'][parsed['level']] += 1

            # Count by component
            stats['by_component'][parsed['component']] += 1

            # Track specific events (whole-word match so "update" is not read as "up")
            message = parsed['message']
            if re.search(r'\bneighbor\b.*\bup\b', message):
                stats['bgp_sessions_up'] += 1
            elif re.search(r'\bneighbor\b.*\bdown\b', message):
                stats['bgp_sessions_down'] += 1
            elif 'announce route' in message:
                stats['routes_announced'] += 1
            elif 'withdraw route' in message:
                stats['routes_withdrawn'] += 1

            # Collect errors
            if parsed['level'] in ['ERROR', 'CRITICAL']:
                stats['errors'].append(parsed)

    return stats

# Usage
if __name__ == '__main__':
    stats = analyze_logs('/var/log/exabgp.log')

    print(f"Total lines: {stats['total_lines']}")
    print(f"\nBy level:")
    for level, count in stats['by_level'].items():
        print(f"  {level}: {count}")

    print(f"\nBGP sessions up: {stats['bgp_sessions_up']}")
    print(f"BGP sessions down: {stats['bgp_sessions_down']}")
    print(f"Routes announced: {stats['routes_announced']}")
    print(f"Routes withdrawn: {stats['routes_withdrawn']}")

    if stats['errors']:
        print(f"\nErrors ({len(stats['errors'])}):")
        for error in stats['errors'][:10]:  # Show first 10
            print(f"  {error['timestamp']} {error['message']}")

Common Log Patterns

BGP Session Established

Log pattern:

INFO | network    | connected to peer 192.168.1.1
INFO | neighbor   | neighbor 192.168.1.1 up

Search:

grep "neighbor.*up" /var/log/exabgp.log

BGP Session Failed

Log pattern:

ERROR | network    | connection refused by 192.168.1.1
WARNING | neighbor   | neighbor 192.168.1.1 down - connection lost

Search:

grep -E "connection (refused|lost)" /var/log/exabgp.log
grep "neighbor.*down" /var/log/exabgp.log

Route Announcement

Log pattern:

INFO | routes     | announce route 100.10.0.100/32 next-hop self

Search:

grep "announce route" /var/log/exabgp.log | tail -20

Count announcements:

grep -c "announce route" /var/log/exabgp.log

Route Withdrawal

Log pattern:

INFO | routes     | withdraw route 100.10.0.100/32

Search:

grep "withdraw route" /var/log/exabgp.log | tail -20

API Process Events

Log pattern:

INFO | process    | process 'healthcheck' started with pid 12346
WARNING | process    | process 'healthcheck' exited with code 1
ERROR | process    | process 'healthcheck' died

Search:

grep "process.*started" /var/log/exabgp.log
grep "process.*exited" /var/log/exabgp.log
grep "process.*died" /var/log/exabgp.log

Configuration Reload

Log pattern:

INFO | configuration | reload configuration
INFO | configuration | configuration successfully parsed

Search:

grep "reload configuration" /var/log/exabgp.log

BGP NOTIFICATION (Error)

Log pattern:

ERROR | neighbor   | NOTIFICATION sent to peer 192.168.1.1 code 6 (Cease)
ERROR | neighbor   | NOTIFICATION received from peer 192.168.1.1 code 2 (OPEN Message Error)

Search:

grep "NOTIFICATION" /var/log/exabgp.log

Common notification codes:

Code    Subcode     Meaning
-----------------------------------------
1       -           Message Header Error
2       -           OPEN Message Error
2       2           Bad Peer AS
3       -           UPDATE Message Error
4       -           Hold Timer Expired
5       -           Finite State Machine Error
6       -           Cease (session closed, e.g. administrative shutdown)
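
The table above can be turned into a lookup so that NOTIFICATION lines are summarized per peer. This is a sketch that assumes the message wording shown in the log pattern (`NOTIFICATION sent to peer ... code N`); real ExaBGP output may be phrased differently, and `parse_notification` is an illustrative name.

```python
import re

# Top-level NOTIFICATION error codes from RFC 4271.
NOTIFICATION_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Assumed wording, copied from the example log pattern above.
NOTIFICATION_RE = re.compile(
    r'NOTIFICATION (?P<direction>sent to|received from) '
    r'peer (?P<peer>\S+) code (?P<code>\d+)'
)


def parse_notification(line):
    """Return peer, direction, code and meaning, or None if no match."""
    m = NOTIFICATION_RE.search(line)
    if not m:
        return None
    code = int(m.group("code"))
    direction = "sent" if m.group("direction") == "sent to" else "received"
    return {
        "peer": m.group("peer"),
        "direction": direction,
        "code": code,
        "meaning": NOTIFICATION_CODES.get(code, "Unknown"),
    }
```

The direction matters when reading the result: a received NOTIFICATION states the remote peer's complaint, while a sent one states what ExaBGP rejected.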

Debugging Workflows

Workflow 1: BGP Session Won't Establish

Step 1: Check for connection attempts:

grep -E "connecting|connected" /var/log/exabgp.log | tail -10

Step 2: Look for connection errors:

grep -E "connection (refused|timeout|reset)" /var/log/exabgp.log

Step 3: Check for NOTIFICATION messages:

grep "NOTIFICATION" /var/log/exabgp.log | tail -5

Step 4: Enable debug logging:

export exabgp_log_level=DEBUG
export exabgp_log_network=true
exabgp /etc/exabgp/exabgp.conf 2>&1 | tee debug.log
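
Steps 1 to 3 can be run in a single pass for one peer. The sketch below reuses the same search patterns as the grep commands; the function names and the wording of the hints are illustrative assumptions, not ExaBGP output.

```python
import re

# Same patterns as the grep commands in steps 1-3.
STAGES = {
    "tcp_attempt": re.compile(r'connecting|connected'),
    "tcp_error": re.compile(r'connection (refused|timeout|reset)'),
    "notification": re.compile(r'NOTIFICATION'),
}


def triage(lines, peer):
    """Count matches for each stage, restricted to lines naming the peer."""
    # Digit guards stop 192.168.1.1 from matching 192.168.1.10.
    peer_re = re.compile(r'(?<!\d)' + re.escape(peer) + r'(?!\d)')
    counts = {name: 0 for name in STAGES}
    for line in lines:
        if not peer_re.search(line):
            continue
        for name, pattern in STAGES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def diagnose(counts):
    """Map stage counts to a rough hint on where to look next."""
    if counts["notification"]:
        return "bgp: session rejected at BGP level, read the NOTIFICATION code"
    if counts["tcp_error"]:
        return "tcp: connection failed, check reachability and TCP port 179"
    if counts["tcp_attempt"] == 0:
        return "none: no connection attempts logged, check the neighbor config"
    return "unclear: TCP connects without errors, enable debug logging"
```

Only when the hint is `unclear` is the debug run in step 4 usually worth the extra log volume.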

Workflow 2: Routes Not Being Announced

Step 1: Check if API process is running:

grep "process.*started" /var/log/exabgp.log | tail -5

Step 2: Look for API process errors:

grep -E "process.*(exited|died)" /var/log/exabgp.log | tail -10

Step 3: Check for route announcements:

grep "announce route" /var/log/exabgp.log | tail -20

Step 4: Enable process debugging:

export exabgp_log_processes=true
export exabgp_log_routes=true
exabgp /etc/exabgp/exabgp.conf 2>&1 | tee debug.log

Workflow 3: High Memory/CPU Usage

Step 1: Check for excessive route churn:

# Count route updates per minute
grep -E "announce|withdraw" /var/log/exabgp.log | \
    awk '{print substr($0,1,16)}' | \
    uniq -c | \
    sort -rn | \
    head -20

Step 2: Check for API process restarts:

grep "process.*exited" /var/log/exabgp.log | \
    awk '{print substr($0,1,16)}' | \
    uniq -c

Step 3: Look for errors:

grep -E "ERROR|WARNING" /var/log/exabgp.log | tail -50

Log Aggregation

ELK Stack (Elasticsearch, Logstash, Kibana)

Filebeat configuration:

# /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/exabgp/*.log
  fields:
    service: exabgp
    environment: production
  fields_under_root: true

  # Multiline support for stack traces
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

# Send to Logstash so the grok filter in exabgp.conf is applied
output.logstash:
  hosts: ["localhost:5044"]

setup.kibana:
  host: "localhost:5601"

Logstash grok pattern:

# /etc/logstash/conf.d/exabgp.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [service] == "exabgp" {
    grok {
      match => {
        "message" => "%{TIMESTAMP_ISO8601:timestamp} \| %{LOGLEVEL:level}\s+\| %{NUMBER:pid}\s+\| %{WORD:component}\s+\| %{GREEDYDATA:msg}"
      }
    }

    date {
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
      target => "@timestamp"
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "exabgp-%{+YYYY.MM.dd}"
  }
}

Start services:

systemctl start elasticsearch
systemctl start logstash
systemctl start kibana
systemctl start filebeat

Splunk Integration

Splunk forwarder configuration:

# /opt/splunkforwarder/etc/system/local/inputs.conf
[monitor:///var/log/exabgp/*.log]
disabled = false
index = exabgp
sourcetype = exabgp

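# $SPLUNK_HOME/etc/system/local/props.conf (on the search head)
# Field extraction: EXTRACT- settings are search-time and belong in props.conf
[exabgp]
EXTRACT-timestamp = ^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
EXTRACT-level = \| (?<level>\w+)\s+\|
EXTRACT-component = \| \w+\s+\| \d+\s+\| (?<component>\w+)
EXTRACT-message = \| \w+\s+\| \d+\s+\| \w+\s+\| (?<message>.+)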

Splunk searches:

# All ExaBGP logs
index=exabgp

# Errors only
index=exabgp level=ERROR OR level=CRITICAL

# BGP session changes
index=exabgp "neighbor" ("up" OR "down")

# Route churn rate
index=exabgp "announce route" OR "withdraw route"
| timechart span=1m count

# Top error messages
index=exabgp level=ERROR
| stats count by message
| sort -count

Graylog Integration

Graylog syslog input + rsyslog:

# /etc/rsyslog.d/exabgp.conf

# Forward ExaBGP logs to Graylog
if $programname == 'exabgp' then {
    action(type="omfwd"
           target="graylog.example.com"
           port="514"
           protocol="tcp")
}

Alerting on Log Events

Simple Email Alerts

Alert on errors:

#!/bin/bash
# /usr/local/bin/exabgp-alert.sh
# Alert on errors in ExaBGP logs

LOG_FILE="/var/log/exabgp/exabgp.log"
ALERT_EMAIL="admin@example.com"  # placeholder address; set your own

# Tail log and alert on errors
tail -F "$LOG_FILE" | while IFS= read -r line; do
    if echo "$line" | grep -qE "ERROR|CRITICAL"; then
        # Send email
        echo "$line" | mail -s "ExaBGP Error Alert" "$ALERT_EMAIL"
    fi
done

Run as systemd service:

# /etc/systemd/system/exabgp-alert.service
[Unit]
Description=ExaBGP Error Alerts
After=exabgp.service

[Service]
ExecStart=/usr/local/bin/exabgp-alert.sh
Restart=always

[Install]
WantedBy=multi-user.target
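
The script above sends one email per matching line, so a flapping session can flood a mailbox. One common mitigation is to suppress repeats of the same message inside a time window. The sketch below shows the idea in Python; `AlertThrottle`, `alert_key` and the 300-second default are illustrative choices, not part of ExaBGP.

```python
import time


class AlertThrottle:
    """Allow one alert per key within a sliding time window."""

    def __init__(self, window_seconds=300, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_sent = {}

    def should_send(self, key):
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            return False            # same alert was sent recently
        self.last_sent[key] = now
        return True


def alert_key(line):
    """Group alerts by message text, ignoring timestamp, level and PID."""
    return line.split('|', 4)[-1].strip()
```

A monitoring loop would call `throttle.should_send(alert_key(line))` before mailing each error line.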

Slack Alerts

Alert on critical events:

#!/usr/bin/env python3
"""
Send Slack alerts for ExaBGP critical events
"""
import sys
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

def send_slack_alert(message, level='WARNING'):
    """Send alert to Slack"""
    colors = {
        'INFO': '#36a64f',
        'WARNING': '#ff9900',
        'ERROR': '#ff0000',
        'CRITICAL': '#990000',
    }

    payload = {
        "attachments": [{
            "color": colors.get(level, '#808080'),
            "title": f"ExaBGP Alert: {level}",
            "text": message,
            "footer": "ExaBGP Monitoring",
        }]
    }

    # Time out so a slow webhook cannot stall log processing
    requests.post(SLACK_WEBHOOK, json=payload, timeout=10)

# Tail log and send alerts
for line in sys.stdin:
    # Parse log line
    if 'ERROR' in line or 'CRITICAL' in line:
        send_slack_alert(line.strip(), 'ERROR')
    elif 'neighbor' in line and 'down' in line:
        send_slack_alert(line.strip(), 'WARNING')

# Usage: tail -F /var/log/exabgp.log | python3 slack-alert.py

PagerDuty Integration

Alert on critical failures:

#!/usr/bin/env python3
"""
PagerDuty alerts for ExaBGP
"""
import sys
import requests

PAGERDUTY_KEY = "YOUR_ROUTING_KEY"

def trigger_pagerduty(description, severity='error'):
    """Trigger PagerDuty incident"""
    url = "https://events.pagerduty.com/v2/enqueue"

    payload = {
        "routing_key": PAGERDUTY_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": description,
            "severity": severity,
            "source": "exabgp",
        }
    }

    # Time out so a slow API call cannot stall log processing
    requests.post(url, json=payload, timeout=10)

# Monitor for critical events
for line in sys.stdin:
    if 'CRITICAL' in line:
        trigger_pagerduty(line.strip(), severity='critical')
    # Session-down events appear at WARNING level in the log patterns above,
    # so do not require ERROR here
    elif 'neighbor' in line and 'down' in line:
        trigger_pagerduty(line.strip(), severity='error')

# Usage: tail -F /var/log/exabgp.log | python3 pagerduty-alert.py
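
The script above opens a new incident for every matching line. The PagerDuty Events API v2 also accepts a `dedup_key`, which groups repeated triggers into one incident and lets a later `resolve` event close it. The sketch below only builds the event body, assuming the `neighbor <ip> up|down` wording from the log patterns on this page; `build_event` and the key format are illustrative.

```python
import re

# Assumed wording, taken from the neighbor log patterns on this page.
NEIGHBOR_EVENT = re.compile(r'neighbor (\S+) (up|down)\b')


def build_event(line, routing_key):
    """Return an Events API v2 body for a neighbor up/down line, or None."""
    m = NEIGHBOR_EVENT.search(line)
    if not m:
        return None
    peer, state = m.groups()
    event = {
        "routing_key": routing_key,
        "event_action": "trigger" if state == "down" else "resolve",
        # One incident per neighbor: repeats update it, "up" resolves it.
        "dedup_key": f"exabgp-neighbor-{peer}",
    }
    if state == "down":
        event["payload"] = {
            "summary": line.strip(),
            "severity": "error",
            "source": "exabgp",
        }
    return event
```

The returned dictionary can be posted to the same `enqueue` URL used above whenever it is not `None`.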

Performance Analysis

Analyze Route Update Rate

Count route updates per time period:

#!/bin/bash
# Analyze route update rate

echo "Route updates per minute:"
grep -E "announce|withdraw" /var/log/exabgp.log | \
    awk '{print substr($0,1,16)}' | \
    uniq -c | \
    sort -rn | \
    head -20

Python version:

#!/usr/bin/env python3
"""
Analyze route update rate
"""
from collections import defaultdict
import re

updates_per_minute = defaultdict(int)

with open('/var/log/exabgp.log', 'r') as f:
    for line in f:
        if 'announce' in line or 'withdraw' in line:
            # Extract timestamp (YYYY-MM-DD HH:MM)
            match = re.match(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2})', line)
            if match:
                minute = match.group(1)
                updates_per_minute[minute] += 1

# Print top 20
for minute, count in sorted(updates_per_minute.items(),
                            key=lambda x: x[1],
                            reverse=True)[:20]:
    print(f"{minute}: {count} updates/min")

Analyze BGP Session Stability

Track session up/down events:

#!/bin/bash
# Track BGP session stability

echo "BGP session state changes:"
grep -E "neighbor.*(up|down)" /var/log/exabgp.log | \
    awk -F'|' '{print $1, $NF}' | \
    tail -50

Calculate uptime:

#!/usr/bin/env python3
"""
Calculate BGP session uptime
"""
from datetime import datetime
import re

sessions = {}

with open('/var/log/exabgp.log', 'r') as f:
    for line in f:
        # Parse timestamp
        match = re.match(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})', line)
        if not match:
            continue

        timestamp = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')

        # Match "neighbor <ip> up" or "neighbor <ip> down" as whole words,
        # so unrelated text such as "update" is not read as "up"
        event = re.search(r'neighbor (\S+) (up|down)\b', line)
        if not event:
            continue

        neighbor, state = event.groups()
        if state == 'up':
            sessions[neighbor] = {'up': timestamp, 'down': None}
        elif neighbor in sessions and sessions[neighbor]['down'] is None:
            sessions[neighbor]['down'] = timestamp

# Calculate uptime
for neighbor, events in sessions.items():
    if events['up'] and events['down']:
        uptime = events['down'] - events['up']
        print(f"{neighbor}: {uptime}")
    elif events['up']:
        print(f"{neighbor}: Currently up since {events['up']}")
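
Uptime of the last session says little about a peer that flaps repeatedly. A complementary measure is the number of up-to-down transitions per neighbor. The sketch below counts them, again assuming the `neighbor <ip> up|down` wording used in the examples; `count_flaps` is an illustrative name.

```python
import re
from collections import Counter

NEIGHBOR_EVENT = re.compile(r'neighbor (\S+) (up|down)\b')


def count_flaps(lines):
    """Count up -> down transitions per neighbor."""
    flaps = Counter()
    state = {}                      # last known state per neighbor
    for line in lines:
        m = NEIGHBOR_EVENT.search(line)
        if not m:
            continue
        peer, new_state = m.groups()
        # A "down" only counts as a flap if the session was seen up first.
        if new_state == 'down' and state.get(peer) == 'up':
            flaps[peer] += 1
        state[peer] = new_state
    return flaps
```

Calling `count_flaps(open('/var/log/exabgp.log'))` returns a `Counter`, so `most_common(5)` lists the least stable peers.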

Best Practices

Logging Best Practices

1. Use appropriate log levels:

# Production: INFO or WARNING
export exabgp_log_level=INFO

# Troubleshooting: DEBUG
export exabgp_log_level=DEBUG

# Quiet: WARNING
export exabgp_log_level=WARNING

2. Rotate logs regularly:

# Keep 14 days of logs
rotate 14

# Compress old logs
compress

3. Centralize logs:

  • Send to syslog/ELK/Splunk
  • Enable correlation across systems
  • Implement retention policies

4. Monitor log growth:

# Check log size
du -sh /var/log/exabgp/

# Re-check every 5 minutes (watch only displays the size; it does not alert)
watch -n 300 'du -sh /var/log/exabgp/'

5. Secure log files:

chmod 640 /var/log/exabgp/*.log
chown exabgp:exabgp /var/log/exabgp/*.log
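
For the log growth check in item 4, a small script can turn two size samples into a growth rate and compare it with a limit. This is a sketch only: the function names and the bytes-per-hour limit are assumptions to be tuned per deployment.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass                # file rotated away between walk and stat
    return total


def growing_too_fast(prev_bytes, curr_bytes, interval_seconds,
                     limit_bytes_per_hour):
    """True if growth between two samples exceeds the hourly limit."""
    rate_per_hour = (curr_bytes - prev_bytes) / interval_seconds * 3600
    return rate_per_hour > limit_bytes_per_hour
```

Run from cron or a systemd timer, it would sample `dir_size('/var/log/exabgp/')`, compare it with the previous sample, and hand a positive result to one of the alerting scripts above.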

What to Log

Always log:

  • BGP session state changes
  • Route announcements/withdrawals
  • API process crashes
  • Errors and warnings
  • Configuration reloads

Optionally log (debug only):

  • BGP packets (wire format)
  • BGP messages (KEEPALIVE, UPDATE)
  • Timer events
  • Process communication

Log Analysis Checklist

Regular analysis tasks:

  • Check for errors daily
  • Monitor BGP session stability
  • Track route update rates
  • Review API process health
  • Verify log rotation working
  • Test alerting mechanisms
  • Archive old logs
  • Update log parsing scripts


Need help with logs? Join our Slack community →


👻 Ghost written by Claude (Anthropic AI)
