Log Analysis
Analyzing, troubleshooting, and monitoring ExaBGP logs
Logs tell the story: master log analysis for effective troubleshooting
- Overview
- Log Levels
- Log Configuration
- Log Format
- Parsing ExaBGP Logs
- Common Log Patterns
- Debugging Workflows
- Log Aggregation
- Alerting on Log Events
- Performance Analysis
- Best Practices
ExaBGP logs these events:
Event Type             Example
-----------------------------------------
BGP session state      neighbor up/down
Route announcements    announce route X
Route withdrawals      withdraw route X
Configuration changes  reload config
Process events         API process started/exited
Errors                 connection refused
Debug information      BGP packet details
Default log destinations:
# Stdout (default)
exabgp /etc/exabgp/exabgp.conf
# Redirect to file
exabgp /etc/exabgp/exabgp.conf > /var/log/exabgp.log 2>&1
# Systemd journal
journalctl -u exabgp -f
# Syslog
logger -t exabgp "message"
ExaBGP log levels (from most to least verbose):
Level     Purpose
-----------------------------------------
DEBUG     Detailed debugging (BGP packets, internal state)
INFO      Normal operations (session up, routes announced)
WARNING   Potential issues (retries, delays)
ERROR     Errors (connection failures, invalid config)
CRITICAL  Critical failures (process exit)
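Because the levels are ordered, a saved log can be filtered to a minimum severity after the fact. The following is a minimal sketch, assuming the pipe-delimited line layout shown in the Log Format section; `LEVELS` and `at_least` are illustrative names and not part of ExaBGP.

```python
# Severity order from the table above, most to least verbose.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def at_least(lines, minimum="WARNING"):
    """Yield lines whose level field is `minimum` or more severe.

    Assumes 'TIMESTAMP | LEVEL | PID | COMPONENT | MESSAGE' lines;
    anything that does not fit that layout is skipped.
    """
    threshold = LEVELS.index(minimum)
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5 or fields[1] not in LEVELS:
            continue
        if LEVELS.index(fields[1]) >= threshold:
            yield line


sample = [
    "2025-11-10 10:30:15 | INFO | 12345 | network | connected to peer 192.168.1.1",
    "2025-11-10 10:30:18 | WARNING | 12345 | process | API process exited with code 1",
    "2025-11-10 10:30:19 | ERROR | 12345 | network | connection refused by 192.168.1.2",
]
for line in at_least(sample):
    print(line)  # prints the WARNING and ERROR lines only
```

Filtering afterwards like this lets a file captured at INFO or DEBUG be reviewed at WARNING without restarting ExaBGP.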
Environment variables:
# Set log level
export exabgp_log_level=INFO # Default
export exabgp_log_level=DEBUG # Verbose
export exabgp_log_level=WARNING # Quiet
exabgp /etc/exabgp/exabgp.conf
Enable specific subsystems:
# Log BGP packets
export exabgp_log_packets=true
# Log BGP messages (OPEN, UPDATE, KEEPALIVE, NOTIFICATION)
export exabgp_log_message=true
# Log configuration parsing
export exabgp_log_configuration=true
# Log process communication
export exabgp_log_processes=true
# Log network events (TCP connections)
export exabgp_log_network=true
# Log routes
export exabgp_log_routes=true
# Log timers
export exabgp_log_timers=true
# Enable ALL logging
export exabgp_log_all=true
exabgp /etc/exabgp/exabgp.conf
Systemd service with logging:
# /etc/systemd/system/exabgp.service
[Service]
ExecStart=/usr/local/bin/exabgp /etc/exabgp/exabgp.conf
# Log to file
StandardOutput=append:/var/log/exabgp/exabgp.log
StandardError=append:/var/log/exabgp/exabgp.log
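# Identifier for journal/syslog output (only applies when StandardOutput or
# StandardError is set to journal or syslog, not to append:)
SyslogIdentifier=exabgp
View logs: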
# Tail log file
tail -f /var/log/exabgp/exabgp.log
# View systemd journal
journalctl -u exabgp -f
# View last 100 lines
journalctl -u exabgp -n 100
Send logs to syslog:
#!/bin/bash
# /usr/local/bin/exabgp-wrapper.sh
# ExaBGP wrapper script: pipe stdout and stderr to syslog
exabgp /etc/exabgp/exabgp.conf 2>&1 | logger -t exabgp -p daemon.info
View syslog:
# Tail syslog
tail -f /var/log/syslog | grep exabgp
# Search syslog
grep exabgp /var/log/syslog
Configure log rotation:
# /etc/logrotate.d/exabgp
/var/log/exabgp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 0640 exabgp exabgp
    sharedscripts
    postrotate
        systemctl reload exabgp > /dev/null 2>&1 || true
    endscript
}
Test rotation:
logrotate -f /etc/logrotate.d/exabgp
Typical log line format:
TIMESTAMP | LEVEL | PID | COMPONENT | MESSAGE
Examples:
2025-11-10 10:30:15 | INFO | 12345 | network | connected to peer 192.168.1.1
2025-11-10 10:30:16 | INFO | 12345 | neighbor | neighbor 192.168.1.1 up
2025-11-10 10:30:17 | INFO | 12345 | routes | announce route 100.10.0.100/32
2025-11-10 10:30:18 | WARNING | 12345 | process | API process exited with code 1
2025-11-10 10:30:19 | ERROR | 12345 | network | connection refused by 192.168.1.2
Components:
- Timestamp: When event occurred
- Level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- PID: Process ID
- Component: Which part of ExaBGP (network, neighbor, routes, process)
- Message: Event description
Debug logs are verbose:
2025-11-10 10:30:15 | DEBUG | 12345 | wire | >> sent 19 bytes to 192.168.1.1
2025-11-10 10:30:15 | DEBUG | 12345 | wire | FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:0013:04
2025-11-10 10:30:15 | DEBUG | 12345 | message | >> KEEPALIVE
2025-11-10 10:30:16 | DEBUG | 12345 | wire | << received 19 bytes from 192.168.1.1
2025-11-10 10:30:16 | DEBUG | 12345 | wire | FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:0013:04
2025-11-10 10:30:16 | DEBUG | 12345 | message | << KEEPALIVE
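The hex dump in the wire lines is the 19-byte BGP message header: a 16-byte marker of all ones, a 2-byte length, and a 1-byte type. The sketch below decodes it, assuming the colon-separated `MARKER:LENGTH:TYPE` layout seen in the sample lines; `decode_header` is a hypothetical helper, not an ExaBGP function.

```python
# BGP message type codes from RFC 4271 (type 5, ROUTE-REFRESH, is from RFC 2918).
MESSAGE_TYPES = {
    1: "OPEN",
    2: "UPDATE",
    3: "NOTIFICATION",
    4: "KEEPALIVE",
    5: "ROUTE-REFRESH",
}


def decode_header(wire):
    """Decode 'MARKER:LENGTH:TYPE[:...]' hex text into (length, type name).

    The colon-separated layout is taken from the sample lines above and is
    an assumption about how the dump is printed.
    """
    marker, length, msg_type = wire.strip().split(":")[:3]
    if marker.upper() != "FF" * 16:
        raise ValueError("marker is not 16 bytes of 0xFF")
    code = int(msg_type, 16)
    return int(length, 16), MESSAGE_TYPES.get(code, f"UNKNOWN({code})")


print(decode_header("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:0013:04"))  # (19, 'KEEPALIVE')
```

A length of 0x0013 (19) with type 04 is a bare header with no body, which is exactly what a KEEPALIVE looks like on the wire.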
Search for specific events:
# BGP session state changes
grep -E "neighbor.*(up|down)" /var/log/exabgp.log
# Route announcements
grep "announce route" /var/log/exabgp.log
# Route withdrawals
grep "withdraw route" /var/log/exabgp.log
# Errors
grep -E "ERROR|CRITICAL" /var/log/exabgp.log
# Specific neighbor
grep "192.168.1.1" /var/log/exabgp.log
# Today's events
grep "$(date +%Y-%m-%d)" /var/log/exabgp.log
Extract specific fields:
# Count log levels
awk '{print $4}' /var/log/exabgp.log | sort | uniq -c
# Extract timestamps and messages
awk -F'|' '{print $1, $NF}' /var/log/exabgp.log
# Filter by log level
awk '$4 ~ /ERROR/ {print}' /var/log/exabgp.log
# Count events per hour (first 13 characters of the line: YYYY-MM-DD HH)
awk '{print substr($0,1,13)}' /var/log/exabgp.log | sort | uniq -c
Advanced log parsing:
#!/usr/bin/env python3
"""
Parse and analyze ExaBGP logs
"""
import re
from collections import defaultdict

# Log line pattern: TIMESTAMP | LEVEL | PID | COMPONENT | MESSAGE
LOG_PATTERN = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\|\s+'
    r'(?P<level>\w+)\s+\|\s+'
    r'(?P<pid>\d+)\s+\|\s+'
    r'(?P<component>\w+)\s+\|\s+'
    r'(?P<message>.+)'
)

# Match "up"/"down" as whole words so that "update" is not counted as "up"
SESSION_PATTERN = re.compile(r'\bneighbor\b.*\b(?P<state>up|down)\b')


def parse_log_line(line):
    """Parse a log line into a dict of fields, or None if it does not match"""
    match = LOG_PATTERN.match(line)
    if match:
        return match.groupdict()
    return None


def analyze_logs(log_file):
    """Analyze log file"""
    stats = {
        'total_lines': 0,
        'by_level': defaultdict(int),
        'by_component': defaultdict(int),
        'bgp_sessions_up': 0,
        'bgp_sessions_down': 0,
        'routes_announced': 0,
        'routes_withdrawn': 0,
        'errors': [],
    }
    with open(log_file, 'r') as f:
        for line in f:
            stats['total_lines'] += 1
            parsed = parse_log_line(line.strip())
            if not parsed:
                continue
            message = parsed['message']
            # Count by level
            stats['by_level'][parsed['level']] += 1
            # Count by component
            stats['by_component'][parsed['component']] += 1
            # Track specific events
            session = SESSION_PATTERN.search(message)
            if session and session.group('state') == 'up':
                stats['bgp_sessions_up'] += 1
            elif session:
                stats['bgp_sessions_down'] += 1
            elif 'announce route' in message:
                stats['routes_announced'] += 1
            elif 'withdraw route' in message:
                stats['routes_withdrawn'] += 1
            # Collect errors
            if parsed['level'] in ('ERROR', 'CRITICAL'):
                stats['errors'].append(parsed)
    return stats


# Usage
if __name__ == '__main__':
    stats = analyze_logs('/var/log/exabgp.log')
    print(f"Total lines: {stats['total_lines']}")
    print("\nBy level:")
    for level, count in stats['by_level'].items():
        print(f"  {level}: {count}")
    print(f"\nBGP sessions up: {stats['bgp_sessions_up']}")
    print(f"BGP sessions down: {stats['bgp_sessions_down']}")
    print(f"Routes announced: {stats['routes_announced']}")
    print(f"Routes withdrawn: {stats['routes_withdrawn']}")
    if stats['errors']:
        print(f"\nErrors ({len(stats['errors'])}):")
        for error in stats['errors'][:10]:  # Show first 10
            print(f"  {error['timestamp']} {error['message']}")
Log pattern:
INFO | network | connected to peer 192.168.1.1
INFO | neighbor | neighbor 192.168.1.1 up
Search:
grep "neighbor.*up" /var/log/exabgp.log
Log pattern:
ERROR | network | connection refused by 192.168.1.1
WARNING | neighbor | neighbor 192.168.1.1 down - connection lost
Search:
grep -E "connection (refused|lost)" /var/log/exabgp.log
grep "neighbor.*down" /var/log/exabgp.log
Log pattern:
INFO | routes | announce route 100.10.0.100/32 next-hop self
Search:
grep "announce route" /var/log/exabgp.log | tail -20
Count announcements:
grep -c "announce route" /var/log/exabgp.log
Log pattern:
INFO | routes | withdraw route 100.10.0.100/32
Search:
grep "withdraw route" /var/log/exabgp.log | tail -20
Log pattern:
INFO | process | process 'healthcheck' started with pid 12346
WARNING | process | process 'healthcheck' exited with code 1
ERROR | process | process 'healthcheck' died
Search:
grep "process.*started" /var/log/exabgp.log
grep "process.*exited" /var/log/exabgp.log
grep "process.*died" /var/log/exabgp.log
Log pattern:
INFO | configuration | reload configuration
INFO | configuration | configuration successfully parsed
Search:
grep "reload configuration" /var/log/exabgp.log
Log pattern:
ERROR | neighbor | NOTIFICATION sent to peer 192.168.1.1 code 6 (Cease)
ERROR | neighbor | NOTIFICATION received from peer 192.168.1.1 code 2 (OPEN Message Error)
Search:
grep "NOTIFICATION" /var/log/exabgp.log
Common notification codes:
Code  Subcode  Meaning
-----------------------------------------
1     -        Message Header Error
2     -        OPEN Message Error
2     2        Bad Peer AS
3     -        UPDATE Message Error
4     -        Hold Timer Expired
5     -        Finite State Machine Error
6     -        Cease (e.g. administrative shutdown or reset)
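To turn the codes above into readable summaries, NOTIFICATION lines can be parsed and looked up in a small table. This is a sketch only: the regular expression assumes the exact wording of the sample NOTIFICATION lines in this section, and `parse_notification` is a hypothetical helper.

```python
import re

# Error codes from RFC 4271 section 4.5, matching the table above.
NOTIFICATION_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Assumed wording, copied from the sample NOTIFICATION lines in this section.
NOTIFICATION_RE = re.compile(
    r"NOTIFICATION (?P<direction>sent to|received from) "
    r"peer (?P<peer>\S+) code (?P<code>\d+)"
)


def parse_notification(line):
    """Return (direction, peer, code, meaning), or None if the line does not match."""
    match = NOTIFICATION_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    direction = "sent" if match.group("direction") == "sent to" else "received"
    meaning = NOTIFICATION_CODES.get(code, "Unknown")
    return direction, match.group("peer"), code, meaning


line = "ERROR | neighbor | NOTIFICATION received from peer 192.168.1.1 code 2 (OPEN Message Error)"
print(parse_notification(line))
```

Knowing the direction matters when troubleshooting: a NOTIFICATION that was received points at something the peer rejected, while one that was sent points at the local side.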
Step 1: Check for connection attempts:
grep -E "connecting|connected" /var/log/exabgp.log | tail -10
Step 2: Look for connection errors:
grep -E "connection (refused|timeout|reset)" /var/log/exabgp.log
Step 3: Check for NOTIFICATION messages:
grep "NOTIFICATION" /var/log/exabgp.log | tail -5
Step 4: Enable debug logging:
export exabgp_log_level=DEBUG
export exabgp_log_network=true
exabgp /etc/exabgp/exabgp.conf 2>&1 | tee debug.log
Step 1: Check if API process is running:
grep "process.*started" /var/log/exabgp.log | tail -5
Step 2: Look for API process errors:
grep "process.*exited\|process.*died" /var/log/exabgp.log | tail -10
Step 3: Check for route announcements:
grep "announce route" /var/log/exabgp.log | tail -20
Step 4: Enable process debugging:
export exabgp_log_processes=true
export exabgp_log_routes=true
exabgp /etc/exabgp/exabgp.conf 2>&1 | tee debug.log
Step 1: Check for excessive route churn:
# Count route updates per minute (first 16 characters of the line: YYYY-MM-DD HH:MM)
grep -E "announce|withdraw" /var/log/exabgp.log | \
  awk '{print substr($0,1,16)}' | \
  uniq -c | \
  sort -rn | \
  head -20
Step 2: Check for API process restarts:
grep "process.*exited" /var/log/exabgp.log | \
  awk '{print substr($0,1,16)}' | \
  uniq -c
Step 3: Look for errors:
grep -E "ERROR|WARNING" /var/log/exabgp.log | tail -50
Filebeat configuration:
# /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/exabgp/*.log
    fields:
      service: exabgp
      environment: production
    fields_under_root: true
    # Multiline support for stack traces
    multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    multiline.negate: true
    multiline.match: after
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "exabgp-%{+yyyy.MM.dd}"
setup.kibana:
  host: "localhost:5601"
Logstash grok pattern:
# /etc/logstash/conf.d/exabgp.conf
input {
  beats {
    port => 5044
  }
}
filter {
  if [service] == "exabgp" {
    grok {
      match => {
        "message" => "%{TIMESTAMP_ISO8601:timestamp} \| %{LOGLEVEL:level}\s+\| %{NUMBER:pid}\s+\| %{WORD:component}\s+\| %{GREEDYDATA:msg}"
      }
    }
    date {
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
      target => "@timestamp"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "exabgp-%{+YYYY.MM.dd}"
  }
}
Start services:
systemctl start elasticsearch
systemctl start logstash
systemctl start kibana
systemctl start filebeat
Splunk forwarder configuration:
# /opt/splunkforwarder/etc/system/local/inputs.conf
[monitor:///var/log/exabgp/*.log]
disabled = false
index = exabgp
sourcetype = exabgp
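# Field extraction: EXTRACT-* are search-time settings, so this stanza
# belongs in props.conf rather than inputs.conf
[exabgp]
EXTRACT-timestamp = ^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
EXTRACT-level = \| (?<level>\w+)\s+\|
# Skip the PID column so that it is not captured as the component
EXTRACT-component = \| \w+\s+\| \d+\s+\| (?<component>\w+)
EXTRACT-message = \| \w+\s+\| \d+\s+\| \w+\s+\| (?<message>.+)
Splunk searches: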
# All ExaBGP logs
index=exabgp
# Errors only
index=exabgp level=ERROR OR level=CRITICAL
# BGP session changes
index=exabgp "neighbor" ("up" OR "down")
# Route churn rate
index=exabgp "announce route" OR "withdraw route"
| timechart span=1m count
# Top error messages
index=exabgp level=ERROR
| stats count by message
| sort -count
Graylog syslog input + rsyslog:
# /etc/rsyslog.d/exabgp.conf
# Forward ExaBGP logs to Graylog
if $programname == 'exabgp' then {
    action(type="omfwd"
           target="graylog.example.com"
           port="514"
           protocol="tcp")
}
Alert on errors:
#!/bin/bash
# /usr/local/bin/exabgp-alert.sh
# Alert on errors in ExaBGP logs
LOG_FILE="/var/log/exabgp/exabgp.log"
ALERT_EMAIL="[email protected]"
# Tail log and alert on errors
tail -F "$LOG_FILE" | while IFS= read -r line; do
  if echo "$line" | grep -qE "ERROR|CRITICAL"; then
    # Send email
    echo "$line" | mail -s "ExaBGP Error Alert" "$ALERT_EMAIL"
  fi
done
Run as systemd service:
# /etc/systemd/system/exabgp-alert.service
[Unit]
Description=ExaBGP Error Alerts
After=exabgp.service
[Service]
ExecStart=/usr/local/bin/exabgp-alert.sh
Restart=always
[Install]
WantedBy=multi-user.target
Alert on critical events:
#!/usr/bin/env python3
"""
Send Slack alerts for ExaBGP critical events
"""
import sys
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"


def send_slack_alert(message, level='WARNING'):
    """Send alert to Slack"""
    colors = {
        'INFO': '#36a64f',
        'WARNING': '#ff9900',
        'ERROR': '#ff0000',
        'CRITICAL': '#990000',
    }
    payload = {
        "attachments": [{
            "color": colors.get(level, '#808080'),
            "title": f"ExaBGP Alert: {level}",
            "text": message,
            "footer": "ExaBGP Monitoring",
        }]
    }
    # Time out rather than stall the log reader if Slack is unreachable
    requests.post(SLACK_WEBHOOK, json=payload, timeout=10)


# Read log lines from stdin and send alerts
for line in sys.stdin:
    if 'ERROR' in line or 'CRITICAL' in line:
        send_slack_alert(line.strip(), 'ERROR')
    elif 'neighbor' in line and 'down' in line:
        send_slack_alert(line.strip(), 'WARNING')

# Usage: tail -F /var/log/exabgp.log | python3 slack-alert.py
Alert on critical failures:
#!/usr/bin/env python3
"""
PagerDuty alerts for ExaBGP
"""
import sys
import requests

PAGERDUTY_KEY = "YOUR_ROUTING_KEY"


def trigger_pagerduty(description, severity='error'):
    """Trigger PagerDuty incident"""
    url = "https://events.pagerduty.com/v2/enqueue"
    payload = {
        "routing_key": PAGERDUTY_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": description,
            "severity": severity,
            "source": "exabgp",
        }
    }
    # Time out rather than stall the log reader if PagerDuty is unreachable
    requests.post(url, json=payload, timeout=10)


# Monitor for critical events
for line in sys.stdin:
    if 'CRITICAL' in line:
        trigger_pagerduty(line.strip(), severity='critical')
    elif 'ERROR' in line and 'neighbor' in line and 'down' in line:
        trigger_pagerduty(line.strip(), severity='error')

# Usage: tail -F /var/log/exabgp.log | python3 pagerduty-alert.py
Count route updates per time period:
#!/bin/bash
# Analyze route update rate
echo "Route updates per minute:"
grep -E "announce|withdraw" /var/log/exabgp.log | \
  awk '{print substr($0,1,16)}' | \
  uniq -c | \
  sort -rn | \
  head -20
Python version:
#!/usr/bin/env python3
"""
Analyze route update rate
"""
from collections import defaultdict
import re

updates_per_minute = defaultdict(int)
with open('/var/log/exabgp.log', 'r') as f:
    for line in f:
        if 'announce' in line or 'withdraw' in line:
            # Extract timestamp (YYYY-MM-DD HH:MM)
            match = re.match(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2})', line)
            if match:
                minute = match.group(1)
                updates_per_minute[minute] += 1

# Print top 20
for minute, count in sorted(updates_per_minute.items(),
                            key=lambda x: x[1],
                            reverse=True)[:20]:
    print(f"{minute}: {count} updates/min")
Track session up/down events:
#!/bin/bash
# Track BGP session stability
echo "BGP session state changes:"
grep -E "neighbor.*(up|down)" /var/log/exabgp.log | \
  awk '{print $1, $2, $NF}' | \
  tail -50
Calculate uptime:
#!/usr/bin/env python3
"""
Calculate BGP session uptime
"""
from datetime import datetime
import re

# Capture timestamp, peer and state from e.g. "... | neighbor 192.168.1.1 up".
# Requiring "up"/"down" right after the peer stops the "neighbor" component
# column from being mistaken for the start of the message.
EVENT_PATTERN = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*\bneighbor (\S+) (up|down)\b'
)

sessions = {}
with open('/var/log/exabgp.log', 'r') as f:
    for line in f:
        match = EVENT_PATTERN.match(line)
        if not match:
            continue
        timestamp = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')
        neighbor, state = match.group(2), match.group(3)
        if state == 'up':
            sessions[neighbor] = {'up': timestamp, 'down': None}
        elif neighbor in sessions and sessions[neighbor]['down'] is None:
            sessions[neighbor]['down'] = timestamp

# Report the most recent up period for each neighbor
for neighbor, events in sessions.items():
    if events['down']:
        uptime = events['down'] - events['up']
        print(f"{neighbor}: {uptime}")
    else:
        print(f"{neighbor}: Currently up since {events['up']}")
1. Use appropriate log levels:
# Production: INFO or WARNING
export exabgp_log_level=INFO
# Troubleshooting: DEBUG
export exabgp_log_level=DEBUG
# Quiet: WARNING
export exabgp_log_level=WARNING
2. Rotate logs regularly:
# Keep 14 days of logs
rotate 14
# Compress old logs
compress
3. Centralize logs:
- Send to syslog/ELK/Splunk
- Enable correlation across systems
- Implement retention policies
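Central systems correlate events more easily when each line arrives as a structured record. The sketch below converts one pipe-delimited line into a JSON document; the field names, the `host` field, and the `to_json` helper are assumptions made for illustration, not an ExaBGP feature.

```python
import json

# Field order assumed from the log format: TIMESTAMP | LEVEL | PID | COMPONENT | MESSAGE
FIELDS = ("timestamp", "level", "pid", "component", "message")


def to_json(line, host="unknown"):
    """Convert one pipe-delimited log line to a JSON string, or None if it does not fit."""
    # maxsplit=4 keeps any '|' inside the message intact.
    parts = [p.strip() for p in line.split("|", 4)]
    if len(parts) != 5:
        return None
    record = dict(zip(FIELDS, parts))
    record["host"] = host  # hypothetical field for cross-system correlation
    return json.dumps(record)


print(to_json(
    "2025-11-10 10:30:16 | INFO | 12345 | neighbor | neighbor 192.168.1.1 up",
    host="edge1",
))
```

Tagging each record with the originating host is what makes it possible to line up a session drop on one router with events on its peers.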
4. Monitor log growth:
# Check log size
du -sh /var/log/exabgp/
# Re-check every 5 minutes (watch only displays the size; it does not send alerts)
watch -n 300 'du -sh /var/log/exabgp/'
5. Secure log files:
chmod 640 /var/log/exabgp/*.log
chown exabgp:exabgp /var/log/exabgp/*.log
Always log:
- BGP session state changes
- Route announcements/withdrawals
- API process crashes
- Errors and warnings
- Configuration reloads
Optionally log (debug only):
- BGP packets (wire format)
- BGP messages (KEEPALIVE, UPDATE)
- Timer events
- Process communication
Regular analysis tasks:
- Check for errors daily
- Monitor BGP session stability
- Track route update rates
- Review API process health
- Verify log rotation working
- Test alerting mechanisms
- Archive old logs
- Update log parsing scripts
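The daily error check in the list above can be scripted. This is a minimal sketch that counts ERROR and CRITICAL lines per calendar day, assuming the pipe-delimited format described earlier; `errors_per_day` is an illustrative name.

```python
from collections import Counter


def errors_per_day(lines):
    """Count ERROR and CRITICAL lines per calendar day (YYYY-MM-DD)."""
    counts = Counter()
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 5 and fields[1] in ("ERROR", "CRITICAL"):
            # The first 10 characters of the timestamp are the date.
            counts[fields[0][:10]] += 1
    return counts


sample = [
    "2025-11-10 10:30:19 | ERROR | 12345 | network | connection refused by 192.168.1.2",
    "2025-11-10 11:02:03 | CRITICAL | 12345 | process | process 'healthcheck' died",
    "2025-11-11 09:15:40 | ERROR | 12345 | network | connection refused by 192.168.1.2",
    "2025-11-11 09:15:41 | INFO | 12345 | neighbor | neighbor 192.168.1.1 up",
]
for day, count in sorted(errors_per_day(sample).items()):
    print(f"{day}: {count} errors")
```

Passing an open file object instead of `sample` runs the same count over a real log, and a sudden jump from one day to the next is a reasonable trigger for a closer look.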
- Debugging - Troubleshooting guide
- Monitoring - Production monitoring
- Performance Tuning - Optimize ExaBGP
- Kibana - Log visualization
- Grafana Loki - Log aggregation
- Splunk - Enterprise log management
Need help with logs? Join our Slack community.
Ghost written by Claude (Anthropic AI)