Goal
Usage Tracking & Observability (OpenTelemetry)
Context
We need end-to-end visibility: traces, metrics, and logs that follow a request through API → gateway → vLLM → memory → files.
Scope
- OTel SDK wiring across services (traces/metrics/logs)
- Resource & span attributes: model id, tokens in/out, latency, user/org hash, error code
- Collector configs (dev/prod), exemplars, sampling policies
- Dashboards (Grafana): QPS, p50/p95 latency, tokens/sec, GPU usage, 4xx/5xx rates
- Alerts: auth failures, 429 spikes, slow p95 latency, runner health
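As a starting point for the span attributes item above, the sketch below builds the attribute set for one request using only the standard library. Everything named here is an assumption to be settled in review: the `llm.*` and `app.*` attribute keys, the 16-character hash length, and the use of a keyed HMAC-SHA256 so raw user and org IDs never reach the telemetry backend.

```python
import hashlib
import hmac
from typing import Optional, Union

AttrValue = Union[str, int, float, bool]


def hash_identity(raw_id: str, salt: bytes) -> str:
    """Return a stable, non-reversible token for a user or org ID."""
    digest = hmac.new(salt, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability in dashboards (assumption)


def build_span_attributes(
    *,
    model_id: str,
    tokens_in: int,
    tokens_out: int,
    latency_ms: float,
    user_id: str,
    org_id: str,
    salt: bytes,
    error_code: Optional[str] = None,
) -> dict[str, AttrValue]:
    """Assemble per-request attributes; key names are placeholders."""
    attrs: dict[str, AttrValue] = {
        "llm.model_id": model_id,
        "llm.tokens.input": tokens_in,
        "llm.tokens.output": tokens_out,
        "llm.latency_ms": latency_ms,
        "app.user_hash": hash_identity(user_id, salt),
        "app.org_hash": hash_identity(org_id, salt),
    }
    # OTel attribute values cannot be None, so the key is omitted on success.
    if error_code is not None:
        attrs["app.error_code"] = error_code
    return attrs


if __name__ == "__main__":
    example = build_span_attributes(
        model_id="example-model",  # placeholder value
        tokens_in=128,
        tokens_out=512,
        latency_ms=840.5,
        user_id="user-123",
        org_id="org-9",
        salt=b"rotate-me",  # placeholder; a real salt would come from a secret store
    )
    for key, value in example.items():
        print(key, value)
```

With the OpenTelemetry Python API, a dict like this can be applied to the active span in one call via `span.set_attributes(attrs)`. Keeping the construction in a single function would also give dashboards and alerts one place to agree on key names.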