[Bug]: missing metrics for OpenTelemetry as alternative to Prometheus #13646

@nekomeowww

What happened?

As mentioned in #13644, there is no way to prevent external users from accessing /metrics unless a gateway protects it. I therefore tried to use the OpenTelemetry SDK to push metrics over OTLP to Prometheus's native OTLP receiver (https://prometheus.io/docs/guides/opentelemetry/).

However, when only the following environment variables are set:

  • OTEL_TRACES_EXPORTER="none"
  • OTEL_LOGS_EXPORTER="none"
  • OTEL_EXPORTER_OTLP_METRICS_PROTOCOL="http/protobuf"
  • OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://prometheus:9090/api/v2/otlp/v1/metrics"
  • OTEL_METRIC_EXPORT_INTERVAL="15000"
  • OTEL_SERVICE_NAME="litellm"

no metrics are sent to Prometheus. Only traces are produced (I haven't configured a trace endpoint, but OTEL should be able to cover all signals).
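
To sanity-check what these variables should resolve to, here is a minimal sketch of the endpoint rule from the OpenTelemetry exporter specification: a signal-specific endpoint is used as-is, while the generic `OTEL_EXPORTER_OTLP_ENDPOINT` gets `/v1/metrics` appended. The helper names `resolve_metrics_endpoint` and `summarize` are hypothetical and are not part of LiteLLM or the OTEL SDK. The defaults shown (`otlp` exporter, 60000 ms interval) are the specification defaults, not necessarily what LiteLLM applies.

```python
import os


def resolve_metrics_endpoint(env=None):
    """Return the OTLP/HTTP metrics URL implied by the OTEL_* variables, or None."""
    env = os.environ if env is None else env
    specific = env.get("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT")
    if specific:
        # A signal-specific endpoint is used exactly as given.
        return specific
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if base:
        # The generic endpoint is a base URL; the signal path is appended.
        return base.rstrip("/") + "/v1/metrics"
    return None


def summarize(env=None):
    """Collect the metrics-related settings an OTEL SDK would be expected to see."""
    env = os.environ if env is None else env
    return {
        "metrics_exporter": env.get("OTEL_METRICS_EXPORTER", "otlp"),
        "metrics_protocol": env.get("OTEL_EXPORTER_OTLP_METRICS_PROTOCOL")
        or env.get("OTEL_EXPORTER_OTLP_PROTOCOL"),
        "metrics_endpoint": resolve_metrics_endpoint(env),
        "export_interval_ms": int(env.get("OTEL_METRIC_EXPORT_INTERVAL", "60000")),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Running this inside the worker container would at least confirm that the variables reach the process. Whether the "otel" callback actually creates a metrics pipeline from them is a separate question, and the one this issue is about.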

Related configurations

Minimal reproduction

Full Kubernetes configuration for a minimal reproduction:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  labels:
    app: litellm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      initContainers:
      - name: init-otel
        image: ghcr.io/berriai/litellm:main-stable
        command:
        - sh
        - -c
        - pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
      containers:
      - name: worker
        image: ghcr.io/berriai/litellm:main-stable
        args:
        - --config=/app/config.yaml
        - --debug
        ports:
        - containerPort: 4000
        env:
        - name: 'DEBUG_OTEL'
          value: 'True'
        - name: 'OTEL_EXPORTER_OTLP_METRICS_PROTOCOL'
          value: 'http/protobuf'
        - name: 'OTEL_EXPORTER_OTLP_METRICS_ENDPOINT'
          value: 'http://prometheus:9090/api/v2/otlp/v1/metrics'
        - name: 'OTEL_TRACES_EXPORTER'
          value: 'none'
        - name: 'OTEL_LOGS_EXPORTER'
          value: 'none'
        - name: 'OTEL_METRIC_EXPORT_INTERVAL'
          value: '15000'
        - name: 'OTEL_SERVICE_NAME'
          value: 'litellm'
        - name: 'LITELLM_LICENSE'
          value: ''
        - name: 'OPENROUTER_API_BASE'
          value: 'https://openrouter.ai/api/v1'
        - name: 'OPENROUTER_API_KEY'
          value: ''
        - name: 'NO_DOCS'
          value: 'True'
        volumeMounts:
        - name: litellm-config
          mountPath: /app/config.yaml
          subPath: config.yaml
        resources:
          requests:
            memory: "4Gi"
            cpu: "4"
          limits:
            memory: "8Gi"
            cpu: "8"
      volumes:
      - name: litellm-config
        configMap:
          name: litellm-config
          items:
          - key: config.yaml
            path: config.yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: litellm
  name: litellm
spec:
  type: ClusterIP
  sessionAffinity: None
  ports:
  - port: 4000
    protocol: TCP
    targetPort: 4000
  selector:
    app: litellm
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - "--config.file=/etc/prometheus/prometheus.yaml"
        - "--web.enable-otlp-receiver"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention.time=15d"
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
        - name: prometheus-config
          mountPath: /etc/prometheus
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "2Gi"
            cpu: "2"
      volumes:
      - name: prometheus-data
        emptyDir: {}
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yaml
            path: prometheus.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
data:
  config.yaml: |
    model_list:
    - model_name: gpt-4o-mini
      litellm_params:
        model: openai/openai/gpt-4o-mini
        api_base: os.environ/OPENROUTER_API_BASE
        api_key: os.environ/OPENROUTER_API_KEY
    litellm_settings:
      callbacks: 
      # - "prometheus"
      - "otel"
    general_settings:
      master_key: sk-1234
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yaml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'litellm'
      static_configs:
      - targets: ['litellm:4000']  # Assuming Litellm exposes metrics at port 4000
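
To separate a receiver-side problem from an exporter-side one, the OTLP endpoint can be probed directly from a pod in the same namespace. The sketch below is one way to do that with only the standard library. It sends an empty body with the protobuf content type, on the assumption that a zero-length payload decodes as an empty export request; how Prometheus answers such a request has not been verified here. The function names `build_probe` and `probe` are hypothetical.

```python
import urllib.error
import urllib.request


def build_probe(url):
    """Build a POST request with an empty protobuf body for the given URL."""
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/x-protobuf"},
    )


def probe(url, timeout=5.0):
    """Return the HTTP status code, or None if the host could not be reached."""
    try:
        with urllib.request.urlopen(build_probe(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, and similar.
        return None


# Example (run from inside the cluster):
# print(probe("http://prometheus:9090/api/v2/otlp/v1/metrics"))
```

Any status code means the service is reachable. A 404 would suggest that the path is wrong or that the receiver is not enabled, while `None` points to networking or DNS.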

Relevant log output

{
    "name": "auth",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0x10a312787a5b87fd",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.632974Z",
    "end_time": "2025-08-15T09:11:32.648195Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "/v1/chat/completions",
        "service": "auth"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "auth",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0x8de0624c2f93e021",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.632974Z",
    "end_time": "2025-08-15T09:11:32.648195Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "/v1/chat/completions",
        "service": "auth"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "proxy_pre_call",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0x1c8c9cab02e77b51",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.662129Z",
    "end_time": "2025-08-15T09:11:32.662223Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "add_litellm_data_to_request",
        "service": "proxy_pre_call"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "proxy_pre_call",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0xbc8f387248abeb66",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.662129Z",
    "end_time": "2025-08-15T09:11:32.662223Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "add_litellm_data_to_request",
        "service": "proxy_pre_call"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "router",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0x07d24ddf48ba19b1",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.668822Z",
    "end_time": "2025-08-15T09:11:32.669529Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "async_get_available_deployment",
        "service": "router"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "router",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0x0b9705d50fa611ac",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.668822Z",
    "end_time": "2025-08-15T09:11:32.669529Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "async_get_available_deployment",
        "service": "router"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "router",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0xe97f3b0ce105b3bd",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.668347Z",
    "end_time": "2025-08-15T09:11:34.653066Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "acompletion",
        "service": "router"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "router",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0x43cb1c6e95f27607",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.668347Z",
    "end_time": "2025-08-15T09:11:34.653066Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "call_type": "acompletion",
        "service": "router"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "raw_gen_ai_request",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0x8d3fe6a3521aa0bf",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xb2fbccf7f6a39e97",
    "start_time": "2025-08-15T09:11:32.661585Z",
    "end_time": "2025-08-15T09:11:34.750996Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "llm.openai.model": "openai/gpt-4o-mini",
        "llm.openai.messages": "[{'role': 'user', 'content': 'Hi'}]",
        "llm.openai.extra_body": "{}",
        "llm.openai.stream": true
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "litellm_request",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0xb2fbccf7f6a39e97",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xc9654eba8734c9b6",
    "start_time": "2025-08-15T09:11:32.661585Z",
    "end_time": "2025-08-15T09:11:34.750996Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "metadata.user_api_key_hash": "88dc28d0f030c55ed4ab77ed8faf098196cb1c05df778539800c9f1243fe6b4b",
        "metadata.user_api_key_alias": "",
        "metadata.user_api_key_team_id": "",
        "metadata.user_api_key_org_id": "",
        "metadata.user_api_key_user_id": "default_user_id",
        "metadata.user_api_key_team_alias": "",
        "metadata.user_api_key_user_email": "",
        "metadata.spend_logs_metadata": "",
        "metadata.requester_ip_address": "10.244.1.1",
        "metadata.requester_metadata": "{}",
        "metadata.user_api_key_end_user_id": "",
        "metadata.prompt_management_metadata": "",
        "metadata.applied_guardrails": "[]",
        "metadata.mcp_tool_call_metadata": "",
        "metadata.vector_store_request_metadata": "",
        "metadata.usage_object": "{'completion_tokens': 10, 'prompt_tokens': 8, 'total_tokens': 18, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0, 'text_tokens': None, 'image_tokens': None}}",
        "metadata.requester_custom_headers": "{}",
        "metadata.user_api_key_request_route": "/v1/chat/completions",
        "gen_ai.request.model": "openai/gpt-4o-mini",
        "llm.request.type": "acompletion",
        "gen_ai.system": "openai",
        "llm.is_streaming": "True",
        "gen_ai.response.id": "gen-1755249096-0GpnV7NeiUfvIBTjekZL",
        "gen_ai.response.model": "openai/gpt-4o-mini",
        "llm.usage.total_tokens": 18,
        "gen_ai.usage.completion_tokens": 10,
        "gen_ai.usage.prompt_tokens": 8,
        "gen_ai.prompt.0.role": "user",
        "gen_ai.prompt.0.content": "Hi",
        "gen_ai.completion.0.finish_reason": "stop",
        "gen_ai.completion.0.role": "assistant",
        "gen_ai.completion.0.content": "Hello! How can I assist you today?"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
{
    "name": "Received Proxy Server Request",
    "context": {
        "trace_id": "0xb694a4da1e730ccfb33cf7b61065ed8a",
        "span_id": "0xc9654eba8734c9b6",
        "trace_state": "[]"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2025-08-15T09:11:32.632974Z",
    "end_time": "2025-08-15T09:11:34.754170Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {},
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "litellm",
            "deployment.environment": "production",
            "model_id": "litellm"
        },
        "schema_url": ""
    }
}
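
The debug output above consists only of span objects. A small sketch for summarizing such a dump is shown below, assuming the console output has been saved as text containing back-to-back JSON objects like the ones above. The function names are hypothetical. The timestamp format is taken from the `start_time` and `end_time` fields in the log.

```python
import json
from collections import Counter
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize_spans(text):
    """Return (count per span name, longest duration in seconds per span name)."""
    counts = Counter()
    longest = {}
    for span in iter_json_objects(text):
        start = datetime.strptime(span["start_time"], TS_FORMAT)
        end = datetime.strptime(span["end_time"], TS_FORMAT)
        name = span["name"]
        counts[name] += 1
        longest[name] = max(longest.get(name, 0.0), (end - start).total_seconds())
    return counts, longest
```

Passing the saved log text to `summarize_spans` gives a quick view of which span names were emitted and how long each took, which makes it easy to confirm that the exporter output contains traces and nothing metric-shaped.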

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

ghcr.io/berriai/litellm:main-stable with hash 0bd8fa78278fb5fdf3fa2d2d4cf9ec33e3a1c9c67d22c557479664f123479857

Twitter / LinkedIn details

@AyakaNeko
