goal: Self-Hosted, Authenticated, OpenAI-Compatible Chat (vLLM-backed) #214

@locnguyen1986

Description

🎯 Goal

Self-Hosted, Authenticated, OpenAI-Compatible Chat (vLLM-backed)

📖 Context

We need a drop-in, OpenAI-style API that teams can self-host. vLLM provides fast inference; Keycloak and Kong provide authentication and gateway controls. This issue is the entry point for all other features.

✅ Scope

  • OpenAI-compatible endpoints: GET /v1/models, POST /v1/chat/completions (JSON + SSE)
  • vLLM runner integration (single/multi-GPU, prompt caching)
  • Auth via Keycloak (OIDC) and Kong (key-auth); guest mode with quotas
  • Usage counters in responses; consistent error envelopes
  • Minimal “hello world” examples (curl, Python, TypeScript)
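The SSE variant of POST /v1/chat/completions can be exercised without a running server by parsing a canned stream. A minimal sketch in plain Python (stdlib only); the event shape follows the OpenAI streaming format, and the sample payloads below are illustrative, not captured output:

```python
import json
from typing import Iterator


def iter_sse_deltas(lines: Iterator[str]) -> Iterator[str]:
    """Yield content deltas from an OpenAI-style SSE stream.

    Each event line looks like 'data: {...}' and the stream ends with
    'data: [DONE]'. Blank lines and non-data fields are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        delta = event["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:
            yield content


# Example against a canned stream in the OpenAI streaming format:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(iter(sample))))  # Hello, world
```

The same parser works line-by-line over a live HTTP response, which keeps the TypeScript and Python example clients symmetric.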

🛠 Deliverables

  • OpenAPI spec & conformance tests
  • vLLM runner + health probes; model registry config
  • Gateway policy (rate limits, CORS/CSRF)
  • Docker Compose for local; Helm values for prod
  • Example clients + Postman collection
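For the example clients, the request shape is the standard OpenAI chat-completions body plus whatever credential the gateway expects. A minimal Python sketch; the gateway URL, API key, and model name are placeholder assumptions (Kong's key-auth plugin reads the `apikey` header by default), not decided config:

```python
import json
import urllib.request

GATEWAY = "http://localhost:8000"  # assumed local gateway address
API_KEY = "dev-key"                # assumed Kong key-auth credential


def build_chat_request(model: str, user_msg: str) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request with key-auth headers."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{GATEWAY}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "apikey": API_KEY,  # Kong key-auth default header name
        },
        method="POST",
    )


req = build_chat_request("my-model", "Say hello")
# Sending it once the stack is up is a one-liner; the response body
# carries the usage counters called out in the scope above:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["usage"])
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the example testable offline and makes the auth header easy to swap for a Keycloak bearer token.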

Metadata

Status: Done
Labels: none
Milestone: none