# Unified LLM API Gateway

Unified LLM API Gateway is a scalable, extensible platform that aggregates and normalises calls to multiple LLM backends (OpenAI, Hugging Face, Groq, Anthropic, Gemini, and more).
It provides a unified API with built-in caching, rate limiting, authentication, logging, metrics, and production-ready deployment manifests for Docker and Kubernetes.

## Architecture

- **API Gateway (Go)**
  - Accepts client requests
  - Handles authentication, routing, and request transformation
  - Aggregates and fans out requests to the LLM backends
- **LLM Adapters (microservices)**
  - Wrap each provider's API (OpenAI, Hugging Face, etc.) behind a unified internal interface (sketched below the project layout)
- **Cache Layer**
  - Redis for result caching, keyed on prompt + params (see the sketch below)
- **Rate Limiter**
  - Redis-based leaky-bucket or token-bucket, shared across instances
- **Auth & Quotas**
  - API keys / JWT, per-key quotas (Redis or DB)
- **Observability**
  - Structured logs (JSON), Prometheus metrics, traces
- **Deployment**
  - Docker images, Helm charts, Kubernetes manifests, CI builds
 
 
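The "prompt + params as cache key" scheme could hash the normalised request into a stable Redis key. A minimal sketch in Go using the go-redis client; the `QueryRequest` shape, key prefix, and helper names are illustrative, not the gateway's actual code:

```go
package cache

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

// QueryRequest is an illustrative request shape: provider + prompt + params.
type QueryRequest struct {
	Provider string            `json:"provider"`
	Prompt   string            `json:"prompt"`
	Params   map[string]string `json:"params,omitempty"`
}

// Key derives a stable Redis key by hashing the request's canonical JSON
// encoding (Go sorts map keys when marshalling, so equal requests collide).
func Key(req QueryRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return "llm:cache:" + hex.EncodeToString(sum[:]), nil
}

// Lookup returns the cached response, or ok == false on a miss.
func Lookup(ctx context.Context, rdb *redis.Client, req QueryRequest) (string, bool, error) {
	key, err := Key(req)
	if err != nil {
		return "", false, err
	}
	val, err := rdb.Get(ctx, key).Result()
	if err == redis.Nil {
		return "", false, nil // cache miss
	}
	if err != nil {
		return "", false, err
	}
	return val, true, nil
}

// Store caches a response under the derived key with a TTL.
func Store(ctx context.Context, rdb *redis.Client, req QueryRequest, resp string, ttl time.Duration) error {
	key, err := Key(req)
	if err != nil {
		return err
	}
	return rdb.Set(ctx, key, resp, ttl).Err()
}
```
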
## Project Structure

```text
llm-api-gateway/
├── README.md
├── LICENSE
├── .github/           # CI/CD workflows
├── infra/             # Docker Compose & Kubernetes manifests
│   ├── k8s/
│   └── docker-compose.yml
├── gateway/           # Go API gateway
│   ├── cmd/server/
│   ├── internal/
│   │   ├── handlers/
│   │   ├── adapters/
│   │   ├── cache/
│   │   ├── ratelimit/
│   │   └── metrics/
│   ├── go.mod
│   └── Dockerfile
├── adapters/          # Per-provider adapters (microservices)
│   ├── openai-adapter/
│   └── hf-adapter/
├── admin/             # NestJS admin dashboard (API keys, usage, logs)
│   ├── src/
│   ├── package.json
│   └── Dockerfile
└── tooling/
    └── tests/         # e2e test helpers
```
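
Each service under `adapters/` wraps one provider behind the same internal contract. A minimal sketch of what that unified interface could look like in Go; all names here are illustrative, not the repo's actual types:

```go
package adapters

import "context"

// Request is the normalised payload every adapter accepts.
type Request struct {
	Prompt string
	Model  string // optional provider-specific model override
}

// Response is the normalised result every adapter returns.
type Response struct {
	Text     string
	Provider string
}

// Adapter is the unified internal interface each provider microservice
// (OpenAI, Hugging Face, Groq, ...) implements behind its HTTP API.
type Adapter interface {
	// Name reports the provider identifier, e.g. "openai" or "hf".
	Name() string
	// Query forwards the prompt to the upstream provider and returns
	// the normalised response.
	Query(ctx context.Context, req Request) (Response, error)
}
```
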
## Quick Start

- Start all services:

  ```bash
  docker-compose up --build
  ```

- Gateway API: `http://localhost:3020/gateway/query` (see the example below)
- Admin Dashboard: `http://localhost:3040`
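
Once the stack is up, you can smoke-test the gateway directly (assuming you have already provisioned a gateway API key; the full request schema is under API Usage below):

```bash
curl -X POST http://localhost:3020/gateway/query \
  -H "Authorization: <your-gateway-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "prompt": "Say hello"}'
```
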
## Supported Providers

- OpenAI (GPT-3.5, GPT-4, GPT-4o, etc.)
- Hugging Face Inference API
- Groq
- OpenRouter
- Anthropic (Claude)
- Gemini (Google)
- More coming soon!
 
## Features

- Unified API: one endpoint for all LLMs
- Authentication: API key / JWT middleware
- Caching: Redis-based, prompt + params as key
- Rate Limiting: per-key, Redis-backed (see the sketch below)
- Logging: structured JSON logs
- Monitoring: Prometheus metrics endpoint
- Adapters: a microservice per provider
- Kubernetes & Docker: production-ready manifests
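
Per-key limiting could be enforced by a token bucket evaluated atomically in a Redis Lua script, so all gateway replicas share one budget, matching the "shared across instances" note in the architecture section. A minimal sketch; the `Allow` helper, key prefix, and script are illustrative, and `rate` is assumed to be greater than zero:

```go
package ratelimit

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// tokenBucket refills `rate` tokens per second up to `burst` and atomically
// tries to take one token; it returns 1 if the request is allowed.
var tokenBucket = redis.NewScript(`
local key   = KEYS[1]
local rate  = tonumber(ARGV[1])
local burst = tonumber(ARGV[2])
local now   = tonumber(ARGV[3])

local state  = redis.call("HMGET", key, "tokens", "ts")
local tokens = tonumber(state[1]) or burst
local ts     = tonumber(state[2]) or now

-- refill based on elapsed time, capped at the burst size
tokens = math.min(burst, tokens + (now - ts) * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

-- store as strings so fractional token counts survive the round trip
redis.call("HSET", key, "tokens", tostring(tokens), "ts", tostring(now))
redis.call("EXPIRE", key, math.ceil(burst / rate) + 1)
return allowed
`)

// Allow consumes one token for apiKey; false means the caller is limited.
func Allow(ctx context.Context, rdb *redis.Client, apiKey string, rate, burst int) (bool, error) {
	now := float64(time.Now().UnixNano()) / 1e9
	res, err := tokenBucket.Run(ctx, rdb, []string{"llm:rl:" + apiKey}, rate, burst, now).Int()
	if err != nil {
		return false, err
	}
	return res == 1, nil
}
```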
 
## API Usage

```http
POST /gateway/query
Authorization: <your-gateway-api-key>
Content-Type: application/json
```

Request body (`provider` is one of the values shown):

```json
{
  "provider": "openai" | "hf" | "groq" | "openrouter" | "anthropic" | "gemini",
  "prompt": "Your prompt here"
}
```

Response:

```json
{
  "cached": false,
  "response": "LLM output"
}
```
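
On the gateway side, the Authorization check could be a small middleware that runs before routing. A minimal sketch with Go's net/http; storing keys in a Redis set named `llm:apikeys` is an assumption for illustration (the real gateway might use JWTs or a DB instead):

```go
package handlers

import (
	"net/http"

	"github.com/redis/go-redis/v9"
)

// APIKeyAuth rejects requests whose Authorization header is missing or not a
// known key. Keys are assumed to live in the Redis set "llm:apikeys".
func APIKeyAuth(rdb *redis.Client, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := r.Header.Get("Authorization")
		if key == "" {
			http.Error(w, "missing API key", http.StatusUnauthorized)
			return
		}
		ok, err := rdb.SIsMember(r.Context(), "llm:apikeys", key).Result()
		if err != nil {
			http.Error(w, "auth backend error", http.StatusInternalServerError)
			return
		}
		if !ok {
			http.Error(w, "invalid API key", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r) // authenticated: hand off to the router
	})
}
```
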
## Roadmap

### Phase 1

- Unified `/query` endpoint
- OpenAI, Hugging Face, Groq, OpenRouter, Anthropic, Gemini support
- Redis caching
- API key authentication
- Rate limiting
- Logging
 
### Phase 2

- Per-provider adapters as microservices
- Unified internal API for adapters
- Docker Compose & K8s manifests
 
### Phase 3

- Prometheus metrics
- Admin dashboard (NestJS)
- Usage quotas & billing
- Tracing (OpenTelemetry)
 
### Future

- Multi-provider aggregation/fan-out
- Request/response transforms
- Fine-grained quotas & billing
- User/project management
- Webhooks & streaming
- Model selection & fallback
- More adapters (Cohere, Mistral, etc.)
 
## TODO

- Add more LLM providers & adapters
- Streaming & webhooks support
- Advanced admin features (usage, billing, analytics)
- Helm charts for K8s
- OpenAPI/Swagger docs
 
## Contributing

Contributions are welcome! Please open issues or PRs for bugs, features, or improvements.
