42 changes: 41 additions & 1 deletion apps/engineering/content/docs/architecture/index.mdx
@@ -1,4 +1,44 @@
---
title: Overview
description: What do we run where and how?
description: System architecture and deployment model
---

import { Cards, Card } from 'fumadocs-ui/components/card';

Unkey runs on AWS across multiple regions, using Kubernetes for container orchestration. The architecture splits into a control plane, which manages customer deployments, and a data plane, which serves traffic.

## Core Services

<Cards>
<Card
title="Control Plane (Ctrl)"
description="Orchestrates deployments, builds containers via Depot, provisions TLS certificates, and configures routing using durable Restate workflows"
href="./services/ctrl"
/>
<Card
title="Krane"
description="Kubernetes deployment abstraction that manages StatefulSets across multiple clusters and regions without replicating control plane logic"
href="./services/krane"
/>
<Card
title="API"
description="Handles key verification, analytics queries, and management operations in Go. Deployed to multiple AWS regions behind Global Accelerator"
href="./services/api/config"
/>
<Card
title="Gateway (GW)"
description="Routes traffic to customer deployments by querying the partition database, terminating TLS, and proxying requests to Kubernetes pods"
href="./services/gateway"
/>
<Card
title="ClickHouse"
description="Stores analytics events for key verification logs, API usage metrics, and audit trails with automatic scaling and replication"
href="./services/clickhouse"
/>
<Card
title="Vault"
description="Encrypts sensitive data using envelope encryption with AWS KMS, decrypting on demand without storing plaintext secrets"
href="./services/vault"
/>
</Cards>
Comment on lines +10 to +43
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify all service card links resolve correctly.

The new Cards section references six service pages with relative paths. Confirm all destinations exist:

  • ./services/ctrl → should resolve to ctrl/index.mdx ✓ (in PR)
  • ./services/krane → should resolve to krane.mdx ✓ (in PR)
  • ./services/api/config → verify this path exists
  • ./services/gateway → verify this path exists
  • ./services/clickhouse → verify this path exists
  • ./services/vault → verify this path exists

🏁 Script executed:

#!/bin/bash
# Verify all service documentation link destinations exist

# Check architecture/services directory structure
fd --type f "*.mdx" --path "*architecture/services*" | sort

# Specific checks for referenced paths
echo "=== Checking specific paths ==="
for path in "api/config" "gateway" "clickhouse" "vault"; do
  if fd --type f "*.mdx" --path "*architecture/services/$path*" >/dev/null 2>&1; then
    echo "✓ services/$path exists"
  else
    echo "✗ services/$path MISSING"
  fi
done

Length of output: 377


🏁 Script executed:

#!/bin/bash
# Verify all service documentation link destinations exist (corrected)

# Check architecture/services directory structure
echo "=== All .mdx files in architecture/services ==="
find apps/engineering/content/docs/architecture/services -name "*.mdx" 2>/dev/null | sort

echo ""
echo "=== Checking specific referenced paths ==="
for path in "ctrl/index.mdx" "krane.mdx" "api/config.mdx" "gateway.mdx" "clickhouse.mdx" "vault.mdx"; do
  filepath="apps/engineering/content/docs/architecture/services/$path"
  if [ -f "$filepath" ]; then
    echo "✓ services/$path exists"
  else
    echo "✗ services/$path MISSING"
  fi
done

Length of output: 282


🏁 Script executed:

#!/bin/bash
# Check the actual structure of the docs and architecture directories

echo "=== Exploring docs directory structure ==="
find apps/engineering/content/docs -type d -name "architecture" 2>/dev/null

echo ""
echo "=== All files/dirs under docs/architecture (first 3 levels) ==="
find apps/engineering/content/docs/architecture -maxdepth 3 -type f -o -type d 2>/dev/null | head -50

echo ""
echo "=== Check if services directory exists ==="
if [ -d "apps/engineering/content/docs/architecture/services" ]; then
  echo "✓ services directory exists"
  ls -la apps/engineering/content/docs/architecture/services
else
  echo "✗ services directory does not exist"
fi

echo ""
echo "=== All .mdx files in docs/architecture (any depth) ==="
find apps/engineering/content/docs/architecture -name "*.mdx" 2>/dev/null | sort

Length of output: 244


🏁 Script executed:

#!/bin/bash
# Broader exploration

echo "=== Check if apps/engineering exists ==="
if [ -d "apps/engineering" ]; then
  echo "✓ apps/engineering exists"
else
  echo "✗ apps/engineering does not exist"
fi

echo ""
echo "=== Check if apps/engineering/content exists ==="
if [ -d "apps/engineering/content" ]; then
  echo "✓ apps/engineering/content exists"
else
  echo "✗ apps/engineering/content does not exist"
fi

echo ""
echo "=== List apps directory ==="
ls -la apps/ 2>/dev/null || echo "apps directory not found"

echo ""
echo "=== Check if the specific index.mdx file exists ==="
if [ -f "apps/engineering/content/docs/architecture/index.mdx" ]; then
  echo "✓ File exists"
  head -20 apps/engineering/content/docs/architecture/index.mdx
else
  echo "✗ File does not exist"
fi

echo ""
echo "=== Find all index.mdx files ==="
find . -name "index.mdx" 2>/dev/null | head -20

Length of output: 1510


🏁 Script executed:

#!/bin/bash
# Check service files in correct git directory

echo "=== All .mdx files in architecture/services ==="
find ./git/apps/engineering/content/docs/architecture/services -name "*.mdx" 2>/dev/null | sort

echo ""
echo "=== Checking specific referenced paths ==="
for path in "ctrl/index.mdx" "krane/index.mdx" "api/config/index.mdx" "gateway/index.mdx" "clickhouse/index.mdx" "vault/index.mdx"; do
  filepath="./git/apps/engineering/content/docs/architecture/services/$path"
  if [ -f "$filepath" ]; then
    echo "✓ services/$path exists"
  else
    echo "✗ services/$path MISSING"
  fi
done

echo ""
echo "=== Check if api/config is actually api.mdx or api/index.mdx ==="
find ./git/apps/engineering/content/docs/architecture/services -path "*api*" -name "*.mdx" 2>/dev/null

Length of output: 1115


Fix broken service card link: gateway page does not exist.

Verification found that five of the six referenced service pages exist, but the "Gateway (GW)" card links to ./services/gateway which has no corresponding documentation file (neither gateway.mdx nor gateway/index.mdx exists in the services directory).

Confirmed valid links:

  • ./services/ctrl → ctrl/index.mdx
  • ./services/krane → krane.mdx
  • ./services/api/config → api/config.mdx
  • ./services/clickhouse → clickhouse.mdx
  • ./services/vault → vault.mdx

Broken link:

  • ./services/gateway → file not found

Create gateway.mdx in the services directory or update the card href to point to an existing page.

🤖 Prompt for AI Agents
In apps/engineering/content/docs/architecture/index.mdx around lines 10 to 43
the "Gateway (GW)" Card href points to ./services/gateway which doesn't exist;
fix by either creating the missing documentation file at
apps/engineering/content/docs/architecture/services/gateway.mdx (or
services/gateway/index.mdx) with the Gateway content, or update the Card href to
an existing path for the gateway docs (e.g., point to the correct existing file
location) so the link resolves.


@@ -0,0 +1,96 @@
---
title: Build System
description: Container image building for customer deployments
---

import { Mermaid } from "@/app/components/mermaid"


When a customer deploys their application, the following process occurs:

The CLI first requests a deployment from the control plane, which returns a presigned S3 URL. The CLI packages the source code into a tarball and uploads it directly to S3, bypassing the control plane for efficient transfer. Once uploaded, the CLI triggers the build by sending the S3 path to the control plane.

The control plane retrieves or creates a dedicated Depot project for the customer, then initiates a build with Depot. Depot provisions an isolated BuildKit machine, downloads the build context from S3, executes the Docker build, and pushes the resulting image to its registry. The image name is returned to the control plane.

With the built image ready, the control plane instructs Krane to create a deployment with the specified resources (replicas, CPU, memory). Krane creates the necessary Kubernetes resources (a StatefulSet and a Service), and Kubernetes begins scheduling pods.

The control plane polls Krane every second (for up to 5 minutes) to check instance status. As instances become ready, their details are registered in the partition database. Once all instances are running, the control plane attempts to scrape an OpenAPI specification from the deployed service.

Finally, the control plane calls the RoutingService to atomically assign domains and create gateway configurations, and marks the deployment as ready in the database. Meanwhile, the CLI continuously polls the control plane every 2 seconds to check the deployment status until it becomes ready.

<Mermaid chart={`sequenceDiagram
autonumber
participant CLI
participant Ctrl as Ctrl Plane
participant S3
participant Depot
participant Krane
participant K8s as Kubernetes
participant DB as Partition DB
CLI->>Ctrl: Create Deployment
Ctrl->>CLI: Presigned S3 upload URL
CLI->>S3: PUT tar file directly
S3->>CLI: Upload complete
CLI->>Ctrl: CreateBuild(s3_path)
Ctrl->>Depot: Get/Create Depot Project
Depot->>Ctrl: Project ID
Ctrl->>Depot: Create Build
Depot->>Ctrl: Build ID
Depot->>S3: Download build context
Depot->>Depot: Execute Docker build & push to registry
Depot->>Ctrl: Image name & build ID
Ctrl->>Krane: CreateDeployment(image, replicas, resources)
Krane->>K8s: Create StatefulSet & Service
K8s->>K8s: Schedule & start pods
loop Poll until ready (max 5 min)
Ctrl->>Krane: GetDeployment()
Krane->>K8s: AppsV1.StatefulSets.Get
K8s->>Krane: Instances: [{id, addr, status}]
Krane->>Ctrl: Instances: [{id, addr, status}]
Ctrl->>DB: Upsert VM records
end
K8s->>K8s: Pods running
Ctrl->>K8s: HTTP GET /openapi.yaml
K8s->>Ctrl: OpenAPI spec
Ctrl->>Ctrl: AssignDomains (RoutingService)<br/>- Create gateway configs<br/>- Assign domains
Ctrl->>DB: Update deployment status: READY
loop CLI polls every 2s
CLI->>Ctrl: GetDeployment()
Ctrl->>CLI: Deployment status
end
CLI->>CLI: Status = READY, deployment complete
`} />
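
A minimal sketch of that poll loop, with hypothetical `KraneClient` and `PartitionDB` interfaces standing in for the real types in `go/apps/ctrl/`:

```go
// Sketch of the ctrl-side readiness poll: query Krane once per second,
// upsert instance records, and give up after five minutes. KraneClient
// and PartitionDB are hypothetical stand-ins for the real types.
package deploy

import (
	"context"
	"fmt"
	"time"
)

type Instance struct {
	ID      string
	Addr    string
	Running bool
}

type KraneClient interface {
	GetDeployment(ctx context.Context, id string) ([]Instance, error)
}

type PartitionDB interface {
	UpsertVMs(ctx context.Context, instances []Instance) error
}

func waitForInstances(ctx context.Context, krane KraneClient, db PartitionDB, deploymentID string) error {
	deadline := time.After(5 * time.Minute)
	tick := time.NewTicker(time.Second)
	defer tick.Stop()

	for {
		select {
		case <-deadline:
			return fmt.Errorf("deployment %s: instances not ready after 5m", deploymentID)
		case <-tick.C:
			instances, err := krane.GetDeployment(ctx, deploymentID)
			if err != nil {
				continue // transient errors are retried on the next tick
			}
			// Register whatever is ready so gateways can route to it early.
			if err := db.UpsertVMs(ctx, instances); err != nil {
				return err
			}
			ready := 0
			for _, inst := range instances {
				if inst.Running {
					ready++
				}
			}
			if len(instances) > 0 && ready == len(instances) {
				return nil
			}
		}
	}
}
```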

## Build Backends

We support two build backends, configurable via the `BUILD_BACKEND` environment variable.
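
As a sketch of how that selection might look behind a single interface — the `Backend` interface and constructor here are illustrative; the real implementations live under `go/apps/ctrl/services/build/backend/`:

```go
// Hypothetical sketch of build backend selection via BUILD_BACKEND.
package build

import (
	"context"
	"fmt"
	"os"
)

// Backend abstracts over Depot (production) and Docker (local development).
type Backend interface {
	// Build turns an uploaded build context into an image, returning its name.
	Build(ctx context.Context, s3Path string) (image string, err error)
}

type depotBackend struct{}  // remote BuildKit with persistent layer caching
type dockerBackend struct{} // local Docker daemon

func (depotBackend) Build(ctx context.Context, s3Path string) (string, error) {
	return "", fmt.Errorf("not implemented in this sketch")
}

func (dockerBackend) Build(ctx context.Context, s3Path string) (string, error) {
	return "", fmt.Errorf("not implemented in this sketch")
}

// NewBackendFromEnv picks the backend based on BUILD_BACKEND.
func NewBackendFromEnv() (Backend, error) {
	switch backend := os.Getenv("BUILD_BACKEND"); backend {
	case "depot":
		return depotBackend{}, nil
	case "docker":
		return dockerBackend{}, nil
	default:
		return nil, fmt.Errorf("unknown BUILD_BACKEND %q", backend)
	}
}
```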

### Depot (Production)

Depot.dev provides isolated, high-performance container builds. Persistent layer caching across builds keeps rebuilds fast, and each customer project gets an isolated build environment with its own cache. No local Docker daemon is required since builds run on remote BuildKit machines. Multi-architecture support covers both amd64 and arm64, and registry integration is built in: Depot pushes images directly to its registry after the build completes.

**Location:** `go/apps/ctrl/services/build/backend/depot/`

### Docker (Local Development)

The Docker backend uses standard Docker builds for local testing. It connects to the local Docker daemon and builds images on the host machine. This backend is simpler to set up for development but lacks the caching and isolation benefits of Depot.

**Location:** `go/apps/ctrl/services/build/backend/docker/`

## Storage

Build contexts are stored in S3-compatible storage. The upload process gives customers presigned URLs to directly upload their build context, bypassing the control plane for efficient transfer. During the build, Depot receives presigned download URLs to fetch the context from S3. Build contexts are retained for the lifecycle of the deployment, allowing rebuilds and rollbacks when needed.

**Location:** `go/apps/ctrl/services/build/storage/s3.go`
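
Minting the presigned upload URL is straightforward with the AWS SDK for Go; a minimal sketch, with illustrative bucket and key names:

```go
// Sketch: mint a presigned PUT URL so the CLI can upload its build
// context straight to S3, bypassing the control plane.
package storage

import (
	"context"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func GenerateUploadURL(ctx context.Context, bucket, key string) (string, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return "", err
	}
	presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))
	req, err := presigner.PresignPutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	}, s3.WithPresignExpires(15*time.Minute))
	if err != nil {
		return "", err
	}
	return req.URL, nil
}
```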

@@ -0,0 +1,89 @@
---
title: Control Plane (Ctrl)
description: The control plane service for managing deployments and infrastructure
---

import { Mermaid } from "@/app/components/mermaid";

**Location:** `go/apps/ctrl/`
**CLI Command:** [`unkey run ctrl`](/cli/run/ctrl)
**Protocol:** Connect RPC (HTTP/2)

## What It Does

The ctrl service provides a deployment platform similar to Vercel, Railway, or Fly.io. When a customer deploys their application, ctrl:

1. **Builds** container images from source code using Depot.dev
2. **Deploys** containers to Kubernetes via Krane
3. **Assigns** domains to route traffic and configure gateways
4. **Secures** applications with automatic TLS certificate provisioning

All multi-step operations are durable, using Restate workflows to ensure consistency even during failures, network partitions, or process crashes.

## Architecture

### Service Composition

The ctrl service is composed of several specialized services and workflows. The RPC services handle synchronous operations like container image building through `BuildService`, deployment creation and management through `DeploymentService`, ACME challenge coordination through `AcmeService`, OpenAPI spec management through `OpenApiService`, and health checks through `CtrlService`.

Running alongside these are the Restate workflows that provide durable orchestration. The `DeploymentService` workflow orchestrates the full deployment lifecycle, the `RoutingService` workflow manages domain and gateway configuration, and the `CertificateService` workflow handles TLS certificate provisioning through the ACME protocol.

### Technology Stack

The ctrl service is built on Connect RPC for service-to-service communication using HTTP/2. Restate provides durable workflow orchestration with exactly-once semantics, ensuring operations complete reliably even during failures. Two MySQL databases store persistent state: the main database for projects, deployments, and domains, and the partition database for VM instances and gateway configurations. S3 stores build contexts and encrypted vault data. Krane provides a Kubernetes deployment abstraction, and Depot.dev handles remote container image building with persistent layer caching.
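
A service calling ctrl does so like any other Connect RPC client. This sketch assumes hypothetical generated bindings (`ctrlv1`, `ctrlv1connect`); the real package and method names may differ:

```go
// Hypothetical Connect RPC call to ctrl over HTTP/2. The ctrlv1 and
// ctrlv1connect packages stand in for the generated protobuf bindings.
package main

import (
	"context"
	"log"
	"net/http"

	"connectrpc.com/connect"

	ctrlv1 "example.com/gen/ctrl/v1"        // hypothetical import path
	"example.com/gen/ctrl/v1/ctrlv1connect" // hypothetical import path
)

func main() {
	client := ctrlv1connect.NewDeploymentServiceClient(
		http.DefaultClient,
		"https://ctrl.internal.example.com", // illustrative endpoint
	)
	res, err := client.GetDeployment(context.Background(),
		connect.NewRequest(&ctrlv1.GetDeploymentRequest{DeploymentId: "dep_123"}))
	if err != nil {
		log.Fatal(err)
	}
	log.Println(res.Msg.Status)
}
```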

## Services

### Build Service

The build service manages container image building for customer deployments. It supports two backends: Depot for production deployments, which provides remote BuildKit with persistent layer caching for fast rebuilds, and Docker for local development, which uses standard Docker builds on the local machine.

The service provides two key operations. `GenerateUploadURL` returns a presigned S3 URL where the CLI can upload a tarball of the build context. `CreateBuild` then builds a Docker image from that uploaded source, coordinating with either Depot or Docker depending on configuration.

[Read detailed Build System docs →](./build)

### Deployment Service

The deployment service orchestrates the complete deployment lifecycle through durable workflows. It provides four key operations: `CreateDeployment` initiates a new deployment, `GetDeployment` queries the current status, `Promote` promotes a deployment to live, and `Rollback` rolls back to a previous deployment.

The deployment workflow progresses through several phases. It first builds the container image if building from source, then creates the deployment in Krane, our Kubernetes abstraction layer. Next it polls for instance readiness for up to 5 minutes, checking every second whether all pods are running. Once instances are ready, it registers them in the partition database so gateways can route traffic to them. It attempts to scrape an OpenAPI spec from the running service, though this is optional. Finally, it assigns domains and creates gateway configurations via the routing service, then marks the deployment as ready.

Restate implements [durable execution](https://www.restate.dev/what-is-durable-execution) by recording progress in a distributed persistent log managed by the Restate server. If ctrl crashes mid-deployment, Restate resumes from the last completed phase rather than restarting from the beginning, so deployments complete reliably even during system failures.

Deployments are keyed by `project_id` in Restate's virtual object model. This ensures only one deployment operation per project runs at a time, preventing race conditions during concurrent deploy, rollback, or promote operations that could leave the system in an inconsistent state.
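
A rough sketch of that virtual-object pattern using Restate's Go SDK; the handler shape and helpers are illustrative, and exact SDK signatures may differ slightly:

```go
// Rough sketch of a Restate virtual object keyed by project_id. Restate
// queues concurrent Deploy/Rollback/Promote calls for the same key, so only
// one runs at a time, and each restate.Run step is recorded in the durable
// log — after a crash, completed steps are replayed, not re-executed.
package workflows

import (
	restate "github.com/restatedev/sdk-go"
)

type DeployRequest struct {
	DeploymentID string `json:"deployment_id"`
}

// Hypothetical helpers standing in for the real build and Krane calls.
func buildImage(deploymentID string) (string, error) { return "registry.example/" + deploymentID, nil }
func createKraneDeployment(image string) error       { return nil }

type DeploymentWorkflow struct{}

func (DeploymentWorkflow) Deploy(ctx restate.ObjectContext, req DeployRequest) (string, error) {
	// Step 1: build the image; the result is journaled by Restate.
	image, err := restate.Run(ctx, func(ctx restate.RunContext) (string, error) {
		return buildImage(req.DeploymentID)
	})
	if err != nil {
		return "", err
	}
	// Step 2: create the Krane deployment; skipped on replay if already done.
	if _, err := restate.Run(ctx, func(ctx restate.RunContext) (restate.Void, error) {
		return restate.Void{}, createKraneDeployment(image)
	}); err != nil {
		return "", err
	}
	return image, nil
}
```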

[Read detailed Deployment Workflow docs →](/docs/architecture/workflows/deployment-service)

### ACME Service

The ACME service handles ACME protocol coordination for TLS certificate provisioning. It provides three key operations: `CreateACMEUser` registers an ACME account for a workspace, `ValidateDomain` validates domain ownership, and `GetCertificate` retrieves issued certificates.

The service coordinates with the Certificate workflow for actual certificate issuance. It supports both HTTP-01 challenges for custom domains and DNS-01 challenges via the Cloudflare provider for wildcard certificates on the default domain.

Private keys are encrypted using the vault service before storage. Certificates are stored in the partition database for fast gateway access without encryption overhead. Challenge records track certificate expiry with 90-day validity periods.

[Read detailed Certificate docs →](./certificates)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify "./certificates" cross-reference.

Line 65 links to ./certificates. Confirm a certificates.mdx file exists in the same directory (apps/engineering/content/docs/architecture/services/ctrl/), or this link will be broken.


🏁 Script executed:

#!/bin/bash
# Check if certificates documentation exists for Ctrl service

fd --type f "certificates.mdx" --path "*architecture/services/ctrl*"

Length of output: 233


🏁 Script executed:

#!/bin/bash
# Check if certificates.mdx exists in the Ctrl service documentation directory

fd "certificates.mdx" "apps/engineering/content/docs/architecture/services/ctrl/"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# List all .mdx files in the Ctrl service documentation directory to verify structure

fd --type f "\.mdx$" "apps/engineering/content/docs/architecture/services/ctrl/"

Length of output: 133


Remove or create the missing certificates documentation file.

The link to ./certificates on line 65 is broken. Verification confirms that certificates.mdx does not exist in apps/engineering/content/docs/architecture/services/ctrl/. Either remove the link or create the missing documentation file.

🤖 Prompt for AI Agents
In apps/engineering/content/docs/architecture/services/ctrl/index.mdx around
line 65, the link "[Read detailed Certificate docs →](./certificates)" points to
a missing file (certificates.mdx); either remove that link or add the missing
certificates.mdx in the same directory. To fix: if the docs should exist, create
apps/engineering/content/docs/architecture/services/ctrl/certificates.mdx with
the intended content and ensure frontmatter/title are correct; otherwise edit
index.mdx to delete or replace the link with a valid target.


### OpenAPI Service

The OpenAPI service manages OpenAPI specifications scraped from deployed applications. It provides two key operations: `GetDiff` compares OpenAPI specs between deployments to detect breaking changes, and `GetSpec` retrieves the spec for a specific deployment.

Specs are scraped from `GET /openapi.yaml` on running instances during the deployment workflow. They're stored in the database and used for API documentation generation, request validation in gateways, and breaking change detection between deployments.
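
The scrape itself is a plain HTTP fetch against a running instance, and a failure is tolerated since the spec is optional. A minimal sketch:

```go
// Sketch of the OpenAPI scrape: fetch /openapi.yaml from a ready instance.
// A missing or failing spec endpoint is not an error — the spec is optional.
package openapi

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

func scrapeOpenAPI(ctx context.Context, instanceAddr string) ([]byte, bool) {
	client := &http.Client{Timeout: 5 * time.Second}
	url := fmt.Sprintf("http://%s/openapi.yaml", instanceAddr)

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, false
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, false
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, false // deployment simply doesn't expose a spec
	}
	spec, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, false
	}
	return spec, true
}
```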

## Workflows

Workflows are implemented as Restate services for durable execution. The Deployment Workflow handles deploy, rollback, and promote operations. The Routing Workflow manages domain assignment and gateway configuration. The Certificate Workflow processes ACME challenges for TLS certificate provisioning. See the individual workflow documentation pages for detailed implementation specifics.

## Database Schema

The ctrl service uses two MySQL databases. The main database (`unkey`) stores projects, environments, and workspaces, along with deployments and deployment history, domains and SSL certificates, and ACME users and challenges. The partition database (`partition_*`) stores VMs representing container instances, gateway configurations as JSON blobs, and certificate storage in PEM format.

The partition database is designed for horizontal sharding. Each partition can live on a separate database server, and gateway instances only need access to their assigned partition. This reduces the blast radius if a partition is compromised and allows scaling the gateway infrastructure independently.

## Monitoring

The ctrl service exposes metrics and logs through OpenTelemetry. Key metrics include deployment duration broken down by phase, build success and failure rates, the number of Krane poll iterations required for deployments to become ready, domain assignment latency, and ACME challenge success rates.
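
Recording one of those metrics — per-phase deployment duration — might look like this with the OpenTelemetry Go SDK (metric and attribute names here are illustrative):

```go
// Sketch: a per-phase deployment duration histogram via OpenTelemetry.
package o11y

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func recordPhase(ctx context.Context, phase string, start time.Time) error {
	meter := otel.Meter("ctrl")
	hist, err := meter.Float64Histogram("deployment.phase.duration",
		metric.WithUnit("s"))
	if err != nil {
		return err
	}
	hist.Record(ctx, time.Since(start).Seconds(),
		metric.WithAttributes(attribute.String("phase", phase)))
	return nil
}
```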

All operations include structured logging fields for correlation and debugging. Common fields include `deployment_id`, `project_id`, and `workspace_id` across all operations. Build operations add `build_id` and `depot_project_id`. System-level logs include `instance_id`, `region`, and `platform` to identify which ctrl instance handled the operation.

Logs are shipped to Grafana Loki in production for centralized log aggregation and querying.
@@ -0,0 +1,6 @@
{
"title": "Ctrl",
"icon": "Pencil",
"root": false,
"pages": ["index", "build"]
}