chronark commented Oct 27, 2025

What does this PR do?

This PR adds architecture documentation for the Unkey system. It:

  1. Adds an architecture overview page with cards for core services
  2. Adds detailed documentation for the Control Plane (Ctrl) service
  3. Adds build system documentation explaining container image building
  4. Adds Krane service documentation for Kubernetes deployment orchestration
  5. Updates the Mermaid component to use maxWidth for better diagram display

The documentation includes sequence diagrams for deployment flows, detailed explanations of service responsibilities, and technical implementation details for both production and local development environments.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Navigate to the architecture documentation in the engineering app
  • Verify that the overview page displays properly with all service cards
  • Check that the Mermaid diagrams render correctly with proper width constraints
  • Ensure all links between documentation pages work correctly
  • Test the documentation in both light and dark themes

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

changeset-bot bot commented Oct 27, 2025

⚠️ No Changeset found

Latest commit: 52c2faa

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

vercel bot commented Oct 27, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| engineering | Ready | Preview | Comment | Oct 27, 2025 7:43pm |

1 Skipped Deployment

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| dashboard | Ignored | Preview | | Oct 27, 2025 7:43pm |

CLAassistant commented Oct 27, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ chronark
❌ Andreas


Andreas does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

coderabbitai bot commented Oct 27, 2025

Walkthrough

This change updates and expands the architecture documentation. The main index was redesigned with a narrative overview and visual card-based navigation for six core services. Comprehensive documentation was added for two services: Ctrl (control plane orchestration with build and deployment workflows) and Krane (Kubernetes deployment abstraction). Metadata files were updated to register these new documentation pages.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Architecture Index Redesign: `apps/engineering/content/docs/architecture/index.mdx` | Updated description and added narrative overview of Unkey's AWS infrastructure and Kubernetes deployment model. Introduced Cards grid with six service entries (Control Plane, Krane, API, Gateway, ClickHouse, Vault) linking to service documentation. Added MDX component imports. |
| Ctrl Service Documentation: `apps/engineering/content/docs/architecture/services/ctrl/index.mdx`, `apps/engineering/content/docs/architecture/services/ctrl/build.mdx`, `apps/engineering/content/docs/architecture/services/ctrl/meta.json` | New documentation suite for Control Plane service. Index covers architecture, services (Build, Deployment, ACME, OpenAPI), Restate workflows, database schemas, and monitoring. Build document details image creation pipeline with Depot/Docker backends, S3 storage, and presigned URLs. Metadata file registers Ctrl section with two pages. |
| Krane Service Documentation: `apps/engineering/content/docs/architecture/services/krane.mdx` | New documentation for Kubernetes deployment orchestration service, covering purpose, StatefulSet rationale, deployment flow with sequence diagram, backend implementations (Kubernetes/Docker), RBAC requirements, and local development workflow. |
| Services Navigation Metadata: `apps/engineering/content/docs/architecture/services/meta.json` | Updated pages array to include new "ctrl" and "krane" entries in appropriate sorted positions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Narrative accuracy: Verify architectural descriptions of Unkey's multi-region deployment, control plane/data plane split, and service responsibilities align with actual implementation.
  • Link correctness: Confirm all href paths in the Cards grid and cross-references within Ctrl documentation resolve to existing documentation files.
  • Workflow documentation: Review build pipeline, deployment phases, and Restate workflow descriptions for technical accuracy and completeness.
  • Metadata consistency: Ensure page ordering and naming in meta.json files match actual file structure and documentation hierarchy.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title "docs: explain control plane services" is related to the main changes in the pull request. The PR adds comprehensive documentation for the Control Plane (Ctrl) service with multiple files covering the service architecture, build system, and deployment orchestration. The title accurately captures the central focus of the work, as confirmed by the PR objectives which state the PR focuses on Control Plane (Ctrl) service. While the title doesn't comprehensively cover all additions (such as the architecture overview page), it does refer to a real and significant aspect of the change set. |
| Description Check | ✅ Passed | The pull request description is well-structured and mostly complete, containing all major required sections: a comprehensive "What does this PR do?" section explaining the five key additions, a properly filled "Type of change" section with Enhancement and documentation update selected, detailed "How should this be tested?" instructions covering navigation, verification, and theme testing, and a thorough checklist with most required and appreciated items completed. The only missing element is an explicit issue reference (Fixes # format), which is a procedural/tracking item rather than substantive content. Since the description is substantially complete with all key sections well-filled, it meets the pass criteria. |

Removed architecture configuration from mermaid initialization.
@github-actions

Thank you for following the naming conventions for pull request titles! 🙏

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
apps/engineering/content/docs/architecture/services/ctrl/build.mdx (1)

9-11: Minor style suggestion: Lead with "first" for improved clarity.

Per the LanguageTool style guide, restructuring this sentence would improve clarity:

-The CLI first requests a deployment from the control plane, which returns a presigned S3 URL.
+First, the CLI requests a deployment from the control plane, which returns a presigned S3 URL.

This is a very minor stylistic improvement and can be deferred if not aligned with your documentation style guide.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62d884b and 52c2faa.

📒 Files selected for processing (6)
  • apps/engineering/content/docs/architecture/index.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/services/ctrl/build.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/services/ctrl/index.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/services/ctrl/meta.json (1 hunks)
  • apps/engineering/content/docs/architecture/services/krane.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/services/meta.json (1 hunks)
🧰 Additional context used
🪛 LanguageTool
apps/engineering/content/docs/architecture/services/ctrl/build.mdx

[style] ~10-~10: Consider placing the discourse marker ‘first’ at the beginning of the sentence for more clarity.
Context: ...ication, the following process occurs: The CLI first requests a deployment from the control ...

(SENT_START_FIRST_PREMIUM)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Build / Build
🔇 Additional comments (10)
apps/engineering/content/docs/architecture/services/ctrl/meta.json (1)

1-6: LGTM!

Metadata structure is consistent and correctly references the documentation pages (index and build).

apps/engineering/content/docs/architecture/services/meta.json (1)

1-19: LGTM!

Service entries properly added to navigation registry and positioned logically within the services list.

apps/engineering/content/docs/architecture/services/krane.mdx (3)

27-35: Strong architectural justification provided for design choices.

The StatefulSets rationale (lines 27-35) effectively explains the tradeoff between convenience (stable DNS) and convention (standard Deployments). The acknowledgment of this as a "known design compromise" and mention of future improvements is valuable context for maintainers.


107-131: RBAC configuration example is complete and accurate.

The YAML example is properly formatted and correctly specifies the required permissions for Krane's Kubernetes backend operations.


8-11: All references verified as correct.

  • go/apps/krane/ directory exists ✓
  • go/proto/krane/v1/deployment.proto file exists ✓
  • /cli/run/krane CLI command documentation exists ✓

The documentation references are accurate and point to valid codebase locations.

apps/engineering/content/docs/architecture/services/ctrl/index.mdx (3)

43-43: Confirm "./build" cross-reference is valid.

Line 43 references ./build, which should point to the build.mdx file in the same directory. This appears correct based on the PR changes provided, but verify it renders properly in the documentation build.


8-10: All service metadata verified as accurate.

  • go/apps/ctrl/ exists at ./go/apps/ctrl
  • CLI command /cli/run/ctrl is properly documented at ./apps/engineering/content/docs/cli/run/ctrl/index.mdx
  • Protocol Connect RPC (HTTP/2) is correctly stated and documented in the Technology Stack section

55-55: External documentation link is valid and exists in the codebase.

The referenced file deployment-service.mdx exists at apps/engineering/content/docs/architecture/workflows/deployment-service.mdx. The link path /docs/architecture/workflows/deployment-service correctly maps to this file. No action required.

Likely an incorrect or invalid review comment.

apps/engineering/content/docs/architecture/index.mdx (1)

3-3: New imports and narrative structure look good.

The Cards component import and architecture narrative provide excellent context for users navigating the services.

Also applies to: 6-8

apps/engineering/content/docs/architecture/services/ctrl/build.mdx (1)

75-95: All backend and storage file paths verified as correct.

Verification confirms all documented paths exist in the codebase:

  • go/apps/ctrl/services/build/backend/depot/
  • go/apps/ctrl/services/build/backend/docker/
  • go/apps/ctrl/services/build/storage/s3.go

No corrections needed.

Comment on lines +10 to +43
## Core Services

<Cards>
<Card
title="Control Plane (Ctrl)"
description="Orchestrates deployments, builds containers via Depot, provisions TLS certificates, and configures routing using durable Restate workflows"
href="./services/ctrl"
/>
<Card
title="Krane"
description="Kubernetes deployment abstraction that manages StatefulSets across multiple clusters and regions without replicating control plane logic"
href="./services/krane"
/>
<Card
title="API"
description="Handles key verification, analytics queries, and management operations in Go. Deployed to multiple AWS regions behind Global Accelerator"
href="./services/api/config"
/>
<Card
title="Gateway (GW)"
description="Routes traffic to customer deployments by querying the partition database, terminating TLS, and proxying requests to Kubernetes pods"
href="./services/gateway"
/>
<Card
title="ClickHouse"
description="Stores analytics events for key verification logs, API usage metrics, and audit trails with automatic scaling and replication"
href="./services/clickhouse"
/>
<Card
title="Vault"
description="Encrypts sensitive data using envelope encryption with AWS KMS, decrypting on demand without storing plaintext secrets"
href="./services/vault"
/>
</Cards>

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify all service card links resolve correctly.

The new Cards section references six service pages with relative paths. Confirm all destinations exist:

  • ./services/ctrl → should resolve to ctrl/index.mdx ✓ (in PR)
  • ./services/krane → should resolve to krane.mdx ✓ (in PR)
  • ./services/api/config → verify this path exists
  • ./services/gateway → verify this path exists
  • ./services/clickhouse → verify this path exists
  • ./services/vault → verify this path exists

🏁 Script executed:

#!/bin/bash
# Verify all service documentation link destinations exist

# Check architecture/services directory structure
fd --type f "*.mdx" --path "*architecture/services*" | sort

# Specific checks for referenced paths
echo "=== Checking specific paths ==="
for path in "api/config" "gateway" "clickhouse" "vault"; do
  if fd --type f "*.mdx" --path "*architecture/services/$path*" >/dev/null 2>&1; then
    echo "✓ services/$path exists"
  else
    echo "✗ services/$path MISSING"
  fi
done

Length of output: 377


🏁 Script executed:

#!/bin/bash
# Verify all service documentation link destinations exist (corrected)

# Check architecture/services directory structure
echo "=== All .mdx files in architecture/services ==="
find apps/engineering/content/docs/architecture/services -name "*.mdx" 2>/dev/null | sort

echo ""
echo "=== Checking specific referenced paths ==="
for path in "ctrl/index.mdx" "krane.mdx" "api/config.mdx" "gateway.mdx" "clickhouse.mdx" "vault.mdx"; do
  filepath="apps/engineering/content/docs/architecture/services/$path"
  if [ -f "$filepath" ]; then
    echo "✓ services/$path exists"
  else
    echo "✗ services/$path MISSING"
  fi
done

Length of output: 282


🏁 Script executed:

#!/bin/bash
# Check the actual structure of the docs and architecture directories

echo "=== Exploring docs directory structure ==="
find apps/engineering/content/docs -type d -name "architecture" 2>/dev/null

echo ""
echo "=== All files/dirs under docs/architecture (first 3 levels) ==="
find apps/engineering/content/docs/architecture -maxdepth 3 -type f -o -type d 2>/dev/null | head -50

echo ""
echo "=== Check if services directory exists ==="
if [ -d "apps/engineering/content/docs/architecture/services" ]; then
  echo "✓ services directory exists"
  ls -la apps/engineering/content/docs/architecture/services
else
  echo "✗ services directory does not exist"
fi

echo ""
echo "=== All .mdx files in docs/architecture (any depth) ==="
find apps/engineering/content/docs/architecture -name "*.mdx" 2>/dev/null | sort

Length of output: 244


🏁 Script executed:

#!/bin/bash
# Broader exploration

echo "=== Check if apps/engineering exists ==="
if [ -d "apps/engineering" ]; then
  echo "✓ apps/engineering exists"
else
  echo "✗ apps/engineering does not exist"
fi

echo ""
echo "=== Check if apps/engineering/content exists ==="
if [ -d "apps/engineering/content" ]; then
  echo "✓ apps/engineering/content exists"
else
  echo "✗ apps/engineering/content does not exist"
fi

echo ""
echo "=== List apps directory ==="
ls -la apps/ 2>/dev/null || echo "apps directory not found"

echo ""
echo "=== Check if the specific index.mdx file exists ==="
if [ -f "apps/engineering/content/docs/architecture/index.mdx" ]; then
  echo "✓ File exists"
  head -20 apps/engineering/content/docs/architecture/index.mdx
else
  echo "✗ File does not exist"
fi

echo ""
echo "=== Find all index.mdx files ==="
find . -name "index.mdx" 2>/dev/null | head -20

Length of output: 1510


🏁 Script executed:

#!/bin/bash
# Check service files in correct git directory

echo "=== All .mdx files in architecture/services ==="
find ./git/apps/engineering/content/docs/architecture/services -name "*.mdx" 2>/dev/null | sort

echo ""
echo "=== Checking specific referenced paths ==="
for path in "ctrl/index.mdx" "krane/index.mdx" "api/config/index.mdx" "gateway/index.mdx" "clickhouse/index.mdx" "vault/index.mdx"; do
  filepath="./git/apps/engineering/content/docs/architecture/services/$path"
  if [ -f "$filepath" ]; then
    echo "✓ services/$path exists"
  else
    echo "✗ services/$path MISSING"
  fi
done

echo ""
echo "=== Check if api/config is actually api.mdx or api/index.mdx ==="
find ./git/apps/engineering/content/docs/architecture/services -path "*api*" -name "*.mdx" 2>/dev/null

Length of output: 1115


Fix broken service card link: gateway page does not exist.

Verification found that five of the six referenced service pages exist, but the "Gateway (GW)" card links to ./services/gateway which has no corresponding documentation file (neither gateway.mdx nor gateway/index.mdx exists in the services directory).

Confirmed valid links:

  • ./services/ctrl → ctrl/index.mdx
  • ./services/krane → krane.mdx
  • ./services/api/config → api/config.mdx
  • ./services/clickhouse → clickhouse.mdx
  • ./services/vault → vault.mdx

Broken link:

  • ./services/gateway → file not found

Create gateway.mdx in the services directory or update the card href to point to an existing page.

🤖 Prompt for AI Agents
In apps/engineering/content/docs/architecture/index.mdx around lines 10 to 43
the "Gateway (GW)" Card href points to ./services/gateway which doesn't exist;
fix by either creating the missing documentation file at
apps/engineering/content/docs/architecture/services/gateway.mdx (or
services/gateway/index.mdx) with the Gateway content, or update the Card href to
an existing path for the gateway docs (e.g., point to the correct existing file
location) so the link resolves.


Private keys are encrypted using the vault service before storage. Certificates are stored in the partition database for fast gateway access without encryption overhead. Challenge records track certificate expiry with 90-day validity periods.

[Read detailed Certificate docs →](./certificates)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify "./certificates" cross-reference.

Line 65 links to ./certificates. Confirm a certificates.mdx file exists in the same directory (apps/engineering/content/docs/architecture/services/ctrl/), or this link will be broken.


🏁 Script executed:

#!/bin/bash
# Check if certificates documentation exists for Ctrl service

fd --type f "certificates.mdx" --path "*architecture/services/ctrl*"

Length of output: 233


🏁 Script executed:

#!/bin/bash
# Check if certificates.mdx exists in the Ctrl service documentation directory

fd "certificates.mdx" "apps/engineering/content/docs/architecture/services/ctrl/"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# List all .mdx files in the Ctrl service documentation directory to verify structure

fd --type f "\.mdx$" "apps/engineering/content/docs/architecture/services/ctrl/"

Length of output: 133


Remove or create the missing certificates documentation file.

The link to ./certificates on line 65 is broken. Verification confirms that certificates.mdx does not exist in apps/engineering/content/docs/architecture/services/ctrl/. Either remove the link or create the missing documentation file.

🤖 Prompt for AI Agents
In apps/engineering/content/docs/architecture/services/ctrl/index.mdx around
line 65, the link "[Read detailed Certificate docs →](./certificates)" points to
a missing file (certificates.mdx); either remove that link or add the missing
certificates.mdx in the same directory. To fix: if the docs should exist, create
apps/engineering/content/docs/architecture/services/ctrl/certificates.mdx with
the intended content and ensure frontmatter/title are correct; otherwise edit
index.mdx to delete or replace the link with a valid target.


The CLI first requests a deployment from the control plane, which returns a presigned S3 URL. The CLI packages the source code into a tarball and uploads it directly to S3, bypassing the control plane for efficient transfer. Once uploaded, the CLI triggers the build by sending the S3 path to the control plane.

The control plane retrieves or creates a dedicated Depot project for the customer, then initiates a build with Depot. Depot provisions an isolated BuildKit machine, downloads the build context from S3, executes the Docker build, and pushes the resulting image to its registry. The image name is returned to the control plane.
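
A rough sketch of the client side of this flow, for readers who want something concrete. It assumes the presigned URL was issued for an HTTP PUT, which the quoted text does not state. The function names and the `PRESIGNED_URL` variable are hypothetical, and the real CLI is not written in shell.

```bash
#!/usr/bin/env bash
set -euo pipefail

# package_source DIR OUT: write a gzipped tarball of DIR's contents to OUT.
# Hypothetical helper; any ignore rules the real CLI applies are not modeled.
package_source() {
  local dir="$1" out="$2"
  tar -czf "$out" -C "$dir" .
}

# upload_context TARBALL URL: send the tarball straight to object storage,
# so the bytes never pass through the control plane.
# Assumes the presigned URL accepts PUT (curl uses PUT for --upload-file over HTTP).
upload_context() {
  local tarball="$1" url="$2"
  curl --fail --silent --show-error --upload-file "$tarball" "$url"
}

# Example (not run here): PRESIGNED_URL would be returned by the control plane.
#   package_source ./my-app /tmp/context.tar.gz
#   upload_context /tmp/context.tar.gz "$PRESIGNED_URL"
```

Keeping the upload separate from the "trigger build" call is what lets the control plane stay out of the data path: it only ever sees the S3 path, not the archive.
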

Review comment (Contributor):

Where does Depot provision the isolated buildkit machine? Is that our infra? Their infra?

Review comment (Contributor):

Also "to its registry".. what is "its" here?


The deployment workflow progresses through several phases. It first builds the container image if building from source, then creates the deployment in Krane, our Kubernetes abstraction layer. Next it polls for instance readiness for up to 5 minutes, checking every second whether all pods are running. Once instances are ready, it registers them in the partition database so gateways can route traffic to them. It attempts to scrape an OpenAPI spec from the running service, though this is optional. Finally, it assigns domains and creates gateway configurations via the routing service, then marks the deployment as ready.
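
The readiness phase described here is a bounded polling loop. A minimal shell sketch of that pattern, assuming a 300-second budget and a 1-second interval as in the text; `wait_until_ready` and `all_pods_running` are hypothetical names, not functions from the ctrl codebase.

```bash
#!/usr/bin/env bash
set -euo pipefail

# wait_until_ready TIMEOUT_SECONDS INTERVAL_SECONDS CHECK_COMMAND...
# Runs CHECK_COMMAND repeatedly until it succeeds or the timeout elapses.
# Returns 0 once the check passes, 1 on timeout.
wait_until_ready() {
  local timeout="$1" interval="$2"
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  while true; do
    if "$@"; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# Example (not run here): all_pods_running is a hypothetical check that would
# ask Krane whether every instance of the deployment is running.
#   wait_until_ready 300 1 all_pods_running "$DEPLOYMENT_ID"
```

The check always runs at least once, so a deployment that is already ready returns immediately without sleeping.
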

Each phase is durable. If ctrl crashes during deployment, Restate resumes from the last completed phase rather than restarting from the beginning. This ensures deployments complete reliably even during system failures.
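
To make "resumes from the last completed phase" concrete, here is a generic checkpointing sketch. It is not Restate's API and does not claim to mirror how Restate journals steps; it only illustrates the idea of recording each completed phase and skipping it on a rerun. The journal file, `run_phase`, and the placeholder commands are all assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_phase JOURNAL NAME COMMAND...
# Skips the phase if JOURNAL already lists NAME as completed; otherwise runs
# COMMAND and records NAME only after the command succeeds.
run_phase() {
  local journal="$1" name="$2"
  shift 2
  if grep -qxF -- "$name" "$journal" 2>/dev/null; then
    echo "skip $name (already completed)"
    return 0
  fi
  "$@" || return 1
  echo "$name" >> "$journal"
}

# deploy JOURNAL: phase names follow the quoted description above;
# the echo commands are placeholders for the real work.
deploy() {
  local journal="$1"
  run_phase "$journal" build          echo "building image"
  run_phase "$journal" create         echo "creating deployment in Krane"
  run_phase "$journal" wait_ready     echo "waiting for instances"
  run_phase "$journal" register       echo "registering instances"
  run_phase "$journal" assign_domains echo "assigning domains"
}
```

If the process dies after `create`, calling `deploy` again with the same journal skips `build` and `create` and continues at `wait_ready`. A phase is only recorded after it succeeds, so a phase interrupted midway runs again and therefore needs to be safe to repeat.
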

Review comment (Contributor):

not important right now but a link to whatever explains how this durability is provided through Restate would be useful


### Ctrl Service

The ctrl service provides health checks and instance metadata. Its primary operation is `Liveness`, which serves as a health check endpoint for Kubernetes probes. This service is minimal by design, handling only operational concerns rather than business logic.

Review comment (Contributor):

This line is in opposition with the original statement on line 14.


## Kubernetes Backend

The Kubernetes backend runs inside a cluster with appropriate RBAC permissions. It uses in-cluster config to authenticate with the API server.
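
For context on what in-cluster config generally involves: a pod finds the API server through the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables and authenticates with its mounted service account token. The shell sketch below shows that lookup, assuming the default service account mount path. The function names are hypothetical, and Krane is a Go service, so it would use a client library rather than curl.

```bash
#!/usr/bin/env bash
set -euo pipefail

# in_cluster_api_url: print the API server URL from the environment variables
# Kubernetes injects into every pod. Fails if they are not set.
in_cluster_api_url() {
  : "${KUBERNETES_SERVICE_HOST:?not running inside a cluster}"
  : "${KUBERNETES_SERVICE_PORT:?not running inside a cluster}"
  echo "https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"
}

# list_statefulsets NAMESPACE: call the API server with the pod's own
# service account credentials. Whether the call is allowed depends on the
# Role bound to that service account.
list_statefulsets() {
  local ns="$1" sa=/var/run/secrets/kubernetes.io/serviceaccount
  curl --fail --silent --show-error \
    --cacert "$sa/ca.crt" \
    --header "Authorization: Bearer $(cat "$sa/token")" \
    "$(in_cluster_api_url)/apis/apps/v1/namespaces/${ns}/statefulsets"
}
```

A 403 response from `list_statefulsets` is a quick sign that the service account lacks a permission the backend needs.
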

Review comment (Contributor):

nit: what's "appropriate"? (link to the permission def or inline it)

imeyer left a comment

commented with questions/suggestions 😄
