
Commit 9166bff

refactor e2e (#266)
* refactor e2e
* update workflow
* fix code interpreter e2e
* update workflow
* fix new tests
* use deterministic ports and remove skips
* unify
* use generate-config
* skip exposed ports test and claude amends
* consolidate code-interpreter test
* fix merge
* lower ctx count
* debug logs here we go
* dont parallize calls maybe?
* remove debug logs
* also prepare python container in setup step
* update process readiness e2e
* update docs
* add file streaming test
* add hidden files and base64 encoding test
* complete env override test and poll for process termination
1 parent 45e9676 commit 9166bff

24 files changed: +2896 / -5263 lines

CLAUDE.md

Lines changed: 7 additions & 6 deletions
````diff
@@ -64,7 +64,7 @@ npm run build:clean # Force rebuild without cache
 # Unit tests (runs in Workers runtime with vitest-pool-workers)
 npm test
 
-# E2E tests (requires Docker, runs sequentially due to container provisioning)
+# E2E tests (requires Docker)
 npm run test:e2e
 
 # Run a single E2E test file
@@ -74,7 +74,7 @@ npm run test:e2e -- -- tests/e2e/process-lifecycle-workflow.test.ts
 npm run test:e2e -- -- tests/e2e/git-clone-workflow.test.ts -t 'test name'
 ```
 
-**Important**: E2E tests (`tests/e2e/`) run sequentially (not in parallel) to avoid container resource contention. Each test spawns its own wrangler dev instance.
+**Important**: E2E tests share a single sandbox container for performance. Tests run in parallel using unique sessions for isolation.
 
 ### Code Quality
 
@@ -211,11 +211,12 @@ npm run test:e2e -- -- tests/e2e/git-clone-workflow.test.ts -t 'should handle cl
 **Architecture:**
 
 - Tests in `tests/e2e/` run against real Cloudflare Workers + Docker containers
-- **In CI**: Tests deploy to actual Cloudflare infrastructure and run against deployed workers
-- **Locally**: Each test file spawns its own `wrangler dev` instance
+- **Shared sandbox**: All tests share ONE container, using sessions for isolation
+- **In CI**: Tests deploy to actual Cloudflare infrastructure
+- **Locally**: Global setup spawns wrangler dev once, all tests share it
 - Config: `vitest.e2e.config.ts` (root level)
-- Sequential execution (`singleFork: true`) to prevent container resource contention
-- Longer timeouts (2min per test) for container operations
+- Parallel execution via thread pool (~30s for full suite)
+- See `docs/E2E_TESTING.md` for writing tests
 
 **Build system trust:** The monorepo build system (turbo + npm workspaces) is robust and handles all package dependencies automatically. E2E tests always run against the latest built code - there's no need to manually rebuild or worry about stale builds unless explicitly working on the build setup itself.
 
````

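Under the architecture notes above, global setup starts one `wrangler dev` instance and every test reuses it. The helper's internals are not part of this excerpt. The sketch below shows one common way a `getSharedSandbox()`-style helper avoids duplicate initialization: it caches the in-flight promise. It assumes a hypothetical `init` callback that boots the worker. `SharedSandbox`, its `sandboxId` field and `getSharedSandboxSketch` are illustrative names, not the repository's API.

Vitest evaluates modules separately per worker (and, with test isolation enabled, per test file). A module-level cache like this therefore only deduplicates calls inside one module instance. Sharing across files still relies on the global setup step, which runs once before any test file.

```typescript
// Hypothetical shape of the shared handle. The guide's helper also exposes
// members such as createHeaders() and uniquePath(), omitted here.
interface SharedSandbox {
  workerUrl: string;
  sandboxId: string;
}

// Module-level cache of the in-flight (or finished) initialization.
let sharedSandboxPromise: Promise<SharedSandbox> | null = null;

function getSharedSandboxSketch(
  init: () => Promise<SharedSandbox>
): Promise<SharedSandbox> {
  if (!sharedSandboxPromise) {
    // Cache the promise itself, not the resolved value, so callers that
    // arrive while initialization is still running await the same work.
    sharedSandboxPromise = init().catch((err: unknown) => {
      // Clear the cache on failure so a later call can try again.
      sharedSandboxPromise = null;
      throw err;
    });
  }
  return sharedSandboxPromise;
}
```

With this pattern, `Promise.all([getSharedSandboxSketch(init), getSharedSandboxSketch(init)])` runs `init` once and hands both callers the same object. Resetting the cache on rejection lets a retried test attempt initialization again instead of re-awaiting a promise that already failed.
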
CONTRIBUTING.md

Lines changed: 5 additions & 3 deletions
````diff
@@ -165,12 +165,11 @@ Located in `tests/e2e/`:
 
 - Test full workflows against real Workers and containers
 - Require Docker
-- Slower but comprehensive
+- Share a single sandbox container for performance (~30s for full suite)
+- Use sessions for test isolation
 
 Run with: `npm run test:e2e`
 
-You can also run specific test files or individual tests:
-
 ```bash
 # Run a single E2E test file
 npm run test:e2e -- -- tests/e2e/process-lifecycle-workflow.test.ts
@@ -179,12 +178,15 @@ npm run test:e2e -- -- tests/e2e/process-lifecycle-workflow.test.ts
 npm run test:e2e -- -- tests/e2e/git-clone-workflow.test.ts -t 'should handle cloning to default directory'
 ```
 
+**See `docs/E2E_TESTING.md` for the complete guide on writing E2E tests.**
+
 ### Writing Tests
 
 - Write tests for new features
 - Add regression tests for bug fixes
 - Ensure tests are deterministic (no flaky tests)
 - Use descriptive test names
+- For E2E tests: use `getSharedSandbox()` and `createUniqueSession()` for isolation
 
 ## Documentation
 
````

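Under the updated CONTRIBUTING guidance, isolation comes from each test sending the shared sandbox's identifier together with its own session identifier. The sketch below assumes both IDs travel as request headers. The header names `X-Sandbox-Id` and `X-Session-Id`, and both function names, are hypothetical placeholders rather than what `createUniqueSession()` and `createHeaders()` actually produce.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical: a fresh session ID per test file. randomUUID() makes a clash
// between parallel test files vanishingly unlikely.
function createUniqueSessionSketch(prefix = 'test'): string {
  return `${prefix}-${randomUUID()}`;
}

// Hypothetical header layout: every test targets the same sandbox but its
// own session. The header names are placeholders, not the worker's real ones.
function createHeadersSketch(
  sandboxId: string,
  sessionId: string
): Record<string, string> {
  return {
    'Content-Type': 'application/json',
    'X-Sandbox-Id': sandboxId, // identical across tests: one shared container
    'X-Session-Id': sessionId // unique per test: isolated env vars and cwd
  };
}
```

Because only the session header differs, two test files hit the same container. Each one still gets its own shell state (environment variables and working directory).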
docs/E2E_TESTING.md

Lines changed: 165 additions & 0 deletions
````diff
@@ -0,0 +1,165 @@
+# E2E Testing Guide
+
+E2E tests validate full workflows against real Cloudflare Workers and Docker containers.
+
+## Architecture
+
+All E2E tests share a **single sandbox container** for performance. Test isolation is achieved through **sessions** - each test file gets a unique session that provides isolated shell state (env vars, working directory) within the shared container.
+
+```
+┌─────────────────────────────────────────────────────┐
+│  Shared Sandbox                                     │
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
+│  │  Session A  │  │  Session B  │  │  Session C  │  │
+│  │  (test 1)   │  │  (test 2)   │  │  (test 3)   │  │
+│  └─────────────┘  └─────────────┘  └─────────────┘  │
+│                                                     │
+│  Shared filesystem & processes                      │
+└─────────────────────────────────────────────────────┘
+```
+
+**Key files:**
+
+- `tests/e2e/global-setup.ts` - Creates sandbox before tests, warms containers
+- `tests/e2e/helpers/global-sandbox.ts` - Provides `getSharedSandbox()` API
+- `vitest.e2e.config.ts` - Configures parallel execution with global setup
+
+## Writing Tests
+
+### Basic Template
+
+```typescript
+import { describe, test, expect, beforeAll } from 'vitest';
+import {
+  getSharedSandbox,
+  createUniqueSession
+} from './helpers/global-sandbox';
+
+describe('My Feature', () => {
+  let workerUrl: string;
+  let headers: Record<string, string>;
+
+  beforeAll(async () => {
+    const sandbox = await getSharedSandbox();
+    workerUrl = sandbox.workerUrl;
+    headers = sandbox.createHeaders(createUniqueSession());
+  }, 120000);
+
+  test('should do something', async () => {
+    const response = await fetch(`${workerUrl}/api/execute`, {
+      method: 'POST',
+      headers,
+      body: JSON.stringify({ command: 'echo hello' })
+    });
+    expect(response.status).toBe(200);
+  }, 60000);
+});
+```
+
+### Using Python Image
+
+For tests requiring Python (code interpreter, etc.):
+
+```typescript
+beforeAll(async () => {
+  const sandbox = await getSharedSandbox();
+  workerUrl = sandbox.workerUrl;
+  // Use createPythonHeaders instead of createHeaders
+  headers = sandbox.createPythonHeaders(createUniqueSession());
+}, 120000);
+```
+
+### File Isolation
+
+Since the filesystem is shared, use unique paths to avoid conflicts:
+
+```typescript
+const sandbox = await getSharedSandbox();
+const testDir = sandbox.uniquePath('my-feature'); // /workspace/test-abc123/my-feature
+
+await fetch(`${workerUrl}/api/file/write`, {
+  method: 'POST',
+  headers,
+  body: JSON.stringify({
+    path: `${testDir}/config.json`,
+    content: '{"key": "value"}'
+  })
+});
+```
+
+### Port Usage
+
+Ports must be exposed in the Dockerfile. Currently exposed:
+
+- `8080` - General testing
+- `9090`, `9091`, `9092` - Process readiness tests
+- `9998` - Process lifecycle tests
+- `9999` - WebSocket tests
+
+To use a new port:
+
+1. Add it to both `tests/e2e/test-worker/Dockerfile` and `Dockerfile.python`
+2. Document which test uses it
+
+### Process Cleanup
+
+Always clean up background processes:
+
+```typescript
+test('should start server', async () => {
+  const startRes = await fetch(`${workerUrl}/api/process/start`, {
+    method: 'POST',
+    headers,
+    body: JSON.stringify({ command: 'bun run server.js' })
+  });
+  const { id: processId } = await startRes.json();
+
+  // ... test logic ...
+
+  // Cleanup
+  await fetch(`${workerUrl}/api/process/${processId}`, {
+    method: 'DELETE',
+    headers
+  });
+}, 60000);
+```
+
+## Test Organization
+
+| File                                    | Purpose                      |
+| --------------------------------------- | ---------------------------- |
+| `comprehensive-workflow.test.ts`        | Happy path integration tests |
+| `process-lifecycle-workflow.test.ts`    | Error handling for processes |
+| `process-readiness-workflow.test.ts`    | waitForLog/waitForPort tests |
+| `code-interpreter-workflow.test.ts`     | Python/JS code execution     |
+| `file-operations-workflow.test.ts`      | File read/write/list         |
+| `streaming-operations-workflow.test.ts` | Streaming command output     |
+| `websocket-workflow.test.ts`            | WebSocket connections        |
+| `bucket-mounting.test.ts`               | R2 bucket mounting (CI only) |
+
+## Running Tests
+
+```bash
+# All E2E tests
+npm run test:e2e
+
+# Single file
+npm run test:e2e -- -- tests/e2e/process-lifecycle-workflow.test.ts
+
+# Single test by name
+npm run test:e2e -- -- tests/e2e/git-clone-workflow.test.ts -t 'should clone repo'
+```
+
+## Debugging
+
+- Tests auto-retry once on failure (`retry: 1` in config)
+- Global setup logs sandbox ID on startup - check for initialization errors
+- If tests fail on first run only, the container might not be warmed (check global-setup.ts initializes the right image type)
+- Port conflicts: check no other test uses the same port
+
+## What NOT to Do
+
+- **Don't create new sandboxes unless strictly necessary** - use `getSharedSandbox()`
+- **Don't skip cleanup** - leaked processes affect other tests
+- **Don't use hardcoded ports** without adding to Dockerfile
+- **Don't rely on filesystem state** from other tests - use unique paths
````
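
In the guide's cleanup example, the `DELETE` request sits at the end of the test body. An `expect` that fails earlier skips it and leaves the process running in the shared container, which is the leak the "Don't skip cleanup" rule warns about. Wrapping the body in `try`/`finally` closes that gap. `withBackgroundProcess` is a hypothetical helper. `start` and `stop` are injected so the sketch does not depend on the worker's endpoints.

```typescript
// Hypothetical helper: start a background process, run the test body, and
// always stop the process, even when the body throws (e.g. a failed expect).
async function withBackgroundProcess<T>(
  start: () => Promise<string>, // resolves to the process ID
  stop: (processId: string) => Promise<void>,
  body: (processId: string) => Promise<T>
): Promise<T> {
  const processId = await start();
  try {
    return await body(processId);
  } finally {
    // Runs on success and on failure, so a failing test cannot leak the
    // process into the shared container.
    await stop(processId);
  }
}
```

In a test, `start` would wrap the `POST /api/process/start` call from the guide and return its `id`. `stop` would issue the `DELETE /api/process/${processId}` request. One trade-off: if `stop` itself throws, that error replaces the original assertion failure.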

packages/sandbox/package.json

Lines changed: 1 addition & 1 deletion
````diff
@@ -36,7 +36,7 @@
     "typecheck": "tsc --noEmit",
     "docker:local": "cd ../.. && docker build -f packages/sandbox/Dockerfile --target default --platform linux/amd64 --build-arg SANDBOX_VERSION=$npm_package_version -t cloudflare/sandbox-test:$npm_package_version . && docker build -f packages/sandbox/Dockerfile --target python --platform linux/amd64 --build-arg SANDBOX_VERSION=$npm_package_version -t cloudflare/sandbox-test:$npm_package_version-python .",
     "test": "vitest run --config vitest.config.ts \"$@\"",
-    "test:e2e": "cd ../.. && cd tests/e2e/test-worker && ./generate-config.sh && cd ../../.. && vitest run --config vitest.e2e.config.ts \"$@\""
+    "test:e2e": "cd ../../tests/e2e/test-worker && ./generate-config.sh && cd ../../.. && vitest run --config vitest.e2e.config.ts \"$@\""
   },
   "exports": {
     ".": {
````

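The `test:e2e` script runs Vitest from the repository root with `vitest.e2e.config.ts`. That file's contents are not shown in this excerpt. The docs in this commit state three things about it: a global setup file at `tests/e2e/global-setup.ts`, parallel execution via a thread pool, and `retry: 1`. Going only by those, a minimal config could look like this. The `include` glob is an assumption.

```typescript
import { defineConfig } from 'vitest/config';

// Sketch only: reconstructed from the documented behaviour, not copied
// from the repository.
export default defineConfig({
  test: {
    // Assumed location of the E2E test files.
    include: ['tests/e2e/**/*.test.ts'],
    // Runs once in the main process before any test file, so the sandbox
    // and wrangler dev instance are created a single time.
    globalSetup: ['tests/e2e/global-setup.ts'],
    // Test files run in parallel on worker threads (replacing the old
    // sequential singleFork setup).
    pool: 'threads',
    // Retry each failing test once, as described in the guide.
    retry: 1
  }
});
```
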
tests/e2e/_smoke.test.ts

Lines changed: 16 additions & 42 deletions
````diff
@@ -1,6 +1,5 @@
-import { describe, test, expect, beforeAll, afterAll, afterEach } from 'vitest';
-import { getTestWorkerUrl, WranglerDevRunner } from './helpers/wrangler-runner';
-import { createSandboxId, cleanupSandbox } from './helpers/test-fixtures';
+import { describe, test, expect, beforeAll } from 'vitest';
+import { getSharedSandbox } from './helpers/global-sandbox';
 import type { HealthResponse } from './test-worker/types';
 
 /**
@@ -9,50 +8,25 @@ import type { HealthResponse } from './test-worker/types';
  * This test validates that:
  * 1. Can get worker URL (deployed in CI, wrangler dev locally)
  * 2. Worker is running and responding
- * 3. Can cleanup properly
+ * 3. Shared sandbox initializes correctly
  *
- * NOTE: This is just infrastructure validation. Real SDK integration
- * tests will be in the workflow test suites.
+ * NOTE: This test runs first (sorted by name) and initializes the shared sandbox.
  */
 describe('Integration Infrastructure Smoke Test', () => {
-  describe('local', () => {
-    let runner: WranglerDevRunner | null = null;
-    let workerUrl: string;
-    let currentSandboxId: string | null = null;
+  let workerUrl: string;
 
-    beforeAll(async () => {
-      const result = await getTestWorkerUrl();
-      workerUrl = result.url;
-      runner = result.runner;
-    });
+  beforeAll(async () => {
+    // Initialize shared sandbox - this will be reused by all other tests
+    const sandbox = await getSharedSandbox();
+    workerUrl = sandbox.workerUrl;
+  }, 120000);
 
-    afterEach(async () => {
-      // Cleanup sandbox container after each test
-      if (currentSandboxId) {
-        await cleanupSandbox(workerUrl, currentSandboxId);
-        currentSandboxId = null;
-      }
-    });
+  test('should verify worker is running with health check', async () => {
+    // Verify worker is running with health check
+    const response = await fetch(`${workerUrl}/health`);
+    expect(response.status).toBe(200);
 
-    afterAll(async () => {
-      if (runner) {
-        await runner.stop();
-      }
-    });
-
-    test('should verify worker is running with health check', async () => {
-      // Verify worker is running with health check
-      const response = await fetch(`${workerUrl}/health`);
-      expect(response.status).toBe(200);
-
-      const data = (await response.json()) as HealthResponse;
-      expect(data.status).toBe('ok');
-
-      // In local mode, verify stdout captured wrangler startup
-      if (runner) {
-        const stdout = runner.getStdout();
-        expect(stdout).toContain('Ready on');
-      }
-    });
+    const data = (await response.json()) as HealthResponse;
+    expect(data.status).toBe('ok');
   });
 });
````
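
The smoke test sends one `/health` request and relies on global setup having already warmed the container. The commit message also mentions polling for process termination. Both situations fit a deadline-bounded polling loop. `pollUntil` and `waitForHealthy` are hypothetical helpers, and the default timeout and interval are arbitrary. `waitForHealthy` assumes Node 18+ for the global `fetch`.

```typescript
// Hypothetical polling helper: re-run `check` until it returns true, or
// throw once `timeoutMs` has elapsed.
async function pollUntil(
  check: () => Promise<boolean>,
  { timeoutMs = 30_000, intervalMs = 500 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example: wait until the worker's /health endpoint reports { status: 'ok' }.
async function waitForHealthy(workerUrl: string): Promise<void> {
  await pollUntil(async () => {
    try {
      const res = await fetch(`${workerUrl}/health`);
      if (res.status !== 200) return false;
      const data = (await res.json()) as { status?: string };
      return data.status === 'ok';
    } catch {
      // e.g. connection refused while wrangler dev is still starting
      return false;
    }
  });
}
```

For process termination, the `check` callback would query the process status and return `true` once the process has exited. The exact status endpoint is not shown in this excerpt.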

tests/e2e/bucket-mounting.test.ts

Lines changed: 9 additions & 31 deletions
````diff
@@ -1,13 +1,8 @@
-import { afterAll, afterEach, beforeAll, describe, expect, test } from 'vitest';
+import { beforeAll, describe, expect, test } from 'vitest';
 import {
-  cleanupSandbox,
-  createSandboxId,
-  createTestHeaders
-} from './helpers/test-fixtures';
-import {
-  getTestWorkerUrl,
-  type WranglerDevRunner
-} from './helpers/wrangler-runner';
+  getSharedSandbox,
+  createUniqueSession
+} from './helpers/global-sandbox';
 import type { ExecResult } from '@repo/shared';
 import type { SuccessResponse, BucketGetResponse } from './test-worker/types';
 
@@ -33,33 +28,19 @@ describe('Bucket Mounting E2E', () => {
   }
 
   describe('local', () => {
-    let runner: WranglerDevRunner | null;
     let workerUrl: string;
-    let currentSandboxId: string | null = null;
+    let headers: Record<string, string>;
 
     const TEST_BUCKET = 'sandbox-e2e-test';
     const MOUNT_PATH = '/mnt/test-data';
     const TEST_FILE = `e2e-test-${Date.now()}.txt`;
     const TEST_CONTENT = `Bucket mounting E2E test - ${new Date().toISOString()}`;
 
     beforeAll(async () => {
-      const result = await getTestWorkerUrl();
-      workerUrl = result.url;
-      runner = result.runner;
-    }, 30000);
-
-    afterEach(async () => {
-      if (currentSandboxId) {
-        await cleanupSandbox(workerUrl, currentSandboxId);
-        currentSandboxId = null;
-      }
-    });
-
-    afterAll(async () => {
-      if (runner) {
-        await runner.stop();
-      }
-    });
+      const sandbox = await getSharedSandbox();
+      workerUrl = sandbox.workerUrl;
+      headers = sandbox.createHeaders(createUniqueSession());
+    }, 120000);
 
     test('should mount bucket and perform bidirectional file operations', async () => {
       // Verify required credentials are present
@@ -76,9 +57,6 @@ describe('Bucket Mounting E2E', () => {
         );
       }
 
-      currentSandboxId = createSandboxId();
-      const headers = createTestHeaders(currentSandboxId);
-
       const PRE_EXISTING_FILE = `pre-existing-${Date.now()}.txt`;
       const PRE_EXISTING_CONTENT =
         'This file was created in R2 before mounting';
````
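
The bucket test derives `TEST_FILE` from `Date.now()` alone. The guide's `sandbox.uniquePath('my-feature')` returns paths such as `/workspace/test-abc123/my-feature`. When tests run in parallel, or if concurrent CI jobs write to the same `sandbox-e2e-test` bucket, a random component guards against two runs picking the same name. The functions below are hypothetical stand-ins. The directory layout is inferred from the comment in the guide, not from the helper's source.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical stand-in for `sandbox.uniquePath()`: each factory picks one
// random directory under `root`, and every path it returns lives inside it.
function createPathFactory(root = '/workspace'): (name: string) => string {
  const testDir = `${root}/test-${randomUUID().slice(0, 8)}`;
  return (name: string): string => `${testDir}/${name}`;
}

// Collision-resistant file name: Date.now() alone can repeat when two tests
// (or two CI jobs) create files within the same millisecond.
function uniqueFileName(prefix: string, ext = 'txt'): string {
  return `${prefix}-${Date.now()}-${randomUUID().slice(0, 8)}.${ext}`;
}

// Example usage:
const uniquePath = createPathFactory();
const testDir = uniquePath('my-feature'); // e.g. /workspace/test-1a2b3c4d/my-feature
const testFile = uniqueFileName('e2e-test'); // e.g. e2e-test-<timestamp>-<random>.txt
```
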
