Description
Summary
sandbox.exec() calls fail with CommandError: Unknown Error, TODO after the /workspace directory is removed or replaced with a symlink. This occurs even when the symlink target is a valid, accessible directory. The error message "Unknown Error, TODO" indicates an unhandled edge case in the SDK's command execution path.
SDK Version: @cloudflare/sandbox@0.6.3
Severity: High - Blocks legitimate use cases for persistent storage mounting
Reproducibility: 100% reproducible
Environment
@cloudflare/sandbox: ^0.6.3
wrangler: ^4.20.0
Node.js: v22.x
Container base image: docker.io/cloudflare/sandbox:0.6.3
Platform: Cloudflare Workers (production deployment)
Reproduction Steps
Minimal Reproduction
import { getSandbox } from '@cloudflare/sandbox';
// 1. Get a sandbox instance
const sandbox = getSandbox(env.Sandbox, 'test-project-id');
// 2. Verify baseline works
await sandbox.exec('echo "baseline works"'); // ✅ Success
// 3. Remove /workspace
await sandbox.exec('rm -rf /workspace'); // ✅ Success (command executes)
// 4. Try ANY subsequent exec() call
await sandbox.exec('echo "after removal"');
// ❌ FAILS: CommandError: Failed to execute command 'echo "after removal"'
// in session 'sandbox-test-project-id': Unknown Error, TODO
Alternative Reproduction (Symlink)
const sandbox = getSandbox(env.Sandbox, 'test-project-id');
// 1. Create a backup directory
await sandbox.exec('mkdir -p /tmp/workspace_backup');
// 2. Replace /workspace with symlink
await sandbox.exec('rm -rf /workspace && ln -sf /tmp/workspace_backup /workspace');
// ✅ Success (symlink created)
// 3. Try ANY subsequent exec() call
await sandbox.exec('ls -la /workspace');
// ❌ FAILS: CommandError: Unknown Error, TODO
Expected Behavior
- After removing /workspace, subsequent exec() calls should either:
  - Continue working (commands execute in whatever CWD the shell defaults to)
  - Return a clear error like WorkingDirectoryNotFound: /workspace does not exist
- After replacing /workspace with a symlink to a valid directory, subsequent exec() calls should:
  - Resolve the symlink and execute commands normally
  - The symlink target (/tmp/workspace_backup) is a valid, accessible directory
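A dedicated error type is one way to make the "clear error" option something callers can branch on. The sketch below is illustrative only: WorkingDirectoryNotFoundError and assertWorkingDirectory are hypothetical names, not part of @cloudflare/sandbox, and the existence check is injected so the example stays self-contained.

```typescript
// Hypothetical error type; the SDK does not export anything like this today.
class WorkingDirectoryNotFoundError extends Error {
  readonly cwd: string;

  constructor(cwd: string) {
    super(`WorkingDirectoryNotFound: ${cwd} does not exist`);
    this.name = 'WorkingDirectoryNotFoundError';
    this.cwd = cwd;
    // Keeps `instanceof` reliable when compiling to older JS targets.
    Object.setPrototypeOf(this, WorkingDirectoryNotFoundError.prototype);
  }
}

// `exists` stands in for whatever check the runtime would actually perform.
function assertWorkingDirectory(
  cwd: string,
  exists: (path: string) => boolean,
): void {
  if (!exists(cwd)) {
    throw new WorkingDirectoryNotFoundError(cwd);
  }
}
```

With an error like this, calling code could catch it, recreate the directory or pass a different cwd, and retry instead of discarding the sandbox.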
Actual Behavior
Any exec() call after /workspace removal or replacement fails with:
CommandError: Failed to execute command '<any command>' in session '<session-id>': Unknown Error, TODO
Key observations:
- The error occurs for any command, including simple ones like echo "test"
- The error message Unknown Error, TODO suggests an unhandled code path
- Recovery is impossible: even mkdir -p /workspace fails with the same error
- The sandbox instance is permanently broken; only creating a new sandbox works
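Until this is fixed, the only recovery we have found is to switch to a new sandbox. A minimal sketch of detecting the broken state and retrying once on a fresh instance follows. ExecLike, isUnrecoverableExecError, and execOrReplace are names invented for this example; the marker string is the one from the error above. Any state in the broken sandbox is lost, so this only suits idempotent commands.

```typescript
// Minimal shape of the one method used here; the real Sandbox type has more.
interface ExecLike {
  exec(command: string): Promise<{ stdout: string }>;
}

// Marker text copied from the error reported in this issue.
const UNHANDLED_MARKER = 'Unknown Error, TODO';

// True when an error looks like the unrecoverable failure described above.
function isUnrecoverableExecError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return message.includes(UNHANDLED_MARKER);
}

// Runs a command; if the sandbox appears broken, retries once on a new one.
// `makeSandbox(attempt)` is expected to return a distinct sandbox per attempt.
async function execOrReplace(
  makeSandbox: (attempt: number) => ExecLike,
  command: string,
): Promise<{ stdout: string; attempt: number }> {
  try {
    const first = await makeSandbox(0).exec(command);
    return { stdout: first.stdout, attempt: 0 };
  } catch (error) {
    if (!isUnrecoverableExecError(error)) throw error;
    // The broken sandbox's state is gone; the caller must re-provision files.
    const second = await makeSandbox(1).exec(command);
    return { stdout: second.stdout, attempt: 1 };
  }
}
```

In a Worker, makeSandbox could wrap getSandbox with an ID that includes the attempt number, so the retry lands on a different instance.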
Isolation Testing Results
We conducted systematic testing to isolate the root cause:
Test 1: Symlinks in /tmp (non-/workspace directories)
await sandbox.exec('mkdir -p /tmp/source');
await sandbox.writeFile('/tmp/source/test.txt', 'content');
await sandbox.exec('ln -sf /tmp/source /tmp/target');
await sandbox.exec('cat /tmp/target/test.txt');
// ✅ SUCCESS - Returns "content"
Result: Symlinks work correctly when /workspace is not involved.
Test 2: Remove /workspace without symlinking
await sandbox.exec('rm -rf /workspace');
await sandbox.exec('echo "test"');
// ❌ FAILS: Unknown Error, TODO
Result: Just removing /workspace breaks all subsequent exec() calls.
Test 3: Replace /workspace with symlink
await sandbox.exec('rm -rf /workspace && ln -sf /tmp/valid_dir /workspace');
await sandbox.exec('echo "test"');
// ❌ FAILS: Unknown Error, TODO
Result: Same failure as Test 2.
Test 4: Recreate /workspace after removal
await sandbox.exec('rm -rf /workspace');
await sandbox.exec('mkdir -p /workspace');
// ❌ FAILS: Unknown Error, TODO (cannot even recreate the directory)
Result: The sandbox is unrecoverable after /workspace removal.
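The four tests share one shape: run commands in order and record which step fails first. A small harness for that might look like the sketch below. runSteps and makeFakeSandbox are names invented for this example, and the fake sandbox only mimics the behavior reported in this issue so the harness can run without a deployment; it is not the SDK.

```typescript
// Minimal shape of the one method used here; the real Sandbox type has more.
interface ExecLike {
  exec(command: string): Promise<{ stdout: string }>;
}

interface StepResult {
  command: string;
  ok: boolean;
  detail: string;
}

// Runs every command in order and records each outcome, so one failing
// step does not hide what happens to the steps after it.
async function runSteps(sandbox: ExecLike, commands: string[]): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const command of commands) {
    try {
      const { stdout } = await sandbox.exec(command);
      results.push({ command, ok: true, detail: stdout.trim() });
    } catch (error) {
      results.push({ command, ok: false, detail: String(error) });
    }
  }
  return results;
}

// Stand-in that mimics the REPORTED behavior: once a command removes
// /workspace, every later exec() rejects. Purely illustrative.
function makeFakeSandbox(): ExecLike {
  let broken = false;
  return {
    async exec(command: string) {
      if (broken) throw new Error('CommandError: Unknown Error, TODO');
      if (/rm -rf \/workspace\b/.test(command)) broken = true;
      return { stdout: '' };
    },
  };
}
```

Passing a real sandbox from getSandbox() instead of the fake would run the same sequences against an actual container.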
Root Cause Hypothesis
Based on the error message and behavior, the SDK likely:
- Caches /workspace state at sandbox initialization - The SDK may store the inode, mount point, or file descriptor for /workspace and reuse it for subsequent commands.
- Assumes /workspace is immutable - The command execution path doesn't handle the case where /workspace is removed or replaced.
- Has incomplete error handling - The literal string "Unknown Error, TODO" in the error message indicates a catch block that was intended to be implemented but wasn't:
// Hypothetical SDK code
try {
// Execute command
} catch (error) {
if (error instanceof SomeSpecificError) {
throw new CommandError(`Specific message: ${error.message}`);
}
// This catch-all was never properly implemented
throw new CommandError('Unknown Error, TODO');
}
Impact & Use Case
Blocked Use Case: Persistent Storage via FUSE Mount
We're building a code sandbox platform that needs file persistence across sandbox restarts. The intended architecture:
- Mount R2 bucket at /mnt/r2 using sandbox.mountBucket()
- Symlink /workspace to /mnt/r2/projects/{projectId}/workspace
- Files are automatically persisted to R2 without explicit sync
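In code, the intended setup would look roughly like the sketch below. The mountBucket call mirrors the three-argument form used in our workaround; SandboxLike, workspaceTarget, and linkWorkspaceToBucket are names made up for this example, and the option type is left as unknown. The last exec() is the step after which every later call currently fails.

```typescript
// Minimal shape of the two methods used; option types are simplified here.
interface SandboxLike {
  mountBucket(bucket: string, mountPath: string, options?: unknown): Promise<unknown>;
  exec(command: string): Promise<unknown>;
}

// Restrict project IDs to characters that are safe to interpolate into a
// shell command, then build the per-project path on the mounted bucket.
function workspaceTarget(projectId: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(projectId)) {
    throw new Error(`Unsafe project id: ${projectId}`);
  }
  return `/mnt/r2/projects/${projectId}/workspace`;
}

async function linkWorkspaceToBucket(
  sandbox: SandboxLike,
  bucket: string,
  projectId: string,
  options?: unknown,
): Promise<void> {
  const target = workspaceTarget(projectId);
  await sandbox.mountBucket(bucket, '/mnt/r2', options);
  await sandbox.exec(`mkdir -p ${target}`);
  // Replacing /workspace is what triggers the failure described in this issue.
  await sandbox.exec(`rm -rf /workspace && ln -sf ${target} /workspace`);
}
```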
This is a common pattern documented in Cloudflare's own examples:
Current Workaround
We're forced to copy files from the FUSE mount to /workspace on startup, which:
- Adds latency to cold starts (copying vs. direct access)
- Requires explicit R2 API sync on every file write
- Defeats the purpose of FUSE mounting (transparent persistence)
// Workaround: Copy instead of symlink
await sandbox.mountBucket(bucketName, '/mnt/r2', options);
await sandbox.exec(`cp -a /mnt/r2/projects/${projectId}/workspace/. /workspace/`);
// Must manually sync to R2 on every write
Live Test Endpoints
We've deployed test endpoints that demonstrate this bug:
# Test 1: Symlinks work in /tmp (proves symlinks aren't broken globally)
curl https://sandbox.forgeagent.app/test/symlink-other-dir
# Test 2: Removing /workspace breaks exec()
curl https://sandbox.forgeagent.app/test/remove-workspace
# Test 3: Comprehensive analysis
curl https://sandbox.forgeagent.app/test/symlink-analysis
Sample Output from /test/symlink-analysis
{
"testId": "symlink-analysis-1765377724884",
"test": "Comprehensive symlink analysis",
"results": {
"1_baseline": {
"success": true,
"output": "baseline\ntotal 8\ndrwxr-xr-x 2 root root 4096 Dec 8 14:46 .\ndrwxr-xr-x 20 root root 4096 Dec 10 14:41 .."
},
"2_symlink_in_tmp": {
"success": true,
"output": "content"
},
"3_backup": {
"success": true,
"output": "Backup created"
},
"4_replace_workspace": {
"success": true,
"output": "Replaced /workspace with symlink"
},
"5_exec_after_symlink": {
"success": false,
"error": "Error: CommandError: Failed to execute command 'echo \"after symlink\" && pwd && ls -la /workspace 2>&1 || echo \"ls failed\"' in session 'sandbox-symlink-analysis-1765377724884': Unknown Error, TODO"
}
},
"conclusion": "BUG CONFIRMED: Symlinks work in /tmp but fail when /workspace is a symlink.",
"isBug": true
}
Suggested Fix
Option A: Handle missing /workspace gracefully
In the command execution path, check if /workspace exists before using it as CWD:
async exec(command: string, options?: ExecOptions): Promise<ExecResult> {
const cwd = options?.cwd || '/workspace';
// Check if CWD exists and is accessible
try {
await this.internalStatPath(cwd);
} catch (error) {
throw new CommandError(
`Working directory '${cwd}' does not exist or is not accessible. ` +
`If you removed /workspace, create it again with mkdir or specify a different cwd.`
);
}
// ... rest of execution
}
Option B: Follow symlinks for /workspace
If /workspace is a symlink, resolve it before caching:
async initializeWorkspace(): Promise<void> {
const workspacePath = '/workspace';
const stat = await this.internalStatPath(workspacePath);
if (stat.isSymlink) {
this.resolvedWorkspace = await this.internalReadlink(workspacePath);
} else {
this.resolvedWorkspace = workspacePath;
}
}
Option C: Don't cache /workspace at all
Re-resolve the working directory on each exec() call. This is slightly less performant but more robust.
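A rough sketch of what per-call resolution could look like, under assumptions: resolveCwd is a hypothetical helper, not SDK code, and isDirectory is injected (in a real runtime it would be a stat-style check that follows symlinks). An explicitly requested cwd that is missing is treated as an error, while a missing default falls back to / so that recovery commands such as mkdir -p /workspace can still run.

```typescript
// Picks a working directory at call time instead of trusting a value
// cached at startup. All names and defaults here are illustrative.
function resolveCwd(
  requested: string | undefined,
  isDirectory: (path: string) => boolean,
  defaultCwd: string = '/workspace',
  lastResort: string = '/',
): string {
  if (requested !== undefined) {
    // The caller asked for a specific directory: fail loudly if it is gone.
    if (!isDirectory(requested)) {
      throw new Error(`Working directory '${requested}' does not exist or is not accessible`);
    }
    return requested;
  }
  // No explicit cwd: prefer the default, but stay usable if it was removed.
  return isDirectory(defaultCwd) ? defaultCwd : lastResort;
}
```

The extra cost is one existence check per exec() call, which is the performance trade-off mentioned above.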
Test Code for Verification
Here's a self-contained worker that can be deployed to verify the fix:
import { getSandbox, type Sandbox as SandboxType } from '@cloudflare/sandbox';
import { Hono } from 'hono';
export { Sandbox } from '@cloudflare/sandbox';
interface Env {
Sandbox: DurableObjectNamespace<SandboxType>;
}
const app = new Hono<{ Bindings: Env }>();
app.get('/test/workspace-removal', async (c) => {
const sandbox = getSandbox(c.env.Sandbox, `test-${Date.now()}`);
const results: Record<string, { ok: boolean; output?: string; error?: string }> = {};
// Step 1: Baseline
try {
const r = await sandbox.exec('echo "baseline"');
results.baseline = { ok: true, output: r.stdout };
} catch (e) {
results.baseline = { ok: false, error: String(e) };
}
// Step 2: Remove /workspace
try {
await sandbox.exec('rm -rf /workspace');
results.removal = { ok: true };
} catch (e) {
results.removal = { ok: false, error: String(e) };
}
// Step 3: Try exec after removal (THIS IS THE BUG)
try {
const r = await sandbox.exec('echo "after removal"');
results.after_removal = { ok: true, output: r.stdout };
} catch (e) {
results.after_removal = { ok: false, error: String(e) };
}
// Step 4: Try to recreate /workspace
try {
await sandbox.exec('mkdir -p /workspace');
results.recreate = { ok: true };
} catch (e) {
results.recreate = { ok: false, error: String(e) };
}
return c.json({
results,
bugPresent: !results.after_removal.ok,
expectedBehavior: 'after_removal and recreate should both succeed',
});
});
export default app;
Additional Context
- Worker ID: forge-sandbox
- Routes: sandbox.forgeagent.app/*, *.sandbox.forgeagent.app/*
- Container class: Sandbox (Durable Object with container)
- Base image: docker.io/cloudflare/sandbox:0.6.3
The Dockerfile extends the base image with FUSE support:
FROM docker.io/cloudflare/sandbox:0.6.3
RUN apt-get update && apt-get install -y fuse3 curl bash git
RUN npm install -g typescript ts-node pnpm vite serve http-server
RUN mkdir -p /mnt/r2/projects /mnt/r2/templates
WORKDIR /workspace
Related Documentation
Contact
Happy to provide additional logs, test cases, or join a call to debug this together. This is blocking our production deployment of a developer tools platform.