sandbox.exec() fails with "Unknown Error, TODO" after /workspace is removed or replaced with symlink #288

@elijahbowie

Summary

sandbox.exec() calls fail with CommandError: Unknown Error, TODO after the /workspace directory is removed or replaced with a symlink. This occurs even when the symlink target is a valid, accessible directory. The error message "Unknown Error, TODO" indicates an unhandled edge case in the SDK's command execution path.

SDK Version: @cloudflare/sandbox@0.6.3
Severity: High - Blocks legitimate use cases for persistent storage mounting
Reproducibility: 100% reproducible


Environment

@cloudflare/sandbox: ^0.6.3
wrangler: ^4.20.0
Node.js: v22.x
Container base image: docker.io/cloudflare/sandbox:0.6.3
Platform: Cloudflare Workers (production deployment)

Reproduction Steps

Minimal Reproduction

import { getSandbox } from '@cloudflare/sandbox';

// 1. Get a sandbox instance
const sandbox = getSandbox(env.Sandbox, 'test-project-id');

// 2. Verify baseline works
await sandbox.exec('echo "baseline works"'); // ✅ Success

// 3. Remove /workspace
await sandbox.exec('rm -rf /workspace'); // ✅ Success (command executes)

// 4. Try ANY subsequent exec() call
await sandbox.exec('echo "after removal"');
// ❌ FAILS: CommandError: Failed to execute command 'echo "after removal"'
//           in session 'sandbox-test-project-id': Unknown Error, TODO

Alternative Reproduction (Symlink)

const sandbox = getSandbox(env.Sandbox, 'test-project-id');

// 1. Create a backup directory
await sandbox.exec('mkdir -p /tmp/workspace_backup');

// 2. Replace /workspace with symlink
await sandbox.exec('rm -rf /workspace && ln -sf /tmp/workspace_backup /workspace');
// ✅ Success (symlink created)

// 3. Try ANY subsequent exec() call
await sandbox.exec('ls -la /workspace');
// ❌ FAILS: CommandError: Unknown Error, TODO

Expected Behavior

  1. After removing /workspace, subsequent exec() calls should either:

    • Continue working (commands execute in whatever CWD the shell defaults to), or
    • Return a clear error such as WorkingDirectoryNotFound: /workspace does not exist

  2. After replacing /workspace with a symlink to a valid directory, subsequent exec() calls should resolve the symlink and execute commands normally. In the reproduction above, the symlink target (/tmp/workspace_backup) is a valid, accessible directory.

Actual Behavior

Any exec() call after /workspace removal or replacement fails with:

CommandError: Failed to execute command '<any command>' in session '<session-id>': Unknown Error, TODO

Key observations:

  • The error occurs for any command, including simple ones like echo "test"
  • The error message Unknown Error, TODO suggests an unhandled code path
  • Recovery is impossible - even mkdir -p /workspace fails with the same error
  • The sandbox instance is permanently broken; only creating a new sandbox works
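Because the only recovery we have found is switching to a new sandbox, a caller can at least detect this state and fail over. The following is a minimal sketch, not SDK code: `isBrokenSandboxError`, `ExecLike`, and `execWithReplacement` are hypothetical names, and the only thing taken from the report is the literal error text.

```typescript
// Literal marker text copied from the error message in this report.
const BROKEN_MARKER = 'Unknown Error, TODO';

/** True when an error looks like the unrecoverable failure described above. */
function isBrokenSandboxError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return message.indexOf(BROKEN_MARKER) !== -1;
}

/** Minimal shape needed from a sandbox; an assumption made for this sketch. */
interface ExecLike {
  exec(command: string): Promise<{ stdout: string }>;
}

/**
 * Runs a command. If the sandbox appears permanently broken, asks the
 * caller-supplied factory for a replacement and retries exactly once.
 * Any other error is rethrown unchanged.
 */
async function execWithReplacement(
  current: ExecLike,
  makeFresh: () => ExecLike,
  command: string,
): Promise<{ result: { stdout: string }; sandbox: ExecLike }> {
  try {
    return { result: await current.exec(command), sandbox: current };
  } catch (err) {
    if (!isBrokenSandboxError(err)) throw err;
    const fresh = makeFresh();
    return { result: await fresh.exec(command), sandbox: fresh };
  }
}
```

In our setup `makeFresh` would call getSandbox() with a new ID. Note that any state in the broken sandbox is lost, so this only limits the damage; it does not fix the underlying problem.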

Isolation Testing Results

We conducted systematic testing to isolate the root cause:

Test 1: Symlinks in /tmp (non-/workspace directories)

await sandbox.exec('mkdir -p /tmp/source');
await sandbox.writeFile('/tmp/source/test.txt', 'content');
await sandbox.exec('ln -sf /tmp/source /tmp/target');
await sandbox.exec('cat /tmp/target/test.txt');
// ✅ SUCCESS - Returns "content"

Result: Symlinks work correctly when /workspace is not involved.

Test 2: Remove /workspace without symlinking

await sandbox.exec('rm -rf /workspace');
await sandbox.exec('echo "test"');
// ❌ FAILS: Unknown Error, TODO

Result: Just removing /workspace breaks all subsequent exec() calls.

Test 3: Replace /workspace with symlink

await sandbox.exec('mkdir -p /tmp/valid_dir'); // make sure the symlink target exists
await sandbox.exec('rm -rf /workspace && ln -sf /tmp/valid_dir /workspace');
await sandbox.exec('echo "test"');
// ❌ FAILS: Unknown Error, TODO

Result: Same failure as Test 2.

Test 4: Recreate /workspace after removal

await sandbox.exec('rm -rf /workspace');
await sandbox.exec('mkdir -p /workspace');
// ❌ FAILS: Unknown Error, TODO (cannot even recreate the directory)

Result: The sandbox is unrecoverable after /workspace removal.


Root Cause Hypothesis

Based on the error message and behavior, the SDK likely:

  1. Caches /workspace state at sandbox initialization - The SDK may store the inode, mount point, or file descriptor for /workspace and reuse it for subsequent commands.

  2. Assumes /workspace is immutable - The command execution path doesn't handle the case where /workspace is removed or replaced.

  3. Has incomplete error handling - The literal string "Unknown Error, TODO" in the error message indicates a catch block that was intended to be implemented but wasn't:

// Hypothetical SDK code
try {
  // Execute command
} catch (error) {
  if (error instanceof SomeSpecificError) {
    throw new CommandError(`Specific message: ${error.message}`);
  }
  // This catch-all was never properly implemented
  throw new CommandError('Unknown Error, TODO');
}
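For contrast, here is a sketch of what a more informative fallback could look like. This is purely illustrative: `DescriptiveCommandError` and `describeFailure` are hypothetical stand-ins, not the SDK's actual CommandError class or internals, and the assumption that the underlying error carries a `code` field (such as the POSIX code ENOENT for a missing path) is ours.

```typescript
// Local stand-in for illustration; not the SDK's actual CommandError class.
class DescriptiveCommandError extends Error {
  readonly code: string;

  constructor(message: string, code: string) {
    super(message);
    this.name = 'DescriptiveCommandError';
    this.code = code;
  }
}

/**
 * Wraps an arbitrary failure while preserving its message and, when present,
 * its error code, instead of collapsing everything into a placeholder string.
 */
function describeFailure(command: string, err: unknown): DescriptiveCommandError {
  const detail = err instanceof Error ? err.message : String(err);
  const code =
    typeof err === 'object' && err !== null && 'code' in err
      ? String((err as { code: unknown }).code)
      : 'UNKNOWN';
  return new DescriptiveCommandError(
    `Failed to execute command '${command}': ${detail} (code: ${code})`,
    code,
  );
}
```

Even without a fix for the /workspace handling itself, surfacing the underlying message this way would have made the failure much easier to diagnose.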

Impact & Use Case

Blocked Use Case: Persistent Storage via FUSE Mount

We're building a code sandbox platform that needs file persistence across sandbox restarts. The intended architecture:

  1. Mount R2 bucket at /mnt/r2 using sandbox.mountBucket()
  2. Symlink /workspace to /mnt/r2/projects/{projectId}/workspace
  3. Files are automatically persisted to R2 without explicit sync
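Step 2 above can be sketched as a small command builder. The helper names and the project ID character set are assumptions made for this example; the paths are the ones listed above. With the current SDK, running the resulting command is exactly what triggers the bug.

```typescript
// Assumption: project IDs use a conservative character set, so they can be
// interpolated into a shell command without quoting issues.
const PROJECT_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

/** Builds the per-project directory on the R2 mount. */
function workspaceTargetFor(projectId: string): string {
  if (!PROJECT_ID_PATTERN.test(projectId)) {
    throw new Error(`Invalid project ID: ${JSON.stringify(projectId)}`);
  }
  return `/mnt/r2/projects/${projectId}/workspace`;
}

/** Builds the shell command that replaces /workspace with a symlink. */
function buildSymlinkCommand(projectId: string): string {
  const target = workspaceTargetFor(projectId);
  // mkdir -p guarantees the symlink target exists before /workspace is replaced.
  return `mkdir -p ${target} && rm -rf /workspace && ln -sf ${target} /workspace`;
}
```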

This is a common pattern documented in Cloudflare's own examples.

Current Workaround

We're forced to copy files from the FUSE mount to /workspace on startup, which:

  • Adds latency to cold starts (copying vs. direct access)
  • Requires explicit R2 API sync on every file write
  • Defeats the purpose of FUSE mounting (transparent persistence)

// Workaround: Copy instead of symlink
await sandbox.mountBucket(bucketName, '/mnt/r2', options);
await sandbox.exec(`cp -a /mnt/r2/projects/${projectId}/workspace/. /workspace/`);
// Must manually sync to R2 on every write

Live Test Endpoints

We've deployed test endpoints that demonstrate this bug:

# Test 1: Symlinks work in /tmp (proves symlinks aren't broken globally)
curl https://sandbox.forgeagent.app/test/symlink-other-dir

# Test 2: Removing /workspace breaks exec()
curl https://sandbox.forgeagent.app/test/remove-workspace

# Test 3: Comprehensive analysis
curl https://sandbox.forgeagent.app/test/symlink-analysis

Sample Output from /test/symlink-analysis

{
  "testId": "symlink-analysis-1765377724884",
  "test": "Comprehensive symlink analysis",
  "results": {
    "1_baseline": {
      "success": true,
      "output": "baseline\ntotal 8\ndrwxr-xr-x  2 root root 4096 Dec  8 14:46 .\ndrwxr-xr-x 20 root root 4096 Dec 10 14:41 .."
    },
    "2_symlink_in_tmp": {
      "success": true,
      "output": "content"
    },
    "3_backup": {
      "success": true,
      "output": "Backup created"
    },
    "4_replace_workspace": {
      "success": true,
      "output": "Replaced /workspace with symlink"
    },
    "5_exec_after_symlink": {
      "success": false,
      "error": "Error: CommandError: Failed to execute command 'echo \"after symlink\" && pwd && ls -la /workspace 2>&1 || echo \"ls failed\"' in session 'sandbox-symlink-analysis-1765377724884': Unknown Error, TODO"
    }
  },
  "conclusion": "BUG CONFIRMED: Symlinks work in /tmp but fail when /workspace is a symlink.",
  "isBug": true
}

Suggested Fix

Option A: Handle missing /workspace gracefully

In the command execution path, check if /workspace exists before using it as CWD:

async exec(command: string, options?: ExecOptions): Promise<ExecResult> {
  const cwd = options?.cwd || '/workspace';

  // Check if CWD exists and is accessible
  try {
    await this.internalStatPath(cwd);
  } catch (error) {
    throw new CommandError(
      `Working directory '${cwd}' does not exist or is not accessible. ` +
      `If you removed /workspace, create it again with mkdir or specify a different cwd.`
    );
  }

  // ... rest of execution
}

Option B: Follow symlinks for /workspace

If /workspace is a symlink, resolve it before caching:

async initializeWorkspace(): Promise<void> {
  const workspacePath = '/workspace';
  const stat = await this.internalStatPath(workspacePath);

  if (stat.isSymlink) {
    this.resolvedWorkspace = await this.internalReadlink(workspacePath);
  } else {
    this.resolvedWorkspace = workspacePath;
  }
}

Option C: Don't cache /workspace at all

Re-resolve the working directory on each exec() call. This is slightly less performant but more robust.
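A minimal sketch of Option C, assuming the runtime can query the container filesystem through something like the `FsProbe` interface below. That interface, `resolveCwd`, and `makeProbe` are all hypothetical; we don't know how the SDK actually tracks the working directory. The in-memory probe exists only so the logic can be exercised without a container.

```typescript
// Hypothetical filesystem probe; the SDK's real internals are not known to us.
interface FsProbe {
  /** Returns the fully resolved path, or null if it does not exist. */
  realpath(path: string): string | null;
  isDirectory(path: string): boolean;
}

type CwdResolution =
  | { ok: true; cwd: string }
  | { ok: false; reason: string };

/** Resolves the working directory fresh on every call instead of caching it. */
function resolveCwd(fs: FsProbe, requested: string = '/workspace'): CwdResolution {
  const resolved = fs.realpath(requested);
  if (resolved === null) {
    return { ok: false, reason: `Working directory '${requested}' does not exist` };
  }
  if (!fs.isDirectory(resolved)) {
    return { ok: false, reason: `Working directory '${requested}' is not a directory` };
  }
  return { ok: true, cwd: resolved };
}

/** In-memory probe for demonstration: a list of directories plus a symlink table. */
function makeProbe(dirs: string[], links: Record<string, string>): FsProbe {
  const isDir = (path: string): boolean => dirs.indexOf(path) !== -1;
  return {
    realpath(path: string): string | null {
      let current = path;
      // Follow at most 40 links so a symlink loop cannot hang the resolver.
      for (let hops = 0; hops < 40; hops++) {
        if (!Object.prototype.hasOwnProperty.call(links, current)) {
          return isDir(current) ? current : null;
        }
        current = links[current];
      }
      return null;
    },
    isDirectory: isDir,
  };
}
```

With a probe where /workspace links to /tmp/workspace_backup, `resolveCwd` returns the backup directory as the CWD; with /workspace missing, it returns a descriptive reason rather than an opaque failure.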


Test Code for Verification

Here's a self-contained worker that can be deployed to verify the fix:

import { getSandbox, type Sandbox as SandboxType } from '@cloudflare/sandbox';
import { Hono } from 'hono';

export { Sandbox } from '@cloudflare/sandbox';

interface Env {
  Sandbox: DurableObjectNamespace<SandboxType>;
}

const app = new Hono<{ Bindings: Env }>();

app.get('/test/workspace-removal', async (c) => {
  const sandbox = getSandbox(c.env.Sandbox, `test-${Date.now()}`);
  const results: Record<string, { ok: boolean; output?: string; error?: string }> = {};

  // Step 1: Baseline
  try {
    const r = await sandbox.exec('echo "baseline"');
    results.baseline = { ok: true, output: r.stdout };
  } catch (e) {
    results.baseline = { ok: false, error: String(e) };
  }

  // Step 2: Remove /workspace
  try {
    await sandbox.exec('rm -rf /workspace');
    results.removal = { ok: true };
  } catch (e) {
    results.removal = { ok: false, error: String(e) };
  }

  // Step 3: Try exec after removal (THIS IS THE BUG)
  try {
    const r = await sandbox.exec('echo "after removal"');
    results.after_removal = { ok: true, output: r.stdout };
  } catch (e) {
    results.after_removal = { ok: false, error: String(e) };
  }

  // Step 4: Try to recreate /workspace
  try {
    await sandbox.exec('mkdir -p /workspace');
    results.recreate = { ok: true };
  } catch (e) {
    results.recreate = { ok: false, error: String(e) };
  }

  return c.json({
    results,
    bugPresent: !results.after_removal.ok || !results.recreate.ok,
    expectedBehavior: 'after_removal and recreate should both succeed',
  });
});

export default app;

Additional Context

  • Worker ID: forge-sandbox
  • Routes: sandbox.forgeagent.app/*, *.sandbox.forgeagent.app/*
  • Container class: Sandbox (Durable Object with container)
  • Base image: docker.io/cloudflare/sandbox:0.6.3

The Dockerfile extends the base image with FUSE support:

FROM docker.io/cloudflare/sandbox:0.6.3
RUN apt-get update && apt-get install -y fuse3 curl bash git
RUN npm install -g typescript ts-node pnpm vite serve http-server
RUN mkdir -p /mnt/r2/projects /mnt/r2/templates
WORKDIR /workspace

Contact

Happy to provide additional logs, test cases, or join a call to debug this together. This is blocking our production deployment of a developer tools platform.
