Skip to content

Conversation

@ghostwriternr
Copy link
Member

Summary

Fixes #262

Problem

Three related issues with process execution:

  1. PID always undefined in onStart: The process record wasn't being updated with the actual PID after spawn
  2. Callbacks never firing: Status and output listeners were lost when the process record was updated
  3. getLogs returns empty for fast commands: Background execution wrote the exit code before output was fully captured

Solution

  1. PID capture: Ensure the process record is updated with PID immediately after spawn
  2. Listener preservation: Update process records in a way that preserves existing listener references
  3. Output synchronization: Wait for output labelers to complete before reading final logs, using a marker file to signal completion

@changeset-bot
Copy link

changeset-bot bot commented Dec 2, 2025

🦋 Changeset detected

Latest commit: bbc5345

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@cloudflare/sandbox Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

agents-git-bot bot pushed a commit to cloudflare/cloudflare-docs that referenced this pull request Dec 2, 2025
Add documentation for process lifecycle callbacks (onStart, onOutput, onExit,
onError) in startProcess() that enable real-time process monitoring without
manual log streaming.

Clarify that getProcessLogs() reliably captures output from fast-completing
commands following race condition fix.

Related to cloudflare/sandbox-sdk#267
@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 2, 2025

Open in StackBlitz

npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@267

commit: bbc5345

@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

🐳 Docker Images Published

Default (no Python):

FROM cloudflare/sandbox:0.0.0-pr-267-6f24c60

With Python:

FROM cloudflare/sandbox:0.0.0-pr-267-6f24c60-python

Version: 0.0.0-pr-267-6f24c60

Use the -python variant if you need Python code execution.

claude[bot]

This comment was marked as outdated.

claude[bot]

This comment was marked as outdated.

For background execution, the exit code file was written before output
labelers finished writing to the log. Added a marker file that signals
when labelers complete, ensuring all output is captured before getLogs
reads the log file.
@ghostwriternr ghostwriternr force-pushed the fix-sandbox-friction-issues branch from 30d4a9d to e892093 Compare December 2, 2025 13:56
claude[bot]

This comment was marked as outdated.

claude[bot]

This comment was marked as outdated.

For fast commands like 'echo', getLogs() needs complete output without
artificial delays. For long-running processes like servers, it should
return current output without blocking.

The subprocess PID provides a deterministic way to distinguish these
cases: dead PID means command finished and we should wait for streaming
callbacks to complete. Live PID means command is still running and we
return immediately.
claude[bot]

This comment was marked as outdated.

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

Verdict: Ready to merge. The implementation correctly fixes all three stated problems with architecturally sound solutions.

What This PR Fixes

  1. PID capture: session.ts:347-363 now waits for PID file before yielding start event, and process-service.ts:153-154 stores it immediately
  2. Listener preservation: process-store.ts:64-65 mutates in-place with Object.assign() instead of creating new objects, preserving ProcessService's listener references
  3. Output synchronization: Shell script creates labelers.done marker file, session.ts:419-443 waits for it, and getProcess() awaits streamingComplete before returning logs

Architecture Alignment

Excellent. All changes follow existing patterns:

  • StreamingComplete promise for async coordination
  • Marker files for IPC (following pid/exit file pattern)
  • Reference preservation for event routing
  • Synchronous first-event processing in SessionManager

Edge Cases Handled Well

  • PID file timeout: logs warning, continues with undefined (process still works)
  • Labelers timeout: logs warning, proceeds (5s is generous)
  • Process read from disk: correctly skips streamingComplete await
  • Early SDK streaming: safe because PID captured before return

Minor Note: PID Reuse

process-service.ts:289-298 uses process.kill(pid, 0) to detect dead processes. This has theoretical PID reuse risk (OS reuses PID for new process between check and await). However, the impact is minimal because:

  • Terminal status check is the primary gate
  • PID check is only a fallback for fast commands
  • Time window is tiny (microseconds)
  • labelers.done provides deterministic sync

This is a pragmatic tradeoff and unlikely to cause real issues.

Suggestions (Non-blocking)

  1. Add comment explaining PID reuse limitation at process-service.ts:286
  2. Consider unit test explicitly validating callback firing (currently covered implicitly by E2E tests)

Testing

E2E test properly validates:

  • PID availability (line 244-246)
  • Fast command log capture (line 258-259)

Changeset correctly formatted and scoped to @cloudflare/sandbox only.

@ghostwriternr ghostwriternr merged commit 204f9ac into main Dec 4, 2025
20 checks passed
@ghostwriternr ghostwriternr deleted the fix-sandbox-friction-issues branch December 4, 2025 13:35
@github-actions github-actions bot mentioned this pull request Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Friction log

1 participant