Skip to content

Commit 86ee7ce

Browse files
committed
Add warning log for output capture timeout
Also removes redundant PID assignment.
1 parent 0c79a0f commit 86ee7ce

File tree

5 files changed

+723
-2
lines changed

5 files changed

+723
-2
lines changed
Lines changed: 254 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,254 @@
1+
# Process Callbacks and PID Design
2+
3+
**Date**: 2025-12-01
4+
**Status**: Approved
5+
**Issue**: [#262](https://github.com/cloudflare/sandbox-sdk/issues/262)
6+
7+
## Overview
8+
9+
Fix three related bugs in the process management API:
10+
11+
1. `onOutput`/`onExit` callbacks never fire
12+
2. `getLogs()` returns empty for fast commands (race condition)
13+
3. `pid` is always `undefined`
14+
15+
## Problem Analysis
16+
17+
### Issue 1: Callbacks Never Fire
18+
19+
**Location**: `packages/sandbox/src/sandbox.ts:1247-1300`
20+
21+
The `ProcessOptions` interface defines `onOutput` and `onExit` callbacks, but `startProcess()` only calls `onStart`. The callbacks are defined in the type but never wired up.
22+
23+
**Root cause**: No mechanism exists to deliver events after `startProcess()` returns.
24+
25+
### Issue 2: getLogs() Race Condition
26+
27+
**Location**: `packages/sandbox-container/src/services/session-manager.ts:190-284`
28+
29+
For fast commands like `echo 123`:
30+
31+
1. `startProcess()` returns after processing the first event
32+
2. Remaining events stream in background
33+
3. User calls `getLogs()` immediately
34+
4. Background streaming hasn't updated the ProcessStore yet
35+
36+
**Root cause**: `executeStreamInSession` only awaits the first event, then returns while remaining events process asynchronously.
37+
38+
### Issue 3: PID Always Undefined
39+
40+
**Location**: `packages/sandbox-container/src/services/process-service.ts:126-130`
41+
42+
```typescript
43+
const processRecordData = this.manager.createProcessRecord(
44+
command,
45+
undefined, // PID hardcoded as undefined
46+
options
47+
);
48+
```
49+
50+
**Root cause**: The PID is known when Bun spawns the subprocess, but this value never propagates back up through the layers.
51+
52+
## Design
53+
54+
### Solution 1: Background Stream for Callbacks
55+
56+
When `onOutput` or `onExit` callbacks are provided to `startProcess()`, the SDK automatically opens a background SSE stream to route events.
57+
58+
**SDK changes** (`packages/sandbox/src/sandbox.ts`):
59+
60+
```typescript
61+
async startProcess(command: string, options?: ProcessOptions): Promise<Process> {
62+
const session = sessionId ?? (await this.ensureDefaultSession());
63+
const response = await this.client.processes.startProcess(command, session, requestOptions);
64+
65+
const processObj = this.createProcessFromDTO({...}, session);
66+
67+
// Call onStart immediately
68+
if (options?.onStart) {
69+
options.onStart(processObj);
70+
}
71+
72+
// If callbacks provided, start background streaming
73+
if (options?.onOutput || options?.onExit) {
74+
this.startBackgroundStream(response.processId, options);
75+
}
76+
77+
return processObj;
78+
}
79+
80+
private async startBackgroundStream(processId: string, options: ProcessOptions): Promise<void> {
81+
try {
82+
const stream = await this.client.processes.streamProcessLogs(processId);
83+
84+
for await (const event of parseSSEStream(stream)) {
85+
switch (event.type) {
86+
case 'stdout':
87+
case 'stderr':
88+
options.onOutput?.(event.type, event.data);
89+
break;
90+
case 'exit':
91+
case 'complete':
92+
options.onExit?.(event.exitCode ?? null);
93+
return; // Stream complete, exit loop
94+
}
95+
}
96+
} catch (error) {
97+
options.onError?.(error instanceof Error ? error : new Error(String(error)));
98+
}
99+
}
100+
```
101+
102+
**Lifecycle**: The background stream runs in the Durable Object context. It automatically closes when:
103+
104+
- Process completes (receives `complete`/`exit` event)
105+
- Process is killed
106+
- An error occurs
107+
108+
### Solution 2: Event-Based Completion Detection
109+
110+
Modify `executeStreamInSession` to drain all events if the process completes quickly, rather than using time-based heuristics.
111+
112+
**Container changes** (`packages/sandbox-container/src/services/session-manager.ts`):
113+
114+
```typescript
115+
async executeStreamInSession(
116+
sessionId: string,
117+
command: string,
118+
onEvent: (event: ExecEvent) => Promise<void>,
119+
options: { cwd?: string; env?: Record<string, string> } = {},
120+
commandId: string
121+
): Promise<ServiceResult<{ continueStreaming: Promise<void> }>> {
122+
// ... existing setup code ...
123+
124+
const generator = session.execStream(command, { commandId, cwd, env });
125+
126+
// Process first event
127+
const firstResult = await generator.next();
128+
if (!firstResult.done) {
129+
await onEvent(firstResult.value);
130+
}
131+
132+
// Check if process already completed (generator exhausted or complete event)
133+
if (firstResult.done || firstResult.value?.type === 'complete') {
134+
// Process finished quickly - drain any remaining events synchronously
135+
if (!firstResult.done) {
136+
for await (const event of generator) {
137+
await onEvent(event);
138+
}
139+
}
140+
return {
141+
success: true,
142+
data: { continueStreaming: Promise.resolve() }
143+
};
144+
}
145+
146+
// Process still running - continue in background
147+
const continueStreaming = (async () => {
148+
for await (const event of generator) {
149+
await onEvent(event);
150+
}
151+
})();
152+
153+
return {
154+
success: true,
155+
data: { continueStreaming }
156+
};
157+
}
158+
```
159+
160+
**Key insight**: We don't guess "fast vs slow" by time. We check if the generator is exhausted after the first event. If yes, the process completed quickly and we drain all events before returning.
161+
162+
### Solution 3: Started Event with PID
163+
164+
Add a new `started` event type that includes the PID as the first event yielded by `execStream`.
165+
166+
**Event type addition** (`packages/shared/src/types.ts`):
167+
168+
```typescript
169+
export interface ExecEvent {
170+
type: 'started' | 'stdout' | 'stderr' | 'complete' | 'error';
171+
data?: string;
172+
exitCode?: number;
173+
pid?: number; // Present on 'started' event
174+
}
175+
```
176+
177+
**Session changes** (`packages/sandbox-container/src/session.ts`):
178+
179+
In `execStream`, yield a `started` event immediately after spawning:
180+
181+
```typescript
182+
async *execStream(command: string, options: ExecStreamOptions): AsyncGenerator<ExecEvent> {
183+
const proc = Bun.spawn(['bash', '-c', command], {
184+
cwd: options.cwd,
185+
env: { ...process.env, ...options.env },
186+
stdout: 'pipe',
187+
stderr: 'pipe',
188+
});
189+
190+
// First event: started with PID
191+
yield { type: 'started', pid: proc.pid };
192+
193+
// ... rest of streaming logic ...
194+
}
195+
```
196+
197+
**ProcessService changes** (`packages/sandbox-container/src/services/process-service.ts`):
198+
199+
Capture PID from the started event:
200+
201+
```typescript
202+
async (event) => {
203+
if (event.type === 'started' && event.pid !== undefined) {
204+
processRecord.pid = event.pid;
205+
await this.store.update(processRecord.id, { pid: event.pid });
206+
} else if (event.type === 'stdout' && event.data) {
207+
// ... existing stdout handling ...
208+
}
209+
// ...
210+
};
211+
```
212+
213+
## Implementation Order
214+
215+
1. **Issue 3 (PID)**: Add `started` event - foundational change, minimal risk
216+
2. **Issue 2 (Race condition)**: Event-based completion detection - container-only change
217+
3. **Issue 1 (Callbacks)**: Background streaming - SDK change, depends on working container
218+
219+
## Testing Strategy
220+
221+
### Issue 1: Callbacks
222+
223+
- Test callbacks fire for long-running process
224+
- Test callbacks fire for fast command (`echo 123`)
225+
- Test `onExit` receives correct exit code
226+
- Test error callback on process failure
227+
228+
### Issue 2: Race Condition
229+
230+
- Test `getLogs()` immediately after `startProcess("echo 123")` returns full output
231+
- Test long-running process still returns immediately
232+
- Test multiple rapid `getLogs()` calls return consistent data
233+
234+
### Issue 3: PID
235+
236+
- Test `process.pid` is defined after `startProcess()`
237+
- Test PID matches actual OS process (via `ps` command)
238+
- Test PID in `onStart` callback is defined
239+
240+
## Affected Files
241+
242+
**Container**:
243+
244+
- `packages/sandbox-container/src/services/session-manager.ts`
245+
- `packages/sandbox-container/src/services/process-service.ts`
246+
- `packages/sandbox-container/src/session.ts`
247+
248+
**SDK**:
249+
250+
- `packages/sandbox/src/sandbox.ts`
251+
252+
**Shared**:
253+
254+
- `packages/shared/src/types.ts` (ExecEvent type)

0 commit comments

Comments
 (0)