@mihai-chiorean

  • Add LISTEN_ADDRESS environment variable (default: ":5000")
  • Update Dockerfile to include LISTEN_ADDRESS env var
  • Change logging from println to log.Printf for consistency
  • Allows binding to localhost (127.0.0.1:5000) or other addresses
  • Maintains backward compatibility with default :5000

Usage examples:

  • Default: docker run ... (listens on :5000)
  • Localhost only: docker run -e LISTEN_ADDRESS=127.0.0.1:5000 ...
  • Custom port: docker run -e LISTEN_ADDRESS=:8080 ...
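
For reviewers, the address lookup described above could look roughly like the sketch below. The function name `listenAddress` and the injected `getenv` parameter are illustrative assumptions, not names taken from this PR; only the `LISTEN_ADDRESS` variable and the `:5000` default come from the description.

```go
package main

import (
	"log"
	"net/http"
	"os"
)

// listenAddress is a hypothetical helper: it returns LISTEN_ADDRESS when set
// and falls back to ":5000" so existing deployments keep working.
// getenv is injected (os.Getenv in production) to keep the lookup testable.
func listenAddress(getenv func(string) string) string {
	if addr := getenv("LISTEN_ADDRESS"); addr != "" {
		return addr
	}
	return ":5000"
}

func main() {
	addr := listenAddress(os.Getenv)
	log.Printf("listening on %s", addr)
	// NotFoundHandler stands in for the real registry handler.
	log.Fatal(http.ListenAndServe(addr, http.NotFoundHandler()))
}
```

With `LISTEN_ADDRESS=127.0.0.1:5000` the server is reachable only from the local host; leaving the variable unset keeps the previous behaviour of listening on every interface.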

Graceful Shutdown:
- Add signal handling for SIGTERM and SIGINT
- Implement graceful server shutdown with configurable timeout
- Server waits for in-flight requests to complete before stopping
- Helps prevent data corruption from interrupted requests and makes redeployments smoother

Configurable Timeouts:
- READ_TIMEOUT: timeout for reading requests (default: 5m)
- WRITE_TIMEOUT: timeout for writing responses (default: 5m)
- IDLE_TIMEOUT: timeout for idle keep-alive connections (default: 120s)
- SHUTDOWN_TIMEOUT: max time to wait for graceful shutdown (default: 30s)
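
A minimal sketch of how the shutdown flow and the four timeouts above might fit together. The `timeouts` struct and the `defaultTimeouts`, `newServer`, and `serve` helpers are hypothetical names chosen for illustration; the defaults are the ones listed above, and the actual code in this PR may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

// timeouts groups the four durations from the list above (hypothetical type).
type timeouts struct {
	Read, Write, Idle, Shutdown time.Duration
}

// defaultTimeouts mirrors the documented defaults.
func defaultTimeouts() timeouts {
	return timeouts{
		Read:     5 * time.Minute,
		Write:    5 * time.Minute,
		Idle:     120 * time.Second,
		Shutdown: 30 * time.Second,
	}
}

// newServer builds an http.Server, which exposes timeout fields that the
// http.ListenAndServe shortcut cannot set.
func newServer(addr string, h http.Handler, t timeouts) *http.Server {
	return &http.Server{
		Addr:         addr,
		Handler:      h,
		ReadTimeout:  t.Read,
		WriteTimeout: t.Write,
		IdleTimeout:  t.Idle,
	}
}

// serve runs srv until stop is closed, then lets in-flight requests finish
// for at most shutdownTimeout.
func serve(srv *http.Server, stop <-chan struct{}, shutdownTimeout time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // the listener failed before shutdown was requested
	case <-stop:
	}

	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err // the timeout elapsed with requests still in flight
	}
	// ListenAndServe reports ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

func main() {
	// ctx is cancelled on the first SIGINT or SIGTERM.
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	cfg := defaultTimeouts()
	srv := newServer(":5000", http.NotFoundHandler(), cfg)
	log.Printf("listening on %s", srv.Addr)
	if err := serve(srv, ctx.Done(), cfg.Shutdown); err != nil {
		log.Printf("server stopped with error: %v", err)
		return
	}
	log.Printf("server stopped cleanly")
}
```

Passing a plain `<-chan struct{}` to `serve` keeps the shutdown path independent of OS signals, so the same function can be driven from a test or from another trigger.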

Configurable Limits:
- BLOB_LEASE_EXPIRATION: blob lease duration before GC (default: 15m)
- MAX_MANIFEST_SIZE: max manifest size in bytes (default: 4194304 = 4 MiB)

Implementation:
- Replace hardcoded constants with configurable struct fields
- Add helper functions for parsing duration and int64 env vars
- Use http.Server instead of http.ListenAndServe for timeout support
- Update Dockerfile with all new environment variables and defaults
- Maintain backward compatibility with sensible defaults
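
The env-var helpers mentioned above might look something like this. The names `envDuration` and `envInt64`, the injected `getenv` parameter, and the choice to fall back to the default on malformed or non-positive input (rather than exiting) are all assumptions made for this sketch.

```go
package main

import (
	"log"
	"os"
	"strconv"
	"time"
)

// envDuration parses key as a Go duration string such as "5m" or "120s".
// Unset, malformed, or non-positive values fall back to def (assumed policy).
func envDuration(getenv func(string) string, key string, def time.Duration) time.Duration {
	raw := getenv(key)
	if raw == "" {
		return def
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		log.Printf("ignoring %s=%q, using default %s", key, raw, def)
		return def
	}
	return d
}

// envInt64 parses key as a base-10 integer with the same fallback policy.
func envInt64(getenv func(string) string, key string, def int64) int64 {
	raw := getenv(key)
	if raw == "" {
		return def
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || n <= 0 {
		log.Printf("ignoring %s=%q, using default %d", key, raw, def)
		return def
	}
	return n
}

func main() {
	log.Printf("READ_TIMEOUT=%s", envDuration(os.Getenv, "READ_TIMEOUT", 5*time.Minute))
	log.Printf("BLOB_LEASE_EXPIRATION=%s", envDuration(os.Getenv, "BLOB_LEASE_EXPIRATION", 15*time.Minute))
	log.Printf("MAX_MANIFEST_SIZE=%d", envInt64(os.Getenv, "MAX_MANIFEST_SIZE", 4<<20))
}
```

Falling back with a log line keeps a typo in one variable from taking the registry down at startup; a stricter deployment might prefer to fail fast instead.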

Resolves issues:
- No graceful shutdown (log.Fatal terminated the process immediately)
- Hardcoded configuration values

Issue:
- log.Panic(err) in containerdBlobWriter.Size() could crash entire server
- Single bad request or timing issue could cause full service outage
- Panic is disproportionate response to a status query failure

Solution:
- Replace log.Panic with log.Printf to log error without crashing
- Return 0 as safe fallback value instead of -1
- Add explanatory comment about error handling strategy
- Maintains server availability even when individual requests fail

Impact:
- Improves server resilience and availability
- Prevents cascading failures from single request issues
- Size() is called after cacheStatus() in normal flow, so errors are rare
- Returning 0 is safer than crashing (descriptor will still be created)
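
To illustrate the change in error handling, here is a simplified sketch of a `Size()` method that logs and returns 0 instead of panicking. The `status`, `statusReader`, `blobWriter`, and `fakeWriter` types are stand-ins invented for this example so that it runs without containerd; they are not the real `containerdBlobWriter` or the containerd API.

```go
package main

import (
	"errors"
	"log"
)

// status and statusReader are simplified stand-ins for the writer status API.
type status struct{ Offset int64 }

type statusReader interface {
	Status() (status, error)
}

// blobWriter is a hypothetical, reduced version of the writer in this PR.
type blobWriter struct{ w statusReader }

// Size reports how many bytes have been written so far.
func (b *blobWriter) Size() int64 {
	st, err := b.w.Status()
	if err != nil {
		// A failed status query affects only this request, so log it and
		// report 0 rather than panicking and taking the whole server down.
		log.Printf("blob writer: status unavailable, reporting size 0: %v", err)
		return 0
	}
	return st.Offset
}

// fakeWriter lets the sketch run without a containerd daemon.
type fakeWriter struct {
	offset int64
	err    error
}

func (f fakeWriter) Status() (status, error) {
	return status{Offset: f.offset}, f.err
}

var errUnavailable = errors.New("status unavailable")

func main() {
	ok := &blobWriter{w: fakeWriter{offset: 42}}
	bad := &blobWriter{w: fakeWriter{err: errUnavailable}}
	log.Printf("ok=%d bad=%d", ok.Size(), bad.Size())
}
```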

Readiness Endpoint:
- Add GET /readyz endpoint to check containerd connectivity
- Returns 200 OK if containerd is connected and responding
- Returns 503 Service Unavailable if containerd is unreachable
- Uses 2-second timeout to avoid blocking
- Complements the existing GET /v2/ endpoint from the OCI distribution spec, which serves as a liveness check
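
A sketch of a readiness handler with the behaviour listed above. The containerd call is abstracted behind a hypothetical `checkFunc`, and `healthy`, `unreachable`, `hung`, and `probe` exist only so the example runs in-process without a daemon; the real handler would call whatever the server already uses to reach containerd.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkFunc reports whether the backend answered before ctx expired.
type checkFunc func(ctx context.Context) error

// readyzHandler answers 200 when check succeeds within timeout, else 503.
func readyzHandler(check checkFunc, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()
		if err := check(ctx); err != nil {
			http.Error(w, "containerd unavailable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
}

// Stand-in checks used only by this sketch.
func healthy(context.Context) error     { return nil }
func unreachable(context.Context) error { return errors.New("connection refused") }

// hung never answers; it returns only once the handler's deadline fires.
func hung(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// probe sends GET /readyz to h in-process and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(readyzHandler(healthy, 2*time.Second)))
	fmt.Println(probe(readyzHandler(unreachable, 2*time.Second)))
	fmt.Println(probe(readyzHandler(hung, 50*time.Millisecond)))
}
```

Deriving the deadline from `r.Context()` means the check is also abandoned if the client disconnects, so a stalled backend cannot pin the probe for longer than the timeout.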

Request Logging:
- Add HTTP middleware to log all requests with details
- Captures: method, path, remote address, status, duration, bytes written
- Automatic log levels: ERROR (5xx), WARN (4xx), INFO (2xx/3xx)
- Performance metrics: duration formatting (ms/s), byte size formatting (KB/MB/GB)
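
The level mapping and the duration and size formatting could be sketched as follows. The helper names, the 1024-based units, and the one-decimal rounding are assumptions for illustration; the PR may round or label units differently.

```go
package main

import (
	"fmt"
	"time"
)

// levelFor maps an HTTP status code to the log level listed above.
func levelFor(status int) string {
	switch {
	case status >= 500:
		return "ERROR"
	case status >= 400:
		return "WARN"
	default:
		return "INFO"
	}
}

// humanDuration prints sub-second durations in ms and longer ones in s.
func humanDuration(d time.Duration) string {
	if d < time.Second {
		return fmt.Sprintf("%dms", d.Milliseconds())
	}
	return fmt.Sprintf("%.1fs", d.Seconds())
}

// humanBytes prints n as B, KB, MB, or GB using 1024-based units (assumed).
func humanBytes(n int64) string {
	const unit = 1024
	if n < unit {
		return fmt.Sprintf("%dB", n)
	}
	div, exp := int64(unit), 0
	// Step up one unit at a time, stopping at GB.
	for m := n / unit; m >= unit && exp < 2; m /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f%cB", float64(n)/float64(div), "KMG"[exp])
}

func main() {
	fmt.Println(levelFor(503), levelFor(404), levelFor(200))
	fmt.Println(humanDuration(45*time.Millisecond), humanDuration(1200*time.Millisecond))
	fmt.Println(humanBytes(512), humanBytes(4718592))
}
```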

Configurable Log Format:
- LOG_FORMAT=text (default): Human-readable for SSH debugging
  Example: 2025-01-20 10:23:45 INFO GET /v2/myapp/manifests/latest 200 1.2s 4.5MB
- LOG_FORMAT=json: Structured JSON for log aggregation
  Example: {"time":"2025-01-20T10:23:45Z","level":"info","method":"GET",...}
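
As a rough illustration of the two formats, the sketch below renders one log entry either way. The `entry` and `jsonLine` types, the `render` and `sampleEntry` functions, and every JSON key after `time`, `level`, and `method` are assumptions, since the example above elides the remaining fields. Sizes are printed as raw byte counts to keep the sketch short, and any value other than `json` is treated as `text`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
	"time"
)

// entry is a hypothetical in-memory form of one request log line.
type entry struct {
	Time     time.Time
	Level    string
	Method   string
	Path     string
	Status   int
	Duration time.Duration
	Bytes    int64
}

// jsonLine fixes the key names and order of the JSON output (assumed keys).
type jsonLine struct {
	Time       string `json:"time"`
	Level      string `json:"level"`
	Method     string `json:"method"`
	Path       string `json:"path"`
	Status     int    `json:"status"`
	DurationMS int64  `json:"duration_ms"`
	Bytes      int64  `json:"bytes"`
}

// render formats e as JSON when format is "json" and as text otherwise.
func render(format string, e entry) (string, error) {
	if strings.EqualFold(format, "json") {
		b, err := json.Marshal(jsonLine{
			Time:       e.Time.UTC().Format(time.RFC3339),
			Level:      strings.ToLower(e.Level),
			Method:     e.Method,
			Path:       e.Path,
			Status:     e.Status,
			DurationMS: e.Duration.Milliseconds(),
			Bytes:      e.Bytes,
		})
		return string(b), err
	}
	return fmt.Sprintf("%s %s %s %s %d %s %dB",
		e.Time.UTC().Format("2006-01-02 15:04:05"), strings.ToUpper(e.Level),
		e.Method, e.Path, e.Status, e.Duration, e.Bytes), nil
}

// sampleEntry reproduces the request from the examples above.
func sampleEntry() entry {
	return entry{
		Time:     time.Date(2025, time.January, 20, 10, 23, 45, 0, time.UTC),
		Level:    "info",
		Method:   "GET",
		Path:     "/v2/myapp/manifests/latest",
		Status:   200,
		Duration: 1200 * time.Millisecond,
		Bytes:    4718592,
	}
}

func main() {
	line, err := render(os.Getenv("LOG_FORMAT"), sampleEntry())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(line)
}
```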

Implementation:
- Lightweight responseWriter wrapper to capture status/bytes
- Zero-allocation for common paths
- Logs after request completes (non-blocking)
- Update Dockerfile with LOG_FORMAT environment variable
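
A wrapper-based middleware along the lines described above might look like this. The names `record`, `statusWriter`, `withLogging`, `notFound`, and `exercise` are illustrative assumptions rather than identifiers from the PR, and the `emit` callback stands in for whichever logger the server uses.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptest"
	"time"
)

// record is what the middleware hands to the logger after each request.
type record struct {
	Method     string
	Path       string
	RemoteAddr string
	Status     int
	Bytes      int64
	Duration   time.Duration
}

// statusWriter wraps http.ResponseWriter to remember status and byte count.
type statusWriter struct {
	http.ResponseWriter
	status int
	bytes  int64
}

func (w *statusWriter) WriteHeader(code int) {
	if w.status == 0 {
		w.status = code // keep the first status, as net/http does
	}
	w.ResponseWriter.WriteHeader(code)
}

func (w *statusWriter) Write(p []byte) (int, error) {
	if w.status == 0 {
		w.status = http.StatusOK // net/http sends an implicit 200 on first Write
	}
	n, err := w.ResponseWriter.Write(p)
	w.bytes += int64(n)
	return n, err
}

// withLogging calls emit once per request, after next has returned.
func withLogging(next http.Handler, emit func(record)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		sw := &statusWriter{ResponseWriter: w}
		next.ServeHTTP(sw, r)
		if sw.status == 0 {
			sw.status = http.StatusOK // the handler wrote nothing at all
		}
		emit(record{
			Method:     r.Method,
			Path:       r.URL.Path,
			RemoteAddr: r.RemoteAddr,
			Status:     sw.status,
			Bytes:      sw.bytes,
			Duration:   time.Since(start),
		})
	})
}

// notFound is a tiny demo handler so the sketch has something to wrap.
var notFound = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	http.Error(w, "not found", http.StatusNotFound)
})

// exercise drives h in-process with a single request.
func exercise(h http.Handler, method, path string) {
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(method, path, nil))
}

func main() {
	h := withLogging(notFound, func(rec record) {
		log.Printf("%s %s %s %d %dB %s",
			rec.Method, rec.Path, rec.RemoteAddr, rec.Status, rec.Bytes, rec.Duration)
	})
	exercise(h, http.MethodGet, "/v2/missing/manifests/latest")
}
```

One caveat with this pattern: embedding hides optional interfaces such as `http.Flusher` from downstream handlers unless the wrapper forwards them explicitly.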

Benefits:
- Monitor Jetson device health remotely via /readyz
- Debug push/pull failures with detailed request logs
- Choose format based on environment (dev vs production)
- Near-zero performance overhead

@mihai-chiorean

I continued making fixes and updates on my fork; I didn't realize they were being pulled into this PR.