@Bak3y Bak3y commented Sep 15, 2025

Fixes #166

This PR makes probe settings configurable by:

  • Adding comprehensive probe configuration to values.yaml
  • Replacing hardcoded values in DeployType.yaml with template variables
  • Maintaining backward compatibility with existing defaults
  • Including all probe parameters (timeouts, thresholds, periods, delays)

The hardcoded timeoutSeconds: 1 for readiness probes can now be configured based on deployment requirements.

Author

Bak3y commented Dec 1, 2025

@jmcgrath207 am I missing something you needed to review/merge this?

Matt Baker added 2 commits December 1, 2025 08:33
- Add /health endpoint for lightweight readiness checks
- Add timeout to WaitGroup wait to prevent deadlock in getMetrics()
- Update readiness probe to use /health instead of /metrics
- Increase readiness probe timeout from 1s to 2s
- Fix linting issue: use time.Since() instead of time.Now().Sub()
- Add proper error handling for HTTP write operations

Fixes readiness probe failures that occurred without logging errors.
The /health endpoint responds immediately, preventing false failures
when metrics collection is still initializing.
@jmcgrath207
Owner

@jmcgrath207 am I missing something you needed to review/merge this?

Sorry about that, @Bak3y; I was on hiatus from this project. I approve your PR for testing.

@jmcgrath207
Owner

@Bak3y Looks good, and it's passing the e2e test, but I would like the recent commit for the deadlock issue you found split out of this PR. If you have the logs available, please add those as well.

- Add /health endpoint for lightweight readiness checks
- Add timeout to WaitGroup wait to prevent deadlock in getMetrics()
- Update readiness probe to use /health instead of /metrics
- Increase readiness probe timeout from 1s to 2s
- Fix linting issue: use time.Since() instead of time.Now().Sub()
- Add proper error handling for HTTP write operations

Fixes readiness probe failures that occurred without logging errors.
The /health endpoint responds immediately, preventing false failures
when metrics collection is still initializing.

After that is cherry-picked away, I can merge and release this.

Development

Successfully merging this pull request may close these issues.

Request: make the readiness timeout configurable

2 participants