Add Prometheus metric to track callback logging failures in S3 #16102
base: litellm_sameer_oct_staging_2
    except Exception as e:
        verbose_logger.exception(f"s3 Layer Error - {str(e)}")
        pass
        self.handle_callback_failure(callback_name="S3Logger")
which would be less work over time:
- requiring each instance to implement this
- OR having integrations just bubble the error and have litellm_logging handle this?
 @Sameerlite
@krrishdholakia The second option is less work, but the problem is that periodic_flush and the methods it calls don't raise errors or propagate them to litellm_logging. There are also fire-and-forget tasks, and I wasn't able to find a way to bubble up their errors. The approach I used makes sure that if an error occurs, it gets recorded in Prometheus.
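To illustrate the trade-off discussed above, here is a minimal sketch, not the PR's actual code, of why an exception inside a fire-and-forget task never reaches the caller and how counting the failure at the point where it is caught still makes it visible. The names `failure_counts`, `upload_batch`, and `log_request` are hypothetical, and a plain `Counter` stands in for the Prometheus metric.

```python
import asyncio
from collections import Counter

# Hypothetical stand-in for a Prometheus counter keyed by callback name.
failure_counts: Counter = Counter()

# Keep references so background tasks are not garbage collected early.
background_tasks: set = set()


def handle_callback_failure(callback_name: str) -> None:
    """Record one logging failure for the given callback."""
    failure_counts[callback_name] += 1


async def upload_batch(fail: bool) -> None:
    """Simulated upload that swallows its own errors, like a logging layer."""
    try:
        if fail:
            raise RuntimeError("simulated upload error")
    except Exception:
        # The caller never sees this exception, so count it here.
        handle_callback_failure("S3Logger")


async def log_request() -> str:
    """Schedule the upload without awaiting it (fire-and-forget)."""
    task = asyncio.create_task(upload_batch(fail=True))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return "response sent"


async def main() -> str:
    result = await log_request()  # returns before the upload has run
    await asyncio.gather(*background_tasks)  # only so the demo can inspect the count
    return result


if __name__ == "__main__":
    print(asyncio.run(main()), dict(failure_counts))
```

In this sketch the request path completes normally even though the upload failed; the only trace of the failure is the incremented counter.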
Add proxy support to container apis & logging support (#16049)
Title
Add Prometheus metric to track callback logging failures
Relevant issues
Adds monitoring for callback health - tracks when S3, Langfuse, and other callbacks fail to log events.
Pre-Submission checklist
- `tests/litellm/` directory
- `make test-unit`
Type
🆕 New Feature
Changes
New Prometheus Metric
Added `litellm_callback_logging_failures_metric` to track when callbacks (S3, Langfuse, etc.) fail to log events.
- Metric: `litellm_callback_logging_failures_metric`
- Label: `callback_name` (e.g., "S3Logger", "LangFuseLogger")
Example:
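As a rough illustration only (this is not the code from the PR), a counter with a `callback_name` label could be defined with the `prometheus_client` library along these lines. The help text, the dedicated registry, and the body of `increment_callback_logging_failure` are assumptions; only the metric name, the label name, and the method name come from the description above.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A dedicated registry keeps this sketch isolated from any global metrics.
registry = CollectorRegistry()

callback_logging_failures = Counter(
    "litellm_callback_logging_failures_metric",
    "Number of times a logging callback failed to log an event",  # assumed help text
    labelnames=["callback_name"],
    registry=registry,
)


def increment_callback_logging_failure(callback_name: str) -> None:
    """Increment the failure counter for one callback (assumed implementation)."""
    callback_logging_failures.labels(callback_name=callback_name).inc()


increment_callback_logging_failure("S3Logger")

# Print the registry in the Prometheus text exposition format.
print(generate_latest(registry).decode())
```

Note that `prometheus_client` appends a `_total` suffix to counter samples, so with this sketch the series would be scraped as `litellm_callback_logging_failures_metric_total{callback_name="S3Logger"}`.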
Files Modified
- `enterprise/litellm_enterprise/integrations/prometheus.py`: adds the `increment_callback_logging_failure()` method (lines 1733-1750)
- `litellm/integrations/custom_logger.py`: adds a `handle_callback_failure()` method that all callbacks can use (lines 571-624)
- `litellm/integrations/s3_v2.py`: calls `handle_callback_failure()` on upload failures
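The shape of a shared `handle_callback_failure()` helper on a base class might look roughly like the sketch below. Everything here is hypothetical (`CustomLoggerSketch`, `S3LoggerSketch`, `failure_hooks`, `_put_object`) and is not litellm's actual `CustomLogger`; the one idea it tries to show is that the helper itself should never raise, so a broken metrics backend cannot make logging worse.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("callback_sketch")


class CustomLoggerSketch:
    """Hypothetical base class; not litellm's actual CustomLogger."""

    # Hooks notified on failure, e.g. a function that increments a metric.
    # Shared at class level here purely to keep the sketch short.
    failure_hooks: List[Callable[[str], None]] = []

    def handle_callback_failure(self, callback_name: str) -> None:
        """Notify every hook; a failing hook is logged and skipped."""
        for hook in self.failure_hooks:
            try:
                hook(callback_name)
            except Exception:
                logger.exception("failure hook raised; ignoring")


class S3LoggerSketch(CustomLoggerSketch):
    """Hypothetical integration that reports its own upload failures."""

    def upload(self, payload: bytes) -> bool:
        try:
            self._put_object(payload)
            return True
        except Exception as e:
            logger.error("s3 Layer Error - %s", e)
            self.handle_callback_failure(callback_name="S3Logger")
            return False

    def _put_object(self, payload: bytes) -> None:
        # Simulated outage so the failure path is exercised.
        raise ConnectionError("simulated S3 outage")
```

With this layout each integration needs only a one-line call in its existing `except` block, which matches the change to the S3 logger shown in the diff.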