Add Prometheus metric to track callback logging failures in S3 #16102

Sameerlite · 2025-10-30T17:23:43Z

Title

Add Prometheus metric to track callback logging failures

Relevant issues

Adds monitoring for callback health - tracks when S3, Langfuse, and other callbacks fail to log events.

Pre-Submission checklist

I have added testing in the tests/litellm/ directory
I have added a screenshot of my new test passing locally
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature

Changes

New Prometheus Metric

Added litellm_callback_logging_failures_metric to track when callbacks (S3, Langfuse, etc.) fail to log events.

Metric:

Name: litellm_callback_logging_failures_metric
Type: Counter
Label: callback_name (e.g., "S3Logger", "LangFuseLogger")

Example:

litellm_callback_logging_failures_metric_total{callback_name="S3Logger"} 5.0
litellm_callback_logging_failures_metric_total{callback_name="LangFuseLogger"} 2.0
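For illustration, a minimal sketch of how a counter with this name and label behaves in the prometheus_client Python library. This is not the code from enterprise/litellm_enterprise/integrations/prometheus.py: the separate registry, the help text, and the variable names are assumptions made for the example.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# Separate registry so the sketch does not touch the process-wide default one.
registry = CollectorRegistry()

# Counter with a single label; the help text is an assumed placeholder.
callback_failures = Counter(
    "litellm_callback_logging_failures_metric",
    "Number of times a logging callback failed to log an event",
    labelnames=["callback_name"],
    registry=registry,
)

# Simulate five S3 failures and two Langfuse failures.
for _ in range(5):
    callback_failures.labels(callback_name="S3Logger").inc()
for _ in range(2):
    callback_failures.labels(callback_name="LangFuseLogger").inc()

# Render the text exposition format that a Prometheus server would scrape.
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

The counter is declared without a _total suffix; prometheus_client appends it on exposition, which is why the sample lines above read litellm_callback_logging_failures_metric_total.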

Files Modified

enterprise/litellm_enterprise/integrations/prometheus.py
- Added metric definition (lines 302-306)
- Added increment_callback_logging_failure() method (lines 1733-1750)
litellm/integrations/custom_logger.py
- Added handle_callback_failure() method that all callbacks can use (lines 571-624)
litellm/integrations/s3_v2.py
- Modified exception handlers to call handle_callback_failure() on upload failures
- Tracks failures in async and sync upload methods
- These changes were made here because the error is not re-raised from this file, so it makes sense to call the method directly in the exception handlers
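A rough sketch of the pattern described above, assuming a shared base-class method that records the failure and an integration that swallows its own upload errors. Only the method name handle_callback_failure and the callback_name="S3Logger" argument come from this PR; the class names, the upload() signature, the fail flag, and the in-memory FAILURES counter (standing in for the Prometheus counter) are hypothetical.

```python
import logging
from collections import Counter

logger = logging.getLogger("callback_failure_sketch")

# Hypothetical stand-in for the Prometheus counter, keyed by callback_name.
FAILURES: Counter = Counter()


class BaseLoggerSketch:
    """Hypothetical base class; mirrors only the idea of a shared failure hook."""

    def handle_callback_failure(self, callback_name: str) -> None:
        # Recording a failure must never raise into the caller's code path.
        try:
            FAILURES[callback_name] += 1
        except Exception:
            logger.exception("could not record failure for %s", callback_name)


class S3LoggerSketch(BaseLoggerSketch):
    """Hypothetical integration that does not re-raise upload errors."""

    def upload(self, payload: dict, fail: bool = False) -> bool:
        try:
            if fail:
                raise ConnectionError("simulated upload failure")
            return True
        except Exception as e:
            # The error stops here, so the failure is counted in place.
            logger.exception("s3 Layer Error - %s", e)
            self.handle_callback_failure(callback_name="S3Logger")
            return False
```

Calling S3LoggerSketch().upload({}, fail=True) returns False and leaves FAILURES["S3Logger"] at 1, while a successful upload leaves the counter untouched.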

vercel · 2025-10-30T17:23:51Z


Project	Deployment	Preview	Comments	Updated (UTC)
litellm	Error			Oct 31, 2025 11:40am

krrishdholakia · 2025-10-31T02:52:52Z

litellm/integrations/s3_v2.py

        except Exception as e:
            verbose_logger.exception(f"s3 Layer Error - {str(e)}")
-            pass
+            self.handle_callback_failure(callback_name="S3Logger")


Which would be less work over time:

requiring each instance to implement this,

or having integrations just bubble the error up and have litellm_logging handle it?
@Sameerlite

@krrishdholakia The second option is less work, but the problem is that periodic_flush and the methods it calls don't raise errors or propagate them to litellm_logging. There are also fire-and-forget tasks, and I wasn't able to find a way to bubble up their errors. The approach I used makes sure that if an error occurs, it gets recorded in Prometheus.
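To make the fire-and-forget point concrete, here is a small asyncio sketch. It shows that an exception raised inside a task nobody awaits never reaches the code that scheduled it, and one general way to observe it anyway with a done callback. This is not what the PR does (the PR counts failures inside the integration's own exception handlers); flaky_upload, record_failure, and the FAILURES counter are hypothetical names.

```python
import asyncio
from collections import Counter

# Hypothetical stand-in for a metrics counter, keyed by callback name.
FAILURES: Counter = Counter()


async def flaky_upload() -> None:
    # Simulates a background upload that fails.
    raise ConnectionError("simulated S3 upload failure")


def record_failure(task: asyncio.Task) -> None:
    # Runs on the event loop once the task has finished.
    if task.cancelled():
        return
    if task.exception() is not None:
        FAILURES["S3Logger"] += 1


async def main() -> None:
    # Fire-and-forget: the task is scheduled but never awaited,
    # so its exception does not propagate into main().
    task = asyncio.create_task(flaky_upload())
    task.add_done_callback(record_failure)
    # Give the event loop a moment to run the task and its callback.
    await asyncio.sleep(0.01)


asyncio.run(main())
```

main() completes without raising even though the upload failed; the only trace of the failure is the count recorded by the callback.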

Add proxy support to container apis & logging support (#16049)

Sameerlite added 3 commits October 29, 2025 14:05

Add proxy support to container apis

a140480

Add logging support

627e8e0

prometheus metric measures how often s3_v2 is failing

4dc1807

Sameerlite added 3 commits October 30, 2025 22:58

remove not needed files

3885514

remove not needed files

c18cbf0

remove not needed files

0c6db47

vercel bot had a problem deploying to Preview October 30, 2025 17:34 Failure

fix mypy errors

ffaaa14

vercel bot had a problem deploying to Preview October 30, 2025 17:44 Failure

krrishdholakia reviewed Oct 31, 2025


Base automatically changed from litellm_container_proxy_integration to litellm_sameer_oct_staging_2 October 31, 2025 03:02

Merge pull request #16128 from BerriAI/litellm_sameer_oct_staging_2

0dafd8d

Add proxy support to container apis & logging support (#16049)

vercel bot had a problem deploying to Preview October 31, 2025 11:40 Failure
