fix: add structured outputs schema logging for Anthropic and Gemini #3454
Conversation
Add support for logging the `gen_ai.request.structured_output_schema` attribute for Anthropic Claude and Google Gemini APIs, completing coverage across all major LLM providers.

Changes:
- Anthropic: Log the `output_format` parameter with `json_schema` type. Supports Claude's new structured outputs feature (launched Nov 2025) for Sonnet 4.5 and Opus 4.1 models.
- Gemini: Log `response_schema` from the `generation_config` parameter. Supports both `generation_config.response_schema` and direct `response_schema` kwargs.
- OpenAI: Already supported (no changes needed).

Sample apps added to demonstrate structured outputs for all three providers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
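For orientation, a minimal sketch of what the Gemini demo might look like; the actual code lives in `packages/sample-app/sample_app/gemini_structured_outputs_demo.py`, and the model name, schema fields, and `GEMINI_API_KEY` variable here are illustrative assumptions rather than the PR's exact code:

```python
import os

import google.generativeai as genai
from traceloop.sdk import Traceloop

# The instrumentation picks up response_schema from generation_config and
# logs it on the span as gen_ai.request.structured_output_schema.
Traceloop.init(app_name="gemini_structured_outputs_demo")
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

joke_schema = {
    "type": "object",
    "properties": {
        "joke": {"type": "string"},
        "rating": {"type": "integer"},
    },
    "required": ["joke", "rating"],
}

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Tell me a joke about OpenTelemetry and rate it from 1 to 10",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=joke_schema,
    ),
)
print(response.text)
```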
Walkthrough
Adds detection and capture of structured output JSON schemas into instrumentation spans for Anthropic and Google Generative AI; introduces three sample apps demonstrating structured outputs for Anthropic, Google Gemini, and OpenAI.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant App as Demo App
participant SDK as Model SDK (Anthropic/Gemini/OpenAI)
participant Instr as Instrumentation/span_utils
participant Tracer as Tracing Backend
App->>SDK: send request (includes output_format / response_schema)
SDK->>Instr: instrumentation hook / set model attributes
alt output schema present (dict/json_schema or response_schema)
Instr-->>Instr: extract schema, json-dump
Instr->>Tracer: set LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA attribute
else no schema
Instr->>Tracer: set existing model attributes (unchanged)
end
SDK->>App: model response
Note right of Tracer: Span now contains structured output schema attribute when available
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)
170-177: Consider logging `structured_output_schema` even when prompt capture is disabled
`output_format` handling sits under `should_send_prompts()`, so `SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA` won't be set when prompt/content capture is turned off, even though this schema is typically configuration rather than user content. Consider moving this block outside the `should_send_prompts()` guard so the attribute is always populated when `output_format` is present, aligning with how other providers log this attribute.
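A minimal sketch of that restructuring, assuming the `output_format` shape `{"type": "json_schema", "json_schema": {"schema": {...}}}` referenced elsewhere in this review; the function name `_set_anthropic_request_attributes` and the placement of `should_send_prompts` are illustrative, not the actual `span_utils.py` code:

```python
import json

from opentelemetry.instrumentation.anthropic.utils import set_span_attribute
from opentelemetry.semconv_ai import SpanAttributes


def _set_anthropic_request_attributes(span, kwargs):
    # The schema is request configuration, not user content, so log it
    # regardless of the prompt-capture setting.
    output_format = kwargs.get("output_format")
    if output_format and output_format.get("type") == "json_schema":
        schema = (output_format.get("json_schema") or {}).get("schema")
        if schema is not None:
            set_span_attribute(
                span,
                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
                json.dumps(schema),
            )

    if should_send_prompts():  # noqa: F821 - guard assumed from surrounding module
        # Prompt/content capture stays behind the privacy guard.
        ...
```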
packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (1)
395-414: Avoid silent `try/except/pass` when serializing `response_schema`
Both blocks swallow all exceptions when calling `json.dumps(...)`, which makes schema/serialization issues hard to debug and triggers Ruff warnings (S110, BLE001). Consider narrowing the exception type and logging instead of passing silently, e.g.:

```diff
-    if generation_config and hasattr(generation_config, "response_schema"):
-        try:
-            _set_span_attribute(
-                span,
-                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
-                json.dumps(generation_config.response_schema),
-            )
-        except Exception:
-            pass
+    if generation_config and hasattr(generation_config, "response_schema"):
+        try:
+            _set_span_attribute(
+                span,
+                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
+                json.dumps(generation_config.response_schema),
+            )
+        except (TypeError, ValueError) as exc:
+            logger.debug(
+                "Failed to serialize generation_config.response_schema for span: %s",
+                exc,
+            )
@@
-    if "response_schema" in kwargs:
-        try:
-            _set_span_attribute(
-                span,
-                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
-                json.dumps(kwargs.get("response_schema")),
-            )
-        except Exception:
-            pass
+    if "response_schema" in kwargs:
+        try:
+            _set_span_attribute(
+                span,
+                SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
+                json.dumps(kwargs.get("response_schema")),
+            )
+        except (TypeError, ValueError) as exc:
+            logger.debug(
+                "Failed to serialize kwargs['response_schema'] for span: %s",
+                exc,
+            )
```

This keeps failures non-fatal while giving observability into bad schemas.
Please verify with your supported `generation_config.response_schema`/`response_schema` types that `json.dumps(...)` (or any custom encoder you choose) behaves as expected across the Google Generative AI SDK versions you intend to support.
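If broader schema types need handling (plain dicts, SDK schema objects, pydantic models), one possible fallback chain looks like the sketch below; `_schema_to_json` and the conversion hooks it tries are assumptions, not the PR's implementation:

```python
import json
import logging

logger = logging.getLogger(__name__)


def _schema_to_json(schema):
    """Best-effort JSON serialization of a response schema value."""
    try:
        return json.dumps(schema)
    except (TypeError, ValueError):
        pass
    # Some SDK schema objects expose conversion hooks; try the common ones.
    for attr in ("to_dict", "model_dump", "dict"):
        converter = getattr(schema, attr, None)
        if callable(converter):
            try:
                return json.dumps(converter())
            except (TypeError, ValueError) as exc:
                logger.debug("response_schema %s() not serializable: %s", attr, exc)
    return None
```

The span attribute would then only be set when `_schema_to_json` returns a string, keeping failures non-fatal but logged.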
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1 hunks)
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (1 hunks)
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1 hunks)
- packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1 hunks)
- packages/sample-app/sample_app/openai_structured_outputs_demo.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/sample-app/sample_app/gemini_structured_outputs_demo.py
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
- packages/sample-app/sample_app/openai_structured_outputs_demo.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
🧬 Code graph analysis (5)
packages/sample-app/sample_app/gemini_structured_outputs_demo.py (3)
- packages/traceloop-sdk/traceloop/sdk/__init__.py (2): `Traceloop` (37-275), `init` (49-206)
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1): `main` (15-52)
- packages/sample-app/sample_app/openai_structured_outputs_demo.py (1): `main` (22-35)
packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (3)
- packages/traceloop-sdk/traceloop/sdk/__init__.py (2): `Traceloop` (37-275), `init` (49-206)
- packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1): `main` (15-45)
- packages/sample-app/sample_app/openai_structured_outputs_demo.py (1): `main` (22-35)
packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (2)
- packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (1): `_set_span_attribute` (18-22)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): `SpanAttributes` (64-245)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (3)
- packages/traceloop-sdk/traceloop/sdk/__init__.py (2): `Traceloop` (37-275), `init` (49-206)
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1): `main` (15-52)
- packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1): `main` (15-45)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1): `set_span_attribute` (21-25)
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/anthropic_structured_outputs_demo.py
[error] 1-1: 'os' imported but unused
(F401)
packages/sample-app/sample_app/openai_structured_outputs_demo.py
[error] 4-4: 'opentelemetry.sdk.trace.export.ConsoleSpanExporter' imported but unused
(F401)
🪛 Ruff (0.14.5)
packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
403-404: try-except-pass detected, consider logging the exception
(S110)
403-403: Do not catch blind exception: Exception
(BLE001)
413-414: try-except-pass detected, consider logging the exception
(S110)
413-413: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (1)
packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1)
1-49: Gemini structured outputs demo looks good
The demo cleanly configures the client from environment, defines a simple JSON schema, and uses `GenerationConfig.response_schema` consistently with the other providers. No changes needed from my side.
Remove unused imports to fix flake8 lint errors:
- Remove unused 'os' import from anthropic_structured_outputs_demo.py
- Remove unused 'ConsoleSpanExporter' import from openai_structured_outputs_demo.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (1)
25-25: Consider aligning the prompt with other demos.
The prompt in this demo doesn't explicitly request a rating, while the Anthropic and Gemini demos both ask to "rate it." Although structured outputs will enforce the schema regardless, explicitly requesting the rating improves output quality and consistency across demos.
```diff
- messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}],
+ messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry and rate it from 1 to 10"}],
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1 hunks)
- packages/sample-app/sample_app/openai_structured_outputs_demo.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/openai_structured_outputs_demo.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: For debugging OpenTelemetry spans, use ConsoleSpanExporter with Traceloop to print spans to console
Applied to files:
packages/sample-app/sample_app/openai_structured_outputs_demo.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (3)
- packages/traceloop-sdk/traceloop/sdk/__init__.py (2): `Traceloop` (37-275), `init` (49-206)
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py (1): `main` (14-51)
- packages/sample-app/sample_app/gemini_structured_outputs_demo.py (1): `main` (15-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (6)
packages/sample-app/sample_app/openai_structured_outputs_demo.py (6)
1-5: LGTM! Unused import issue resolved.
The imports are clean and all used in the code. The previously flagged `ConsoleSpanExporter` import has been removed.
7-9: LGTM! Proper API key handling.
Environment variables are loaded correctly, and the API key is retrieved from the environment as per coding guidelines.
11-13: LGTM!
Traceloop initialization is correct with an appropriate app name for this demo.
16-18: LGTM!
The Pydantic model is well-defined for structured output validation.
37-38: LGTM!
Standard entry point implementation is correct.
23-27: Model and beta API endpoint verified as available; note known SDK parsing issues.
Verification confirms that `gpt-4o-2024-08-06` is still available and actively supported by OpenAI (including for fine-tuning), and the `client.beta.chat.completions.parse` beta endpoint is available. However, the openai-python SDK has known integration bugs with parse() related to JSON validation and edge cases in parsed responses. Test your structured output handling thoroughly and monitor the openai-python repository for bug fixes.
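For context, a minimal sketch of the parse call the demo presumably makes; the `Joke` model and prompt are stand-ins, not the demo's actual definitions:

```python
from openai import OpenAI
from pydantic import BaseModel


class Joke(BaseModel):
    joke: str
    rating: int


client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}],
    response_format=Joke,  # the Pydantic model is converted to a JSON schema
)
print(completion.choices[0].message.parsed)
```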
Important
Looks good to me! 👍
Reviewed d6360b2 in 13 minutes and 44 seconds.
- Reviewed 21 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/anthropic_structured_outputs_demo.py:1
- Draft comment: Good removal of unused 'os' import to keep the code clean.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
2. packages/sample-app/sample_app/openai_structured_outputs_demo.py:4
- Draft comment: Removed unused 'ConsoleSpanExporter' import; this is a good cleanup.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
Workflow ID: wflow_IqIYoUKp7bNNE3SH
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Add comprehensive test coverage for Anthropic structured outputs feature:
- Three test scenarios: legacy attributes, with content events, without content
- Tests verify gen_ai.request.structured_output_schema attribute is logged
- Enhanced span_utils.py to handle both json_schema and json output formats

Note: Tests are currently skipped as they require anthropic SDK >= 0.50.0, which supports the output_format parameter. The feature was announced in November 2025, but the current SDK version (0.49.0) doesn't yet support it. Tests will be enabled once the SDK is updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
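Roughly, such a skipped test might look like the sketch below; the fixture names (`exporter`, `anthropic_client`), span name, model, and `output_format` shape are assumptions, not the actual contents of test_structured_outputs.py:

```python
import json

import pytest
from opentelemetry.semconv_ai import SpanAttributes


@pytest.mark.skip(reason="requires anthropic SDK >= 0.50.0 with output_format support")
def test_structured_output_schema_is_logged(exporter, anthropic_client):
    schema = {"type": "object", "properties": {"joke": {"type": "string"}}}
    anthropic_client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=200,
        messages=[{"role": "user", "content": "Tell me a joke"}],
        output_format={"type": "json_schema", "json_schema": {"schema": schema}},
    )
    # The instrumentation should have json-dumped the schema onto the span.
    span = next(s for s in exporter.get_finished_spans() if s.name == "anthropic.chat")
    assert json.loads(
        span.attributes[SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA]
    ) == schema
```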
Important
Looks good to me! 👍
Reviewed everything up to 1de9ffa in 89 minutes and 38 seconds.
- Reviewed 214 lines of code in 5 files
- Skipped 0 files when reviewing.
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:170
- Draft comment: Consider handling cases where the provided schema might not be JSON serializable. Logging or error handling would help diagnose issues.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%.

  The comment is suggesting defensive programming for json.dumps(), but this appears to be speculative rather than identifying a real issue. The function is already wrapped with the @dont_throw decorator, which should handle exceptions. Additionally, the same pattern of calling json.dumps() without an explicit try-catch is used throughout the file (lines 154, 167, 243, 311), so this would be an inconsistent suggestion unless applied everywhere. The comment doesn't point to a specific bug introduced by this change; it's more of a general code quality suggestion that could apply to many places in the codebase. According to the rules, speculative comments should be removed, and comments should only be kept if there's strong evidence of an issue.

  Could the schema contain non-serializable objects that would cause json.dumps() to fail? Perhaps the @dont_throw decorator doesn't provide adequate error visibility, and explicit logging would be better for debugging. Maybe this specific case is more prone to serialization issues than the other json.dumps() calls in the file. While it's theoretically possible for the schema to be non-serializable, the comment is speculative and doesn't provide evidence that this is a real issue. The @dont_throw decorator already provides error handling at the function level, and the same pattern is used consistently throughout the file. If this were a real concern, it would apply to all json.dumps() calls, not just this one.

  This comment should be deleted. It's a speculative suggestion about potential error handling that doesn't identify a specific issue with the code change. The function is already protected by the @dont_throw decorator, and the same json.dumps() pattern is used consistently throughout the file without additional error handling.
2. packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py:395
- Draft comment: If both generation_config.response_schema and kwargs['response_schema'] are provided, the latter overwrites the former. Verify if this override behavior is intended.
- Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to verify if the override behavior is intended, which is against the rules. It does not provide a specific suggestion or ask for a test to be written. Therefore, it should be removed.
3. packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py:396
- Draft comment: Consider logging exceptions in the try/except blocks when setting the structured output schema to aid future debugging.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%.

  This comment is about code that was added in the diff (lines 395-414). It's suggesting a code quality improvement: adding logging to exception handlers. The file already has logging infrastructure in place and uses it elsewhere (lines 70 and 118 show similar patterns with logger.warning). The suggestion is actionable and clear. However, the silent exception handling might be intentional here (similar to the @dont_throw decorator used elsewhere). Looking more carefully, there are other places in the code with similar bare exception handlers that don't log (lines 43-44 in _is_image_part), which suggests silent exception handling might be a deliberate pattern in this codebase.

  The comment is suggesting a style preference rather than fixing a clear bug. Without knowing whether there's a specific debugging issue, or whether this pattern is deliberately chosen for performance or other reasons, this might be speculative or stylistic rather than clearly necessary. While there are some instances of silent exception handling in the codebase, there are also examples where exceptions ARE logged (lines 70, 118). The new code handles JSON serialization, which could fail in various ways, and logging would help understand why. However, this is still a "nice to have" suggestion rather than a clear bug fix.

  This is a reasonable code quality suggestion, but it's not a required change. The codebase shows mixed patterns (some exceptions logged, some not), and the silent exception handling might be intentional. Given the rule to only keep comments with strong evidence that they are correct and necessary, and that by default comments are assumed not useful, this should be deleted.
Workflow ID: wflow_QU6hLC1XyLCmcwJo
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed ca5f423 in 34 minutes and 3 seconds.
- Reviewed 222 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:178
- Draft comment: Consider caching the result of output_format.get('json_schema') in a variable for clarity before accessing the 'schema' key.
- Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:172
- Draft comment: Add an inline comment explaining the difference between 'json_schema' and 'json' types in output_format to aid future maintenance.
- Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
3. packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py:43
- Draft comment: Remove the duplicate pytest.mark.skip decorator to avoid redundancy.
- Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
Workflow ID: wflow_ZrOmwwx7Az5swzKf
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
galkleinman left a comment
LGTM
neat: consider moving magic strings to consts
Summary
Adds support for logging the `gen_ai.request.structured_output_schema` attribute for Anthropic Claude and Google Gemini APIs, completing coverage across all major LLM providers.

Changes
Anthropic Claude
- Logs the `output_format` parameter with `json_schema` type
- Uses the beta header `anthropic-beta: structured-outputs-2025-11-13`
- File: packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
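As an illustration, based on the reviewer drafts above (which reference `output_format.get('json_schema')` and its `'schema'` key), the captured payload is assumed to look roughly like the following; the exact SDK surface may differ until anthropic >= 0.50.0 ships support:

```python
# Illustrative request parameter only, not a confirmed SDK signature.
output_format = {
    "type": "json_schema",
    "json_schema": {
        "schema": {
            "type": "object",
            "properties": {
                "joke": {"type": "string"},
                "rating": {"type": "integer"},
            },
            "required": ["joke", "rating"],
        }
    },
}
# The instrumentation json-dumps the nested schema into
# gen_ai.request.structured_output_schema; requests opt in via the
# anthropic-beta: structured-outputs-2025-11-13 header.
```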
Google Gemini
- Logs `response_schema` from the `generation_config` parameter
- Also supports direct `response_schema` kwargs
- File: packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py

OpenAI
- Already supported (no changes needed)
Sample Apps
Added demonstration apps for all three providers:
- packages/sample-app/sample_app/openai_structured_outputs_demo.py (tested ✅)
- packages/sample-app/sample_app/anthropic_structured_outputs_demo.py
- packages/sample-app/sample_app/gemini_structured_outputs_demo.py

Testing
OpenAI sample app tested successfully and shows the `gen_ai.request.structured_output_schema` attribute being logged correctly.
🤖 Generated with Claude Code
Important
Adds structured output schema logging for Anthropic and Google Gemini APIs, with sample apps and tests.
- Logs `gen_ai.request.structured_output_schema` for Anthropic and Google Gemini APIs.
- Anthropic: logs `output_format` with `json_schema` type in `span_utils.py`.
- Gemini: logs `response_schema` from `generation_config` or kwargs in `span_utils.py`.
- Tests: adds `test_structured_outputs.py` for Anthropic, currently skipped due to SDK version.
- Demos: adds `anthropic_structured_outputs_demo.py`, `gemini_structured_outputs_demo.py`, and `openai_structured_outputs_demo.py` for demonstration.

This description was created by Ellipsis for ca5f423. You can customize this summary. It will automatically update as commits are pushed.