# In-House Instrumentation Example: Multi-Agent Travel Planner

This directory shows how to manually instrument an in‑house (LangGraph / LangChain‑based) multi‑agent workflow using the structured GenAI types provided by `opentelemetry.util.genai.types`.

The core types:

* `Workflow` – high‑level orchestration span (end‑to‑end request lifecycle).
* `AgentInvocation` – one logical agent or tool reasoning step.
* `LLMInvocation` – a single model call (chat / completion / embeddings).
* `InputMessage` / `OutputMessage` – structured messages (a role plus a list of parts). Each part can be a `Text`, an image, etc.

Benefits of using these types instead of ad‑hoc span attributes:

1. Consistency – every model call captures inputs, outputs, and token counts the same way.
2. Extensibility – evaluation / replay / redaction layers can rely on stable data shapes.
3. Safety – avoids leaking PII by keeping messages as typed parts you can filter before export (see the filtering sketch below).
4. Metrics – token counts populate standard semantic convention fields without manual key guessing.

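Because message content lives in typed parts rather than free‑form strings, a pre‑export filter can operate on those parts directly. The sketch below is illustrative only: the email‑masking rule and the `redact_messages` helper are hypothetical, not part of the GenAI util API.

```python
import re

from opentelemetry.util.genai.types import InputMessage, OutputMessage, Text

# Hypothetical redaction rule: mask anything that looks like an email address.
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_messages(messages):
    """Return copies of the messages with email addresses masked in Text parts."""
    redacted = []
    for message in messages:
        parts = [
            Text(content=_EMAIL_RE.sub("<redacted>", part.content))
            if isinstance(part, Text)
            else part  # leave non-text parts (images, tool calls, ...) untouched
            for part in message.parts
        ]
        if isinstance(message, OutputMessage):
            redacted.append(
                OutputMessage(role=message.role, parts=parts, finish_reason=message.finish_reason)
            )
        else:
            redacted.append(InputMessage(role=message.role, parts=parts))
    return redacted


# Usage, before the invocation is exported or handed to an evaluation layer:
# llm_invocation.input_messages = redact_messages(llm_invocation.input_messages)
```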
---

## Minimal LLMInvocation Example (Single OpenAI Chat Call – Direct OpenAI Client)

```python
from opentelemetry.util.genai.handler import get_telemetry_handler
from opentelemetry.util.genai.types import (
    Workflow,
    LLMInvocation,
    InputMessage,
    OutputMessage,
    Text,
)
from openai import OpenAI

# Requires: pip install openai, plus the OPENAI_API_KEY environment variable.

handler = get_telemetry_handler()

workflow = Workflow(
    name="demo_workflow",
    workflow_type="single_call",
    description="One-off chat completion",
    initial_input="Hello, can you summarise OpenTelemetry?",
)
handler.start_workflow(workflow)

llm_invocation = LLMInvocation(
    request_model="gpt-4o-mini",  # model identifier
    operation="chat",
    input_messages=[
        InputMessage(role="system", parts=[Text(content="You are a concise assistant.")]),
        InputMessage(role="user", parts=[Text(content=workflow.initial_input or "")]),
    ],
)
llm_invocation.provider = "openai"
llm_invocation.framework = "native-client"
handler.start_llm(llm_invocation)

# Convert InputMessages to OpenAI API format (list of {role, content} dicts)
openai_messages = [
    {"role": m.role, "content": "".join(part.content for part in m.parts if hasattr(part, "content"))}
    for m in llm_invocation.input_messages
]

client = OpenAI()
response = client.chat.completions.create(
    model=llm_invocation.request_model,
    messages=openai_messages,
    temperature=0.2,
)

# Extract the assistant answer (content can be None, e.g. for tool-call responses)
choice = response.choices[0]
assistant_text = choice.message.content or ""

llm_invocation.output_messages = [
    OutputMessage(role="assistant", parts=[Text(content=assistant_text)], finish_reason=choice.finish_reason or "stop")
]

# Token usage (OpenAI returns usage.prompt_tokens / usage.completion_tokens / usage.total_tokens)
if response.usage:
    llm_invocation.input_tokens = response.usage.prompt_tokens
    llm_invocation.output_tokens = response.usage.completion_tokens

handler.stop_llm(llm_invocation)

workflow.final_output = assistant_text
handler.stop_workflow(workflow)
```

Key points:

* All user/system inputs are captured up front (`input_messages`).
* The model response becomes `output_messages` (a list, so multi‑turn, tool, and streaming scenarios share the same shape).
* Token counts live on the invocation object – downstream metrics aggregators don’t need to parse raw attributes.

---

## AgentInvocation + LLMInvocation (Typical Pattern – Direct OpenAI Client)

When an agent first reasons about a task (planning, tool selection), you can represent that step with an `AgentInvocation`. Inside the agent you usually trigger one or more `LLMInvocation`s.

```python
from opentelemetry.util.genai.types import (
    Workflow,
    AgentInvocation,
    LLMInvocation,
    InputMessage,
    OutputMessage,
    Text,
)
from opentelemetry.util.genai.handler import get_telemetry_handler
from openai import OpenAI

handler = get_telemetry_handler()
workflow = Workflow(name="agent_demo", workflow_type="planner", initial_input="Plan a 2-day trip to Rome")
handler.start_workflow(workflow)

agent = AgentInvocation(
    name="trip_planner",
    agent_type="planner",
    model="gpt-4o-mini",
    system_instructions="You plan concise city itineraries",
    input_context=workflow.initial_input,
)
handler.start_agent(agent)

llm_call = LLMInvocation(
    request_model="gpt-4o-mini",
    operation="chat",
    input_messages=[
        InputMessage(role="system", parts=[Text(content="You provide day-by-day plans.")]),
        InputMessage(role="user", parts=[Text(content="Plan a 2-day trip to Rome focusing on food and history.")]),
    ],
)
llm_call.provider = "openai"
llm_call.framework = "native-client"
handler.start_llm(llm_call)

client = OpenAI()
openai_messages = [
    {"role": m.role, "content": "".join(p.content for p in m.parts if hasattr(p, "content"))}
    for m in llm_call.input_messages
]
response = client.chat.completions.create(
    model=llm_call.request_model,
    messages=openai_messages,
    temperature=0.3,
)

choice = response.choices[0]
assistant_text = choice.message.content or ""  # content can be None, e.g. for a tool-call response
llm_call.output_messages = [
    OutputMessage(role="assistant", parts=[Text(content=assistant_text)], finish_reason=choice.finish_reason or "stop")
]
if response.usage:
    llm_call.input_tokens = response.usage.prompt_tokens
    llm_call.output_tokens = response.usage.completion_tokens

agent.output_result = assistant_text
handler.stop_llm(llm_call)
handler.stop_agent(agent)
workflow.final_output = assistant_text
handler.stop_workflow(workflow)
```

Why this structure helps:

* Multiple `LLMInvocation`s inside one agent (tool lookups, reasoning, synthesis) stay grouped beneath the agent span.
* You can decorate the agent span with evaluation signals later (e.g. quality score) without touching core LLM spans.
* Redaction / filtering can operate at message part granularity before export.

---

## Helper Strategy (Token + Output Auto-Population)

In the travel planner example we use a helper to:

1. Create `output_messages` if the node hasn’t set them yet.
2. Extract token usage from LangChain’s `usage_metadata` or `response_metadata.token_usage`.

Pattern:

```python
_apply_llm_response_metadata(response_message, llm_invocation)
```

Call this immediately after each model invocation, passing the response message the call returned, then stop the LLM span.

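A possible shape for such a helper is sketched below. It assumes a LangChain‑style response message that exposes `content`, an optional `usage_metadata` dict (`input_tokens` / `output_tokens`), and an optional `response_metadata["token_usage"]` dict; the actual helper in this directory may differ.

```python
from opentelemetry.util.genai.types import OutputMessage, Text


def _apply_llm_response_metadata(response_message, llm_invocation):
    """Illustrative sketch: fill output_messages and token counts from a LangChain-style message."""
    # 1. Create output_messages if the calling node hasn't set them yet.
    if not llm_invocation.output_messages:
        llm_invocation.output_messages = [
            OutputMessage(
                role="assistant",
                parts=[Text(content=str(response_message.content))],
                finish_reason="stop",
            )
        ]

    # 2. Prefer LangChain's normalized usage_metadata (input_tokens / output_tokens).
    usage = getattr(response_message, "usage_metadata", None) or {}
    if usage:
        llm_invocation.input_tokens = usage.get("input_tokens")
        llm_invocation.output_tokens = usage.get("output_tokens")
        return

    # 3. Fall back to provider-specific response_metadata["token_usage"]
    #    (OpenAI-style prompt_tokens / completion_tokens).
    token_usage = (getattr(response_message, "response_metadata", None) or {}).get("token_usage", {})
    if token_usage:
        llm_invocation.input_tokens = token_usage.get("prompt_tokens")
        llm_invocation.output_tokens = token_usage.get("completion_tokens")
```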
---

## Adding Evaluations Later

Because inputs/outputs are normalized:

* You can iterate over finished `LLMInvocation`s and feed them to an evaluation agent (latency, toxicity, factuality).
* Link evaluation spans as children or siblings referencing the `llm_invocation_id` (a minimal sketch follows).

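A minimal sketch of that pattern using a plain OpenTelemetry span per evaluation; the `score_response` callable and the attribute keys are placeholders, not part of the GenAI util API:

```python
from opentelemetry import trace

tracer = trace.get_tracer("travel-planner.evaluation")


def evaluate_invocations(finished_invocations, score_response):
    """Emit one evaluation span per finished LLMInvocation (illustrative sketch)."""
    for invocation in finished_invocations:
        # Flatten the text parts of the model output so an evaluator can score them.
        answer = "".join(
            part.content
            for message in invocation.output_messages
            for part in message.parts
            if hasattr(part, "content")
        )
        with tracer.start_as_current_span("gen_ai.evaluation") as span:
            # Placeholder attribute keys; align them with your own conventions
            # and link back to whichever invocation identifier you record.
            span.set_attribute("evaluation.request_model", invocation.request_model)
            span.set_attribute("evaluation.quality_score", score_response(answer))
```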
---

## Minimal Lifecycle Checklist

1. Start a `Workflow` (once per external request).
2. For each logical reasoning component, start an `AgentInvocation`.
3. Inside the agent, start one or more `LLMInvocation` spans.
4. Populate `input_messages` before the call; populate `output_messages` + token counts right after.
5. Stop spans in reverse order (LLM → Agent → Workflow); see the skeleton below.

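The same checklist as a condensed skeleton. It is illustrative only: it reuses the handler API from the examples above, adds `try`/`finally` so spans still close in reverse order on errors, and assumes a hypothetical `call_model` callable that performs the model call and fills in `output_messages` and token counts:

```python
from opentelemetry.util.genai.handler import get_telemetry_handler
from opentelemetry.util.genai.types import AgentInvocation, InputMessage, LLMInvocation, Text, Workflow


def handle_request(user_request: str, call_model) -> str:
    handler = get_telemetry_handler()
    workflow = Workflow(name="request", workflow_type="example", initial_input=user_request)
    handler.start_workflow(workflow)                      # 1. workflow first
    try:
        agent = AgentInvocation(name="planner", agent_type="planner", input_context=user_request)
        handler.start_agent(agent)                        # 2. then the agent
        try:
            llm = LLMInvocation(
                request_model="gpt-4o-mini",
                operation="chat",
                input_messages=[InputMessage(role="user", parts=[Text(content=user_request)])],
            )
            handler.start_llm(llm)                        # 3./4. LLM span around the model call
            try:
                answer = call_model(llm)                  # hypothetical: sets llm.output_messages + token counts
            finally:
                handler.stop_llm(llm)                     # 5. stop the LLM first ...
            agent.output_result = answer
            workflow.final_output = answer
        finally:
            handler.stop_agent(agent)                     # ... then the agent ...
    finally:
        handler.stop_workflow(workflow)                   # ... then the workflow
    return workflow.final_output or ""
```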
---

## Troubleshooting

* Missing tokens? Ensure your client/library actually returns usage metadata; not all providers do.
* Dropped messages? Confirm you set both `input_messages` and `output_messages` *before* stopping the LLM span.
* Need streaming? Append incremental `OutputMessage` parts as they arrive; finalise with a `finish_reason` of `stop` or `length` (see the sketch below).

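A rough sketch of the streaming case with the OpenAI client, assuming the `client`, `handler`, started `llm_invocation`, and `openai_messages` objects from the first example:

```python
from opentelemetry.util.genai.types import OutputMessage, Text

stream = client.chat.completions.create(
    model=llm_invocation.request_model,
    messages=openai_messages,
    stream=True,
)

chunks: list[str] = []
finish_reason = "stop"
for chunk in stream:
    if not chunk.choices:  # e.g. a trailing usage-only chunk
        continue
    choice = chunk.choices[0]
    if choice.delta.content:
        chunks.append(choice.delta.content)   # append incremental text as it arrives
    if choice.finish_reason:
        finish_reason = choice.finish_reason  # e.g. "stop" or "length"

# Note: streamed responses only report token usage if you request it,
# e.g. via stream_options={"include_usage": True} on recent API versions.
llm_invocation.output_messages = [
    OutputMessage(role="assistant", parts=[Text(content="".join(chunks))], finish_reason=finish_reason)
]
handler.stop_llm(llm_invocation)
```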
---