Commit 5b419d3

Dwij1704 and dot-agi authored
Better root span management (#999)
* Introduced `start_trace` and `end_trace` functions for user-managed tracing, allowing concurrent traces.
* Enhance tracing functionality by introducing new decorators and improving session handling. Added `trace`, `session`, `agent`, `task`, `workflow`, and `operation` decorators for better instrumentation. Updated `log_trace_url` to include titles for improved logging context. Refactored `Client` initialization trace name and adjusted end trace state handling. Improved error handling during trace logging in `TracingCore` and removed deprecated session decorator usage.
* cleanup
* Enhance `init` function in AgentOps SDK by adding new parameters: `tags`, `auto_init`, `skip_auto_end_session`, and `fail_safe`. Updated documentation to reflect changes and merged `tags` with `default_tags` for improved session management. Refactored client initialization to accommodate new options.
* Refactor legacy session handling by replacing `LegacySession` with `Session` in `agentops` module. This change improves code clarity and aligns with the updated session management approach.
* Enhance unit tests for URL logging in TracingCore and InternalSpanProcessor. Added tests for start and end trace URL logging, handling failures gracefully, and verifying root span tracking. Improved test coverage for session decorators and ensured proper handling of unsampled spans. Refactored existing tests for clarity and consistency.
* Refactor CrewAI workflow instrumentation to enhance span management and attribute tracking. Updated span creation to use dynamic workflow names and improved error handling. Adjusted span kinds from CLIENT to INTERNAL for better clarity in tracing. Streamlined attribute setting for agents and tasks, ensuring accurate logging of results and metrics.
* Refactor force_flush method in TracingCore to remove timeout parameter, simplifying the flush process. Updated logging to indicate completion of the flush operation.
* Refactor Client initialization logic to clarify re-initialization conditions for the API key. Only trigger a warning if a different non-None API key is provided during re-initialization, enhancing the clarity of client behavior.
* Refactor Client initialization to support backward compatibility with legacy session wrapper. Update unit tests to enhance coverage for new session management functionality, including explicit trace handling and decorator behavior. Ensure proper integration between new and legacy APIs for session and trace management.
* Improve authentication error handling in Client and V3Client. Added exception handling for token fetching and response processing, ensuring clearer error logging and re-raising of exceptions for better testability. Updated integration tests to reset client state between tests.
* Refactor integration tests for session concurrency to improve isolation and error handling. Mock API client and tracing core to avoid real authentication during tests. Simplify concurrency test descriptions and ensure proper cleanup of client state between tests.
* Enhance trace management by updating end_trace function to allow ending all active traces when no context is provided. Refactor end_all_sessions to utilize the new end_trace functionality, ensuring legacy global state is cleared. Introduce thread-safe handling of active traces in TracingCore with locking mechanisms for improved concurrency.
* Enhance trace ID handling in TracingCore by adding exception handling for invalid trace IDs. This ensures robustness when dealing with mocked spans or non-integer trace IDs, improving overall trace management reliability.
* revert crewai
* Enhance tracing functionality by adding `trace_name` parameter to configuration and initialization. This allows for customizable trace/session naming, improving trace management and clarity in logs. Updated relevant classes and methods to utilize the new parameter.
* Remove unused span variables in entity decorator function to streamline code and improve clarity.

---------

Co-authored-by: Pratyush Shukla <[email protected]>
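Several bullets above describe how `TracingCore` now manages concurrent root spans: active traces are tracked under a lock, `end_trace` with no context ends every active trace, and non-integer trace IDs no longer raise. The following is a minimal sketch of that pattern in plain Python, not the AgentOps implementation; `ActiveTraceRegistry`, `FakeSpan`, and the 32-character hex key format are assumptions made for illustration.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a root span."""

    name: str
    trace_id: object  # usually an int; may be a mock or a string in tests
    ended: bool = False
    end_state: Optional[str] = None

    def end(self, end_state: str) -> None:
        self.ended = True
        self.end_state = end_state


class ActiveTraceRegistry:
    """Hypothetical lock-protected map from trace ID to root span."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[str, FakeSpan] = {}

    @staticmethod
    def _key(span: FakeSpan) -> str:
        # Integer IDs become 32-char hex strings; mocked or non-integer
        # IDs fall back to str() instead of raising.
        try:
            return format(span.trace_id, "032x")
        except (TypeError, ValueError):
            return str(span.trace_id)

    def start(self, span: FakeSpan) -> None:
        with self._lock:
            self._active[self._key(span)] = span

    def end(self, span: Optional[FakeSpan] = None, end_state: str = "Success") -> None:
        # Pick the spans to end while holding the lock, then end them
        # outside it so slow span exports don't block other threads.
        to_end: List[FakeSpan]
        with self._lock:
            if span is None:
                to_end = list(self._active.values())
                self._active.clear()
            else:
                removed = self._active.pop(self._key(span), None)
                to_end = [removed] if removed is not None else []
        for s in to_end:
            s.end(end_state)

    def count(self) -> int:
        with self._lock:
            return len(self._active)


# Usage: register traces from several threads, end one, then end the rest.
registry = ActiveTraceRegistry()
spans = [FakeSpan(f"trace-{i}", trace_id=i + 1) for i in range(8)]
threads = [threading.Thread(target=registry.start, args=(s,)) for s in spans]
for t in threads:
    t.start()
for t in threads:
    t.join()

odd = FakeSpan("mocked", trace_id="not-an-int")  # non-integer ID is tolerated
registry.start(odd)
registry.end(spans[0], end_state="Failure")  # end one trace explicitly
registry.end()  # no span given: end every remaining active trace
```

Ending spans outside the lock is a design choice in this sketch: it keeps the critical section to a dictionary update, at the cost of a brief window where a span is no longer registered but not yet ended.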
1 parent bd397dd commit 5b419d3

16 files changed: +1478 additions, -704 deletions

agentops/__init__.py

Lines changed: 64 additions & 4 deletions
@@ -12,9 +12,12 @@
     LLMEvent,
 ) # type: ignore

-from typing import List, Optional, Union
+from typing import List, Optional, Union, Dict, Any
 from agentops.client import Client
+from agentops.sdk.core import TracingCore, TraceContext
+from agentops.sdk.decorators import trace, session, agent, task, workflow, operation

+from agentops.logging.config import logger

 # Client global instance; one per process runtime
 _client = Client()
@@ -53,6 +56,7 @@ def init(
     max_queue_size: Optional[int] = None,
     tags: Optional[List[str]] = None,
     default_tags: Optional[List[str]] = None,
+    trace_name: Optional[str] = None,
     instrument_llm_calls: Optional[bool] = None,
     auto_start_session: Optional[bool] = None,
     auto_init: Optional[bool] = None,
@@ -78,6 +82,7 @@ def init(
         max_queue_size (int, optional): The maximum size of the event queue. Defaults to 512.
         tags (List[str], optional): [Deprecated] Use `default_tags` instead.
         default_tags (List[str], optional): Default tags for the sessions that can be used for grouping or sorting later (e.g. ["GPT-4"]).
+        trace_name (str, optional): Name for the default trace/session. If none is provided, defaults to "default".
         instrument_llm_calls (bool): Whether to instrument LLM calls and emit LLMEvents.
         auto_start_session (bool): Whether to start a session automatically when the client is created.
         auto_init (bool): Whether to automatically initialize the client on import. Defaults to True.
@@ -108,6 +113,7 @@ def init(
         max_wait_time=max_wait_time,
         max_queue_size=max_queue_size,
         default_tags=merged_tags,
+        trace_name=trace_name,
         instrument_llm_calls=instrument_llm_calls,
         auto_start_session=auto_start_session,
         auto_init=auto_init,
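The `init` hunks above thread the new `trace_name` argument through to the client and pass `merged_tags` as `default_tags`, but the merge and the name fallback happen outside the diff. A minimal sketch, assuming deprecated `tags` are appended after `default_tags` with duplicates dropped (the actual ordering rule is not shown here); `merge_tags` and `resolve_trace_name` are hypothetical helpers, and the `"default"` fallback comes from the new docstring.

```python
from typing import List, Optional


def merge_tags(tags: Optional[List[str]], default_tags: Optional[List[str]]) -> Optional[List[str]]:
    """Hypothetical helper: fold deprecated `tags` into `default_tags`.

    Order is preserved and duplicates are dropped; None means "no tags".
    """
    merged: List[str] = []
    for tag in (default_tags or []) + (tags or []):
        if tag not in merged:
            merged.append(tag)
    return merged or None


def resolve_trace_name(trace_name: Optional[str]) -> str:
    # Assumed fallback, matching the docstring: no name means "default".
    return trace_name or "default"


# Usage
print(merge_tags(["GPT-4", "prod"], ["prod", "team-a"]))
print(resolve_trace_name(None))
```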
@@ -165,26 +171,80 @@ def configure(**kwargs):
     # Check for invalid parameters
     invalid_params = set(kwargs.keys()) - valid_params
     if invalid_params:
-        from .logging.config import logger
-
         logger.warning(f"Invalid configuration parameters: {invalid_params}")

     _client.configure(**kwargs)


+def start_trace(
+    trace_name: str = "session", tags: Optional[Union[Dict[str, Any], List[str]]] = None
+) -> Optional[TraceContext]:
+    """
+    Starts a new trace (root span) and returns its context.
+    This allows for multiple concurrent, user-managed traces.
+
+    Args:
+        trace_name: Name for the trace (e.g., "session", "my_custom_task").
+        tags: Optional tags to attach to the trace span (list of strings or dict).
+
+    Returns:
+        A TraceContext object containing the span and context token, or None if SDK not initialized.
+    """
+    tracing_core = TracingCore.get_instance()
+    if not tracing_core.initialized:
+        # Optionally, attempt to initialize the client if not already, or log a more severe warning.
+        # For now, align with legacy start_session that would try to init.
+        # However, explicit init is preferred before starting traces.
+        logger.warning("AgentOps SDK not initialized. Attempting to initialize with defaults before starting trace.")
+        try:
+            init()  # Attempt to initialize with environment variables / defaults
+            if not tracing_core.initialized:
+                logger.error("SDK initialization failed. Cannot start trace.")
+                return None
+        except Exception as e:
+            logger.error(f"SDK auto-initialization failed during start_trace: {e}. Cannot start trace.")
+            return None
+
+    return tracing_core.start_trace(trace_name=trace_name, tags=tags)
+
+
+def end_trace(trace_context: Optional[TraceContext] = None, end_state: str = "Success") -> None:
+    """
+    Ends a trace (its root span) and finalizes it.
+    If no trace_context is provided, ends all active session spans.
+
+    Args:
+        trace_context: The TraceContext object returned by start_trace. If None, ends all active traces.
+        end_state: The final state of the trace (e.g., "Success", "Failure", "Error").
+    """
+    tracing_core = TracingCore.get_instance()
+    if not tracing_core.initialized:
+        logger.warning("AgentOps SDK not initialized. Cannot end trace.")
+        return
+    tracing_core.end_trace(trace_context=trace_context, end_state=end_state)
+
+
 __all__ = [
     "init",
     "configure",
     "get_client",
     "record",
+    "start_trace",
+    "end_trace",
     "start_session",
     "end_session",
     "track_agent",
     "track_tool",
     "end_all_sessions",
-    "Session",
     "ToolEvent",
     "ErrorEvent",
     "ActionEvent",
     "LLMEvent",
+    "Session",
+    "trace",
+    "session",
+    "agent",
+    "task",
+    "workflow",
+    "operation",
 ]
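The new `start_trace` falls back to a default `init()` when the SDK is not initialized and returns `None` rather than raising if that fails. The sketch below isolates that control flow so it can run without AgentOps; `StubCore`, `start_trace_sketch`, and this `TraceContext` dataclass are hypothetical stand-ins, not the SDK's classes.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("start_trace_sketch")


@dataclass
class TraceContext:
    """Hypothetical stand-in for agentops.sdk.core.TraceContext."""

    name: str


class StubCore:
    """Hypothetical tracing core; `initialized` flips when init succeeds."""

    def __init__(self) -> None:
        self.initialized = False

    def start_trace(self, trace_name: str) -> TraceContext:
        return TraceContext(trace_name)


def start_trace_sketch(
    core: StubCore, init: Callable[[], None], trace_name: str = "session"
) -> Optional[TraceContext]:
    # Same branching as the new start_trace(): try one default init(),
    # then return None (never raise) if tracing is still unavailable.
    if not core.initialized:
        logger.warning("SDK not initialized; attempting default init.")
        try:
            init()
        except Exception as e:
            logger.error("Auto-initialization failed: %s", e)
            return None
        if not core.initialized:
            logger.error("SDK initialization failed. Cannot start trace.")
            return None
    return core.start_trace(trace_name)


# Usage: one init that succeeds, one that raises, one that silently does nothing.
ok_core = StubCore()


def good_init() -> None:
    ok_core.initialized = True


def failing_init() -> None:
    raise RuntimeError("no API key")


ctx = start_trace_sketch(ok_core, good_init, "my_custom_task")
bad_core = StubCore()
none_ctx = start_trace_sketch(bad_core, failing_init)
silent_core = StubCore()
silent_ctx = start_trace_sketch(silent_core, lambda: None)
```

Because failure is signalled with `None`, callers should check the result before ending the trace: per the `end_trace` docstring, passing `None` ends every active trace rather than doing nothing.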

agentops/client/api/versions/v3.py

Lines changed: 18 additions & 20 deletions
@@ -30,28 +30,26 @@ def fetch_auth_token(self, api_key: str) -> AuthTokenResponse:

         r = self.post(path, data, headers)

+        if r.status_code != 200:
+            error_msg = f"Authentication failed: {r.status_code}"
+            try:
+                error_data = r.json()
+                if "error" in error_data:
+                    error_msg = f"{error_data['error']}"
+            except Exception:
+                pass
+            logger.error(f"{error_msg} - Perhaps an invalid API key?")
+            raise ApiServerException(error_msg)
+
         try:
-            if r.status_code != 200:
-                error_msg = f"Authentication failed: {r.status_code}"
-                try:
-                    error_data = r.json()
-                    if "error" in error_data:
-                        error_msg = f"{error_data['error']}"
-                except Exception:
-                    pass
-                raise ApiServerException(error_msg)
+            jr = r.json()
+            token = jr.get("token")
+            if not token:
+                raise ApiServerException("No token in authentication response")

-            try:
-                jr = r.json()
-                token = jr.get("token")
-                if not token:
-                    raise ApiServerException("No token in authentication response")
-
-                return jr
-            except Exception as e:
-                raise ApiServerException(f"Failed to process authentication response: {str(e)}")
+            return jr
         except Exception as e:
-            logger.error(f"{str(e)} - Perhaps an invalid API key?")
-            return None
+            logger.error(f"Failed to process authentication response: {str(e)}")
+            raise ApiServerException(f"Failed to process authentication response: {str(e)}")

     # Add V3-specific API methods here
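After this change, `fetch_auth_token` raises `ApiServerException` on both a non-200 status and a malformed 200 response, where the old code logged the error and returned `None`. The sketch below reproduces that branching against a fake response so it runs offline; `FakeResponse`, `check_auth_response`, and `message_of` are hypothetical, `ApiServerException` is a local stand-in for the SDK's exception, and the logging calls are omitted.

```python
from typing import Any, Dict, Optional


class ApiServerException(Exception):
    """Local stand-in for the SDK's ApiServerException."""


class FakeResponse:
    """Hypothetical response with a status code and an optional JSON body."""

    def __init__(self, status_code: int, body: Optional[Dict[str, Any]] = None) -> None:
        self.status_code = status_code
        self._body = body

    def json(self) -> Dict[str, Any]:
        if self._body is None:
            raise ValueError("response body is not JSON")
        return self._body


def check_auth_response(r: FakeResponse) -> Dict[str, Any]:
    # Non-200: prefer the server's "error" field, else report the status code.
    if r.status_code != 200:
        error_msg = f"Authentication failed: {r.status_code}"
        try:
            error_data = r.json()
            if "error" in error_data:
                error_msg = f"{error_data['error']}"
        except Exception:
            pass
        raise ApiServerException(error_msg)

    # 200: the body must parse and contain a token. Any failure here,
    # including the explicit "No token" raise, is caught and re-wrapped.
    try:
        jr = r.json()
        if not jr.get("token"):
            raise ApiServerException("No token in authentication response")
        return jr
    except Exception as e:
        raise ApiServerException(f"Failed to process authentication response: {e}")


def message_of(r: FakeResponse) -> Optional[str]:
    """Return the exception message, or None if authentication succeeded."""
    try:
        check_auth_response(r)
    except ApiServerException as e:
        return str(e)
    return None


# Usage
ok = check_auth_response(FakeResponse(200, {"token": "abc"}))
bad_key = message_of(FakeResponse(401, {"error": "Invalid API key"}))
```

One consequence of this structure: the explicit "No token" exception is raised inside the second `try`, so the surrounding `except Exception` catches it and a 200 response without a token surfaces as `Failed to process authentication response: No token in authentication response`.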
