feat(evaluation): add core evaluation framework #245
Conversation
I saw your question in the community call. I'll have to defer to @mazas-google on roadmap questions regarding Eval. I imagine the API will closely follow adk-python's implementation.

Thanks @ivanmkc!
I'd love to help on this if
This PR introduces an evaluation framework for testing and measuring AI agent performance. It supports both algorithmic and LLM-as-Judge evaluation methods, with built-in support for response quality, tool usage, safety, and hallucination detection.
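As a rough illustration of the kind of abstraction such a framework typically exposes, here is a minimal Python sketch with one algorithmic metric and one LLM-as-Judge metric. All names here (`EvalCase`, `EvalResult`, `Evaluator`, `ExactMatchEvaluator`, `LLMJudgeEvaluator`) are hypothetical and are not taken from this PR's actual code.

```python
# Hypothetical sketch only: illustrates an evaluator abstraction supporting both
# algorithmic and LLM-as-Judge metrics. Names are illustrative, not this PR's API.
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass
class EvalCase:
    """One evaluation case: the user prompt, the agent's response,
    and an optional reference answer to compare against."""
    prompt: str
    response: str
    reference: Optional[str] = None


@dataclass
class EvalResult:
    """Outcome of one metric on one case: a score in [0, 1] plus an optional rationale."""
    metric: str
    score: float
    rationale: str = ""


class Evaluator(Protocol):
    """Common interface shared by algorithmic and LLM-as-Judge evaluators."""

    def evaluate(self, case: EvalCase) -> EvalResult: ...


class ExactMatchEvaluator:
    """Algorithmic metric: 1.0 if the response matches the reference exactly."""

    def evaluate(self, case: EvalCase) -> EvalResult:
        matched = case.reference is not None and case.response.strip() == case.reference.strip()
        return EvalResult(metric="exact_match", score=1.0 if matched else 0.0)


class LLMJudgeEvaluator:
    """LLM-as-Judge metric: asks a judge model to rate the response.

    `judge` is any callable that takes a prompt string and returns the judge
    model's reply; wiring it to a real model is out of scope for this sketch.
    """

    def __init__(self, judge: Callable[[str], str], criterion: str = "response quality"):
        self.judge = judge
        self.criterion = criterion

    def evaluate(self, case: EvalCase) -> EvalResult:
        reply = self.judge(
            f"Rate the following response for {self.criterion} on a scale of 0 to 1.\n"
            f"Prompt: {case.prompt}\nResponse: {case.response}\nScore:"
        )
        return EvalResult(metric=self.criterion, score=float(reply.strip()), rationale=reply)
```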
Tip
This PR uses atomic commits organized by feature. For the best review experience, I suggest reviewing it commit-by-commit to see the logical progression of the implementation.
Note
I follow the Conventional Commits specification for a structured commit history.
Features:
Usage
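As a rough illustration of how such a framework is typically driven, here is a hypothetical usage snippet continuing the sketch above; it is not this PR's actual API.

```python
# Hypothetical usage, continuing the sketch above; not the actual API in this PR.
cases = [
    EvalCase(prompt="What is 2 + 2?", response="4", reference="4"),
    EvalCase(prompt="Summarize the report.", response="The report covers Q3 revenue."),
]

evaluators = [
    ExactMatchEvaluator(),
    # A trivial stand-in judge that always returns a perfect score,
    # so this example runs without a real model behind it.
    LLMJudgeEvaluator(judge=lambda prompt: "1.0", criterion="response quality"),
]

for case in cases:
    for evaluator in evaluators:
        result = evaluator.evaluate(case)
        print(f"{result.metric}: {result.score:.2f}")
```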
Testing
Two examples are provided to demonstrate the features:
Run examples: