@lorenss-m
Collaborator

No description provided.

lorenss-m commented Aug 13, 2025

Add hud evaluation support

What changed

  • Added hud_eval.py, which runs hud tasksets and supports parallel execution
  • Added support for the hud environment in main.py
  • Added support for submitting the agent's final response for evaluation
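
To illustrate what running a taskset with optional parallel execution could look like, here is a minimal sketch. It is not the code from hud_eval.py: `run_task`, `run_taskset`, and `max_concurrent` are hypothetical names, and the task body is a placeholder rather than a call into the hud library.

```python
import asyncio


async def run_task(task_id: str) -> dict:
    """Hypothetical stand-in for evaluating one task; not the real hud API."""
    await asyncio.sleep(0)  # placeholder for environment setup, agent steps, grading
    return {"task": task_id, "reward": 0.0}


async def run_taskset(task_ids: list[str], parallel: bool, max_concurrent: int = 4) -> list[dict]:
    """Run tasks one at a time, or concurrently with a cap when parallel is set."""
    if not parallel:
        return [await run_task(t) for t in task_ids]

    # Assumed concurrency cap so that many tasks do not start at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(task_id: str) -> dict:
        async with semaphore:
            return await run_task(task_id)

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(bounded(t) for t in task_ids))


if __name__ == "__main__":
    results = asyncio.run(run_taskset(["task-1", "task-2", "task-3"], parallel=True))
    print(results)
```

The semaphore is one common way to bound concurrency; whether the merged script limits concurrency at all is not stated in this PR.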

New features

  • Run evaluations: python hud_eval.py --taskset OSWorld-Verified --parallel
  • Use hud with main.py: python main.py --query "..." --env="hud"
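
The two commands above can be read as ordinary flag-based command lines. The sketch below shows how flags with these names could be parsed with `argparse`; the parser functions are hypothetical and the real scripts may define more options or handle them differently.

```python
import argparse


def build_eval_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the hud_eval.py flags shown in the PR."""
    parser = argparse.ArgumentParser(prog="hud_eval.py")
    parser.add_argument("--taskset", required=True, help="Name of the taskset to evaluate")
    parser.add_argument("--parallel", action="store_true", help="Run tasks concurrently")
    return parser


def build_main_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the main.py flags shown in the PR."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--query", required=True, help="Task for the agent to carry out")
    # Only the "hud" value appears in this PR; other environment names are not listed here.
    parser.add_argument("--env", required=True, help="Environment to run in, e.g. hud")
    return parser


if __name__ == "__main__":
    eval_args = build_eval_parser().parse_args(["--taskset", "OSWorld-Verified", "--parallel"])
    print(eval_args.taskset, eval_args.parallel)

    # argparse accepts both "--env hud" and "--env=hud"; the shell strips the quotes.
    main_args = build_main_parser().parse_args(["--query", "...", "--env=hud"])
    print(main_args.env)
```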

ericpts self-requested a review on August 19, 2025, 19:04
ericpts merged commit 1083d1c into google-gemini:main on Aug 19, 2025
4 checks passed

2 participants