Refactor Heuristic Evaluation Engine to LLM-Based Analysis #2
Conversation
Actually, I realized the mock data (omniparser_client.py) was incorrect: it included an attributes={'level': 'h1'} field that the real OmniParser does not output. Since the real model only returns bounding boxes and text, we cannot rely on font_size or level attributes. Please update the logic to infer hierarchy from the element's height (bounds['height']) instead. Here is an example of what OmniParser actually outputs:
@latishab |
I tested your code and ran into two issues:
Please make these changes, and consider including your API outputs (you can commit example raw OmniParser responses for testing, so you no longer have to rely on mock data).
@latishab Have applied the necessary fixes. Everything should now be aligned and working as expected. |
I have tested the code so far, and it works well. Thank you for your efforts @mohi-devhub.
One note: I think you should remove .gitignore from the commits in this pull request, since it has nothing to do with the overall codebase, along with the key file ai-heuristics-ruxai-firebase-adminsdk-fbsvc-*.json. You can put your keys in secrets/.
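The secrets cleanup suggested above might translate into ignore rules like these (a sketch; the glob mirrors the key filename mentioned in the comment):

```
# .gitignore (sketch): keep credential files out of version control
secrets/
ai-heuristics-ruxai-firebase-adminsdk-fbsvc-*.json
```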
Summary
This PR replaces the rule-based `HeuristicEvaluationEngine` with an LLM-driven evaluation pipeline and updates the `UIElement` model to match real OmniParser output. The system now evaluates H1–H5 using GPT-4 with structured, measurable criteria.

Key Changes
1. OmniParser Alignment (Commit: 7255ae1)
Updated `UIElement` to align with the actual OmniParser format:
- Core fields: `type`, `bbox: [x1, y1, x2, y2]`, `interactivity`, `content`.
- Removed fields the real model does not output (`hover_state`, `confirmation`, etc.).
- Added computed `width` and `height`.
- Kept a `text` alias for backward compatibility.
- `from_dict()` supporting both old and new formats.
- `infer_heading_level()` for text hierarchy calculation.

2. Measurable Criteria for H4 & H5 (Commit: 8957369)
Added concrete, structured criteria for LLM evaluation of the higher-level heuristics (H4 and H5).
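Purely as an illustration of what structured, measurable criteria could look like in code (the specific checks below are invented for illustration; the PR's actual criteria are defined in commit 8957369):

```python
# Hypothetical example: structured criteria an LLM judge prompt could
# cite. These specific checks are invented for illustration and are
# not the PR's actual criteria.
H4_H5_CRITERIA = {
    "H4": [
        "Interactive elements of the same type use consistent labels",
        "Repeated actions appear in the same screen region across views",
    ],
    "H5": [
        "Destructive actions are paired with a confirmation step",
        "Required input fields are visually distinguished before submit",
    ],
}

def criteria_prompt_section(heuristic: str) -> str:
    """Render one heuristic's checklist as a numbered prompt section."""
    checks = H4_H5_CRITERIA[heuristic]
    lines = [f"{i + 1}. {check}" for i, check in enumerate(checks)]
    return f"Criteria for {heuristic}:\n" + "\n".join(lines)
```

Encoding criteria as data rather than free-form prose keeps the LLM's rubric auditable and lets each returned verdict be traced back to a numbered check.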
3. LLM-Based Evaluation Engine (Commit: ec31f70)
The core evaluation logic is now LLM-driven:
- Added `_serialize_elements_for_llm()`, `_evaluate_with_llm()`, and `_llm_explain_heuristic()`.
- `evaluate_heuristic()` is now fully LLM-based, supporting H1–H5.
- Results now include `llm_explanation`, `evaluation_version="2.0.0-llm"`, and `evaluation_method="llm-based"`.

4. Other Fixes & Integrations
- Added support for `OPENAI_BASE_URL` and improved LLM client initialization.

Impact Summary
Validation
`UIElement` serialization confirmed to match real OmniParser output.
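A minimal sketch of two pieces described in this PR, serializing elements for the LLM prompt and building client configuration that honors `OPENAI_BASE_URL` (function names and field choices here are assumptions based on this summary, not the actual implementation):

```python
import json
import os

def serialize_elements_for_llm(elements: list) -> str:
    """Compact JSON the LLM can reason over.

    Mirrors the OmniParser fields described in this PR (type, bbox,
    interactivity, content); the exact prompt format is an assumption.
    """
    return json.dumps(
        [
            {
                "type": el["type"],
                "bbox": el["bbox"],  # [x1, y1, x2, y2]
                "interactivity": el["interactivity"],
                "content": el["content"],
            }
            for el in elements
        ],
        indent=2,
    )

def llm_client_kwargs() -> dict:
    """Build kwargs suitable for openai.OpenAI(**kwargs).

    Honors OPENAI_BASE_URL when set (e.g. for a proxy or self-hosted
    gateway); otherwise the SDK's default endpoint is used.
    """
    kwargs = {"api_key": os.environ.get("OPENAI_API_KEY", "")}
    base_url = os.environ.get("OPENAI_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs
```

Keeping the serialized payload down to the four OmniParser fields bounds prompt size and avoids leaking mock-only attributes (like the removed `level`) into the evaluation.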