feat(hints, summary, docs): Refactor O3 parameters to generic names and restructure GAIA validation docs #58

BinWang28 · 2025-10-01T03:23:30Z

Summary

Renamed O3-specific parameters to generic model-agnostic names across all configs and code
Extracted common GAIA validation content into gaia_validation_prerequisites.md
Renamed config files for consistency: agent_gaia-validation_mirothinker.yaml, agent_gaia-validation_claude37sonnet.yaml
Added a multiple-runs section for MiroThinker that explains performance variance
Moved API key configuration to the model-specific documentation

Changes

Parameters: o3_hint → hint_generation, o3_final_answer → final_answer_extraction, etc.
Code: Updated function names and references in orchestrator.py and summary_utils.py
Configs: Updated all YAML files with new parameter names
Docs: Restructured GAIA validation documentation with model-specific files
Scripts: Updated script references to use new config file names
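
For anyone carrying local configs over, the rename could be applied mechanically. The following is a minimal sketch, not code from this PR: `migrate_config` is a hypothetical helper, the `main_agent` and `max_turns` keys are made up for illustration, and the mapping covers only the two renames spelled out above (the "etc." means more exist).

```python
from typing import Any

# Old-to-new names taken from the PR description; the list there ends in
# "etc.", so this mapping is knowingly incomplete.
RENAMED_KEYS = {
    "o3_hint": "hint_generation",
    "o3_final_answer": "final_answer_extraction",
}


def migrate_config(node: Any) -> Any:
    """Return a copy of a parsed config with legacy key names replaced (hypothetical helper)."""
    if isinstance(node, dict):
        return {RENAMED_KEYS.get(key, key): migrate_config(value) for key, value in node.items()}
    if isinstance(node, list):
        return [migrate_config(item) for item in node]
    return node


# Illustrative input only; the surrounding keys are assumptions, not the real schema.
legacy = {"main_agent": {"o3_hint": True, "o3_final_answer": True, "max_turns": 20}}
migrated = migrate_config(legacy)
```

The helper builds new containers rather than mutating its input, so the legacy dictionary stays available for comparison.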

Copilot

Pull Request Overview

This pull request refactors O3-specific parameter names to generic model-agnostic names and restructures GAIA validation documentation. The changes make the codebase more flexible and maintainable by removing model-specific naming conventions.

Key changes include:

Renamed parameters: o3_hint → hint_generation, o3_final_answer → final_answer_extraction
Updated function names and references throughout the codebase
Extracted common GAIA validation content into separate documentation files
Added support for configurable base URLs for hint generation and final answer extraction
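
A configurable base URL usually comes down to passing it through to the client with a sensible fallback. The sketch below assumes that shape; `client_settings` is a hypothetical helper, while the default URL and the 600-second timeout are the values visible in the `summary_utils.py` snippets in this review.

```python
from typing import Optional

# Default shown in the get_gaia_answer_type signature in this PR.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def client_settings(api_key: str, base_url: Optional[str] = None, timeout: int = 600) -> dict:
    """Build keyword arguments for an OpenAI-compatible client (hypothetical helper)."""
    return {
        "api_key": api_key,
        # Fall back to the default endpoint and drop any trailing slash.
        "base_url": (base_url or DEFAULT_BASE_URL).rstrip("/"),
        "timeout": timeout,
    }
```

With the `openai` package installed, the result could be passed as `AsyncOpenAI(**client_settings(api_key, base_url))`, which lets the same code target a self-hosted OpenAI-compatible endpoint.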

Reviewed Changes

Copilot reviewed 35 out of 36 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
utils/util_llm_simple_voting.py	Updated string replacement to use generic "LLM" instead of "O3"
utils/util_llm_parallel_thinking.py	Updated string replacement and improved function parameter formatting
src/utils/summary_utils.py	Renamed functions to generic names and added base_url parameters
src/core/orchestrator.py	Updated function calls and variable names to use generic terminology
config/*.yaml	Updated all configuration files to use new parameter names
docs/mkdocs/docs/*.md	Restructured GAIA validation docs and updated references
scripts/*.sh	Updated scripts to use the new config file names

Copilot · 2025-10-01T03:28:42Z

utils/prepare_benchmark/gen_finsearchcomp.py

-            
+
            yield task
    return


Remove the unnecessary return statement at the end of the generator function. Generator functions don't need explicit empty return statements.

Suggested change

-    return
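
To illustrate why the trailing `return` is redundant, here is a small sketch with two hypothetical generators (not the code in `gen_finsearchcomp.py`) that differ only in that statement and produce the same items.

```python
from typing import Iterable, Iterator


def iter_tasks(rows: Iterable[str]) -> Iterator[dict]:
    """Yield one task per row; the generator ends when the loop does."""
    for row in rows:
        yield {"task_id": row}


def iter_tasks_with_return(rows: Iterable[str]) -> Iterator[dict]:
    """Same generator with a redundant trailing return."""
    for row in rows:
        yield {"task_id": row}
    return  # no effect: falling off the end already raises StopIteration


rows = ["a", "b"]
same_output = list(iter_tasks(rows)) == list(iter_tasks_with_return(rows))
```

A bare `return` is only useful in a generator when it exits early, before the body would otherwise finish.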

Copilot · 2025-10-01T03:28:42Z

src/utils/summary_utils.py

        reasoning_effort="high",
    )
+
+    # response = await client.chat.completions.create(messages = [{"role": "user", "content": content}], model="dummy")


Remove the commented-out debug code. This appears to be leftover debugging code that should be cleaned up.

Suggested change

-    # response = await client.chat.completions.create(messages = [{"role": "user", "content": content}], model="dummy")

Copilot · 2025-10-01T03:28:43Z

src/utils/summary_utils.py

        model="o3",
        messages=[{"role": "user", "content": f"[{message_id}] {full_prompt}"}],
-        reasoning_effort="medium",
+        # reasoning_effort="medium",


Remove the commented-out parameter. If this parameter is no longer needed, the commented line should be removed to keep the code clean.

Suggested change

-        # reasoning_effort="medium",

Copilot

Pull Request Overview

Copilot reviewed 35 out of 36 changed files in this pull request and generated 3 comments.

Copilot · 2025-10-01T03:33:29Z

src/utils/summary_utils.py

        reasoning_effort="high",
    )
+
+    # response = await client.chat.completions.create(messages = [{"role": "user", "content": content}], model="dummy")


This commented-out line appears to be debug/test code that should be removed from production code.

Suggested change

-    # response = await client.chat.completions.create(messages = [{"role": "user", "content": content}], model="dummy")

Copilot · 2025-10-01T03:33:30Z

src/utils/summary_utils.py

+async def get_gaia_answer_type(
+    task_description: str, api_key: str, base_url: str = "https://api.openai.com/v1"
+) -> str:
+    # client = AsyncOpenAI(api_key=api_key, timeout=600)


This commented-out line should be removed as it's redundant with the active line below it.

Suggested change

-    # client = AsyncOpenAI(api_key=api_key, timeout=600)

Copilot · 2025-10-01T03:33:30Z

src/utils/summary_utils.py

    message_id = _generate_message_id()
    response = await client.chat.completions.create(
        model="o3",
        messages=[{"role": "user", "content": f"[{message_id}] {full_prompt}"}],


This commented-out parameter should either be removed or properly documented if it's intentionally disabled for generic LLM compatibility.

Suggested change

-        messages=[{"role": "user", "content": f"[{message_id}] {full_prompt}"}],
+        messages=[{"role": "user", "content": f"[{message_id}] {full_prompt}"}],
+        # The 'reasoning_effort' parameter is intentionally disabled for compatibility with generic LLM APIs.
+        # Uncomment and set as needed if using an LLM that supports this parameter, e.g., OpenAI's proprietary models.
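
An alternative to commenting the parameter in and out is to include it only when the caller asks for it. This is a sketch of that idea, not what the PR implements: `build_request_kwargs` is a hypothetical helper, and whether a given backend accepts `reasoning_effort` is left to the caller to decide.

```python
from typing import Optional


def build_request_kwargs(model: str, prompt: str, reasoning_effort: Optional[str] = None) -> dict:
    """Assemble chat-completion arguments, adding reasoning_effort only when set (hypothetical helper)."""
    kwargs = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        # Sent only when explicitly requested, so backends that do not
        # accept the parameter never see it.
        kwargs["reasoning_effort"] = reasoning_effort
    return kwargs
```

The resulting dictionary could then be unpacked into the existing call, as in `client.chat.completions.create(**kwargs)`, with the effort level read from config instead of being hard-coded.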

BinWang28 added 2 commits October 1, 2025 11:21

update o3 hints and summary to be more general and update docs for gaia validation

b1cc62b

to pass lint

c84325f

BinWang28 requested a review from Copilot October 1, 2025 03:28

Copilot AI reviewed Oct 1, 2025

improve retry

7623afc

BinWang28 requested a review from Copilot October 1, 2025 03:32

Copilot AI reviewed Oct 1, 2025

to incorporate copilot

f6a9797

BinWang28 merged commit 6ec4972 into miroflow-v0.3 Oct 1, 2025
1 of 2 checks passed

BinWang28 deleted the mirothinker branch October 1, 2025 03:52
