Commit 4787a48

fix pre-commit

Author: maksimov maksim
Parent: f2bb35f

1 file changed: +16 -15 lines changed

README.md

Lines changed: 16 additions & 15 deletions
@@ -573,31 +573,32 @@ We conducted a comprehensive benchmark evaluation using the [SimpleQA](https://h
 ![SimpleQA Benchmark Comparison](assets/simpleqa_benchmark_comprasion.png)

 **Performance Metrics:**
+
 - **Accuracy:** 86.08%
 - **Correct:** 3,724 answers
 - **Incorrect:** 554 answers
 - **Not Attempted:** 48 answers

 **Benchmark Configuration:**

-| Component | Parameter | Value |
-|-----------|-----------|-------|
-| **Search Engine** | Provider | Tavily Basic Search |
-| | Scraping Enabled | Yes |
-| | Max Pages | 5 |
-| | Content Limit | 33,000 characters |
-| **Agent** | Name | sgr_tool_calling_agent |
-| | Max Steps | 20 |
-| **LLM (Agent)** | Model | gpt-4o-mini |
-| | Max Tokens | 12,000 |
-| | Temperature | 0.2 |
-| **LLM (Judge)** | Model | gpt-4o |
-| | Max Tokens | Default |
-| | Temperature | Default |
+| Component         | Parameter        | Value                  |
+| ----------------- | ---------------- | ---------------------- |
+| **Search Engine** | Provider         | Tavily Basic Search    |
+|                   | Scraping Enabled | Yes                    |
+|                   | Max Pages        | 5                      |
+|                   | Content Limit    | 33,000 characters      |
+| **Agent**         | Name             | sgr_tool_calling_agent |
+|                   | Max Steps        | 20                     |
+| **LLM (Agent)**   | Model            | gpt-4o-mini            |
+|                   | Max Tokens       | 12,000                 |
+|                   | Temperature      | 0.2                    |
+| **LLM (Judge)**   | Model            | gpt-4o                 |
+|                   | Max Tokens       | Default                |
+|                   | Temperature      | Default                |

 Detailed benchmark results are available in [this spreadsheet](assets/simpleqa_result.xlsx).

----
+______________________________________________________________________

 The project includes benchmarking capabilities using the **SimpleQA** dataset from DeepMind/Kaggle. The benchmark automatically runs the SGR agent on each question and uses an LLM judge to grade the answers.
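
Note (not part of the diff): the reported accuracy can be cross-checked from the three answer counts listed under Performance Metrics. The sketch below is not the project's benchmark code. The grade labels and the `summarize` helper are hypothetical names chosen for illustration. It assumes accuracy is the number of correct answers divided by all graded questions, including those not attempted, which is the reading that reproduces the 86.08% figure.

```python
from collections import Counter

# Hypothetical grade labels; the project's judge may use different names.
GRADES = ("correct", "incorrect", "not_attempted")


def summarize(grades):
    """Tally judge grades and compute accuracy as correct / total graded."""
    counts = Counter(grades)
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unexpected grade labels: {sorted(unknown)}")
    total = sum(counts.values())
    accuracy = counts["correct"] / total if total else 0.0
    return {"total": total, "accuracy": accuracy, **{g: counts[g] for g in GRADES}}


# Counts taken from the Performance Metrics list in the README.
reported = ["correct"] * 3724 + ["incorrect"] * 554 + ["not_attempted"] * 48
summary = summarize(reported)
print(summary["total"])               # 4326
print(f"{summary['accuracy']:.2%}")   # 86.08%
```

Counting unattempted questions in the denominator is a design choice. Dividing by attempted questions only (3,724 + 554) would give a higher figure than the one reported.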