Commit 3f00871

Update llm-benchmark.md
1 parent bee82e3 commit 3f00871

1 file changed: +2 -1 lines changed

docs/kagi/ai/llm-benchmark.md

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,7 @@ The Kagi "offline" Benchmark is an **unpolluted benchmark** to assess large lang

 Unlike standard benchmarks, the tasks in this benchmark are unpublished, not found in training data, or "gamed" in fine-tuning. The task set changes over time (mostly getting more difficult) to better represent the current state of the art.

-Last task list revision: **November 7th, 2025**
+Last task list revision: **November 19th, 2025**
 Tasks: **116**
 Input Tokens (all tasks): **15256**

@@ -20,6 +20,7 @@ Please see notes below the table if you see results you find surprising, or get

 | model                               | %accuracy   | Cost($)   | time/task   | tokens   | TPS   | provider   |
 |-------------------------------------|-------------|-----------|-------------|----------|-------|------------|
+| gemini-3-pro                        | 80.1        | 0.4       | 54.9        | 15114    | 0.8   | kagi (ult) |
 | gpt-5-pro                           | 76.8        | 31.8      | 193.2       | 330943   | 5.2   | kagi (ult) |
 | claude-4-opus-thinking              | 74.3        | 22.4      | 13.3        | 17058    | 11.0  | kagi (ult) |
 | grok-4                              | 73.6        | 1.0       | 65.1        | 3660     | 0.5   | kagi (ult) |
