Evaluator Contribution

guanxinyi edited this page Jun 5, 2025 · 5 revisions

1. Prepare

To set up the development environment, see Installation.

2. Run Evaluator

Complete the steps in Configuration, then run:

rush dev:eval

3. Evaluator Guide

For the main structure of web-bench, see Evaluator. The evaluator source tree is laid out as follows:

├─ tools
│ ├─ evaluator
│ │ ├─ src
│ │ │ ├─ ignore
│ │ │ ├─ log
│ │ │ ├─ parser
│ │ │ ├─ plugins
│ │ │ │ ├─ evaluator-runner.ts
│ │ │ │ ├─ project-runner.ts
│ │ │ │ ├─ task-runner
│ │ │ ├─ runner
│ │ │ ├─ settings
│ │ │ ├─ utils

  • runner:

    • evaluator-runner: The evaluation entry point. It runs m*n project-runners (m: number of projects, n: number of models).
    • project-runner: Processes a project's tasks in sequence and terminates once the retry limit (2 attempts) is reached. Corresponds to Evaluator-Workflow Step 2 and Step 9.
    • task-runner: Calls the agent, rewrites files, initializes the environment, builds files, runs tests, and retries on failure. Corresponds to Evaluator-Workflow Step 1 and Steps 3-8.
  • plugins:
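
The three runner levels described above can be pictured with a minimal TypeScript sketch. It is not the web-bench implementation: every name in it (`AttemptFn`, `runTask`, `runProject`, `runEvaluation`) is hypothetical, the agent call, file rewrite, build, and test steps are collapsed into one callback, and the real runners are likely asynchronous. It also assumes one reading of the retry rule: each task gets at most 2 attempts, and the project run stops at the first task that fails both.

```typescript
// Illustrative sketch only; all names are hypothetical, not the web-bench API.
type TaskResult = { task: string; passed: boolean; attempts: number };

// Stand-in for "call agent, rewrite files, init env, build, test".
// Returns true when the task's tests pass on this attempt.
type AttemptFn = (project: string, model: string, task: string, attempt: number) => boolean;

const RETRY_LIMIT = 2; // "retry limit (2 attempts)" from the description above

// task-runner level: try one task until it passes or the limit is reached.
function runTask(project: string, model: string, task: string, attemptTask: AttemptFn): TaskResult {
  for (let attempt = 1; attempt <= RETRY_LIMIT; attempt++) {
    if (attemptTask(project, model, task, attempt)) {
      return { task, passed: true, attempts: attempt };
    }
  }
  return { task, passed: false, attempts: RETRY_LIMIT };
}

// project-runner level: tasks run in sequence; stop once a task exhausts its retries.
function runProject(project: string, model: string, tasks: string[], attemptTask: AttemptFn): TaskResult[] {
  const results: TaskResult[] = [];
  for (const task of tasks) {
    const result = runTask(project, model, task, attemptTask);
    results.push(result);
    if (!result.passed) break; // later tasks are not attempted
  }
  return results;
}

// evaluator-runner level: one project run per (project, model) pair, m*n in total.
function runEvaluation(
  projects: Record<string, string[]>, // project name -> ordered task names
  models: string[],
  attemptTask: AttemptFn,
): Map<string, TaskResult[]> {
  const report = new Map<string, TaskResult[]>();
  for (const [project, tasks] of Object.entries(projects)) {
    for (const model of models) {
      report.set(`${project}::${model}`, runProject(project, model, tasks, attemptTask));
    }
  }
  return report;
}
```

Calling `runEvaluation` with 2 projects and 2 models yields 4 entries in the returned map, one per project-runner. Keeping the attempt logic behind a callback makes the control flow easy to exercise with a fake agent.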

4. Test

Execute the following command to run evaluations and view the results in apps/eval/report:

rush eval

5. Tips

In the development environment, configure parameters in apps/eval/src/config.json5:

  • logLevel: 'debug' to get more detailed log output.
  • projects: ['@web-bench/xxxx'] to evaluate only the listed projects instead of all projects.

More details in Config Parameters.
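
As a rough illustration, a development config.json5 using only the two parameters named above might look like the fragment below. This is a sketch, not a complete config: any other fields the evaluator needs are omitted, and '@web-bench/xxxx' is the placeholder from the tip, to be replaced with a real project name.

```json5
{
  // More verbose logging while developing.
  logLevel: 'debug',
  // Evaluate only the listed project(s) instead of all projects.
  // '@web-bench/xxxx' is a placeholder, not a real project name.
  projects: ['@web-bench/xxxx'],
}
```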