Evaluator Contribution
guanxinyi edited this page Jun 5, 2025 · 5 revisions
To set up the development environment, see Installation.
Complete the Configuration, then run:
rush dev:eval
For the main structure of web-bench, see Evaluator.
├── tools
│   ├── evaluator
│   │   ├── src
│   │   │   ├── ignore
│   │   │   ├── log
│   │   │   ├── parser
│   │   │   ├── plugins
│   │   │   │   ├── evaluator-runner.ts
│   │   │   │   ├── project-runner.ts
│   │   │   │   ├── task-runner
│   │   │   ├── runner
│   │   │   ├── settings
│   │   │   ├── utils
- runner:
  - evaluator-runner: The evaluation entry point. It runs m*n project-runners (m: number of projects, n: number of models).
  - project-runner: Processes the tasks of a project in sequence and terminates when a task reaches the retry limit (2 attempts). Corresponds to Evaluator-Workflow Step 2 and Step 9.
  - task-runner: Calls the agent, rewrites files, initializes the environment, builds files, runs tests, and retries. Corresponds to Evaluator-Workflow Step 1 and Steps 3-8.
- plugins:
  - In the Evaluation Workflow, each step is injected as a plugin. This directory contains both the plugin scheduling and the implementation of each step's plugin.
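The nesting of the three runners and the plugin-per-step idea can be pictured with a small sketch. It is not the evaluator's actual code: every type and function name here (`StepPlugin`, `runTask`, `runProject`, `runEvaluation`) is hypothetical, the code is kept synchronous for brevity, and "2 attempts" is read as two tries in total per task.

```typescript
// All names below are hypothetical; they are not taken from the web-bench source.
interface StepContext {
  project: string;
  model: string;
  task: string;
  attempt: number;
  passed: boolean;
}

// One workflow step, injected as a plugin.
interface StepPlugin {
  name: string;
  apply(ctx: StepContext): void;
}

interface TaskResult {
  task: string;
  passed: boolean;
  attempts: number;
}

const RETRY_LIMIT = 2; // assumed to mean two attempts in total per task

// task-runner: run every step plugin in order, retrying until the task
// passes or the attempt limit is reached.
function runTask(project: string, model: string, task: string, plugins: StepPlugin[]): TaskResult {
  for (let attempt = 1; attempt <= RETRY_LIMIT; attempt++) {
    const ctx: StepContext = { project, model, task, attempt, passed: false };
    for (const plugin of plugins) {
      plugin.apply(ctx);
    }
    if (ctx.passed) {
      return { task, passed: true, attempts: attempt };
    }
  }
  return { task, passed: false, attempts: RETRY_LIMIT };
}

// project-runner: process tasks in sequence and stop at the first task
// that exhausts its attempts.
function runProject(project: string, model: string, tasks: string[], plugins: StepPlugin[]): TaskResult[] {
  const results: TaskResult[] = [];
  for (const task of tasks) {
    const result = runTask(project, model, task, plugins);
    results.push(result);
    if (!result.passed) break;
  }
  return results;
}

// evaluator-runner: one project run per (project, model) pair, m*n in total.
function runEvaluation(
  projects: Record<string, string[]>,
  models: string[],
  plugins: StepPlugin[],
): Map<string, TaskResult[]> {
  const runs = new Map<string, TaskResult[]>();
  for (const project of Object.keys(projects)) {
    for (const model of models) {
      runs.set(`${project}::${model}`, runProject(project, model, projects[project], plugins));
    }
  }
  return runs;
}
```

In this sketch a step such as "run tests" would be a plugin that sets `ctx.passed`. With two projects and two models, `runEvaluation` produces four entries, one per project and model pair.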
Run the following command to execute evaluations, then view the results in apps/eval/report:
rush eval
In the development environment, configure parameters in apps/eval/src/config.json5:
- logLevel: 'debug', to get more detailed log output.
- projects: ['@web-bench/xxxx'], to evaluate only the listed projects instead of all projects.
See Config Parameters for more details.
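As a rough illustration of the two development parameters, the sketch below writes them as a TypeScript object literal; in config.json5 the same keys would appear in JSON5 syntax. The `DevEvalConfig` type and the `selectProjects` helper are hypothetical and only show the intended effect of `projects`. The `xxxx` placeholder is kept as written.

```typescript
// Hypothetical shape covering only the two parameters mentioned above.
interface DevEvalConfig {
  logLevel?: string;
  projects?: string[];
}

// Same keys and values as in the list above; 'xxxx' is the original placeholder.
const devConfig: DevEvalConfig = {
  logLevel: 'debug',
  projects: ['@web-bench/xxxx'],
};

// Hypothetical helper: when `projects` is set, only those projects are
// evaluated; when it is absent, all projects are evaluated.
function selectProjects(all: string[], config: DevEvalConfig): string[] {
  const wanted = config.projects;
  if (!wanted) return all;
  return all.filter((name) => wanted.indexOf(name) !== -1);
}
```

Restricting `projects` to a single entry keeps a development run short, and the 'debug' log level gives more information while iterating.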