[EVAL] Add kyrgyzLLM benchmark #1036

@golden-ratio

Description

Hi,

We just open-sourced the Kyrgyz LLM Evaluation Dataset.

Evaluation short description

  • Why is this evaluation interesting?

KyrgyzLLM-Bench is the first comprehensive benchmark suite for deep language understanding in Kyrgyz. It is interesting because it provides broad, culturally grounded coverage by combining native benchmarks (such as KyrgyzMMLU and KyrgyzRC) with carefully translated and post-edited international benchmarks (such as HellaSwag, WinoGrande, BoolQ, GSM8K, and TruthfulQA).

  • How used is it in the community?
    The benchmark was released recently, so community adoption is only beginning. As the first comprehensive benchmark suite for deep language understanding in Kyrgyz, it gives researchers and developers an evaluation tool that was not previously available for the language.

Evaluation metadata

Thank you!
