
[QUESTION] About COMET Train and Test Data #241

@moore3930

Hi, I have some questions about the dataset provided here: https://github.com/Unbabel/COMET/tree/master/data

  1. If I understand correctly, the training data for each year's WMT is accumulated from all previous WMTs. Does this mean that, in the data folder (https://github.com/Unbabel/COMET/tree/master/data), the 2021 DA data is a subset of the 2022 DA data?

  2. The training data in this config (https://github.com/Unbabel/COMET/blob/master/configs/models/referenceless_model.yaml) is set to data/1720-da.csv. Is this file simply all the DA data in the data folder (https://github.com/Unbabel/COMET/tree/master/data) from 2017 to 2020 merged together? If so, are there any duplication issues?

  3. Where can I find the test data if I want to formally evaluate my metric on the WMT21 DA task?
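Questions 1 and 2 can also be checked locally by comparing the CSV files directly. The following is a minimal, hypothetical sketch, not part of COMET. It assumes each DA file has lp, src, mt and ref columns (these column names are an assumption about the files), and it treats two rows as the same judgment when all four of those fields match, regardless of score.

```python
from __future__ import annotations

import csv
from collections import Counter
from typing import Iterable, Iterator

# Assumption: these column names exist in the DA CSVs. A row is identified
# by language pair, source, MT output and reference (the score is ignored).
KEY_COLUMNS = ("lp", "src", "mt", "ref")


def read_rows(path: str) -> list[dict[str, str]]:
    """Load a DA CSV file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def row_keys(rows: Iterable[dict[str, str]]) -> Iterator[tuple[str, ...]]:
    """Yield the identifying key tuple for each row."""
    for row in rows:
        yield tuple(row[col] for col in KEY_COLUMNS)


def is_subset(smaller: list[dict[str, str]], larger: list[dict[str, str]]) -> bool:
    """True if every distinct key in `smaller` also occurs in `larger`."""
    return set(row_keys(smaller)) <= set(row_keys(larger))


def duplicate_keys(*datasets: list[dict[str, str]]) -> dict[tuple[str, ...], int]:
    """Count keys occurring more than once across all given datasets combined."""
    counts = Counter(key for rows in datasets for key in row_keys(rows))
    return {key: n for key, n in counts.items() if n > 1}
```

For question 1, is_subset(read_rows(path_2021), read_rows(path_2022)) returns True only if every 2021 segment also appears in the 2022 file. For question 2, duplicate_keys applied to the 2017 to 2020 files lists the segments that appear more than once across them. path_2021 and path_2022 are placeholders for the actual file names in the data folder.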

Labels: question (Further information is requested)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions