Meaning of fold in the results parquet file

Hi! I was trying to match some of my run results with the ones reported in TabArena and have a question about how folds and repeats are reported in the provided results dataframe. I am using the following dataframe:

```python
import pandas as pd

# Leaderboard results published by TabArena
df = pd.read_parquet("https://tabarena.s3.us-west-2.amazonaws.com/results/df_results_leaderboard.parquet")
```

It has the following columns:

```python
# Output of df.columns:
Index(['dataset', 'fold', 'method', 'metric_error', 'time_train_s',
       'time_infer_s', 'metric_error_val', 'config_selected', 'seed',
       'method_metadata', 'ensemble_weight', 'problem_type', 'metric',
       'method_type', 'method_subtype', 'config_type'],
      dtype='object')
```

There are up to 30 "folds". I understand that the paper used 3-fold outer validation for scoring, with up to 10 repeats. Could you please help me understand how the dataset fold and repeat indices are converted into these 30 folds?

Basically, which of the two options below is correct? Do we iterate over folds first and then over repeats, or is it the other way around?

```python
# just an example
df_fold = 13

# option 1:
openml_fold = df_fold // 10
openml_repeat = df_fold % 10

# or option 2:
openml_fold = df_fold % 3
openml_repeat = df_fold // 3
```
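To make the difference concrete, here is a small sketch that decodes a few values of the `fold` column under both conventions. The function names and the constants `N_FOLDS = 3` and `N_REPEATS = 10` are my own assumptions based on the paper's description, not something taken from the TabArena code, and I don't know which convention (if either) is the one actually used.

```python
# Assumed layout from the paper: 3 outer folds, up to 10 repeats.
N_FOLDS = 3
N_REPEATS = 10


def decode_option_1(df_fold):
    """Hypothetical option 1: df_fold = openml_fold * N_REPEATS + openml_repeat."""
    return df_fold // N_REPEATS, df_fold % N_REPEATS


def decode_option_2(df_fold):
    """Hypothetical option 2: df_fold = openml_repeat * N_FOLDS + openml_fold."""
    return df_fold % N_FOLDS, df_fold // N_FOLDS


# Print (openml_fold, openml_repeat) under each option for a few indices.
for df_fold in (0, 1, 3, 13, 29):
    print(df_fold, decode_option_1(df_fold), decode_option_2(df_fold))
```

The two options agree only on some indices (for example 0 and 29), so matching the wrong way would silently pair most of my runs with different splits.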

---

To give some context on where my question is coming from: I wanted to quickly check some results on a few small dataset splits (e.g. fold=0 and repeat in range(0, 5), in OpenML terms, when loading the data). Now I want to match the results I have to the rows in the dataframe.
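For my splits specifically, this is a sketch of the `fold` values I would look up under each of the two candidate conventions. The helper names and the constants (3 folds, 10 repeats) are assumptions on my side, not taken from the TabArena code.

```python
# Assumed layout from the paper: 3 outer folds, up to 10 repeats.
N_FOLDS = 3
N_REPEATS = 10


def flat_index_option_1(openml_fold, openml_repeat):
    """Hypothetical option 1: all repeats of fold 0 first, then fold 1, ..."""
    return openml_fold * N_REPEATS + openml_repeat


def flat_index_option_2(openml_fold, openml_repeat):
    """Hypothetical option 2: all folds of repeat 0 first, then repeat 1, ..."""
    return openml_repeat * N_FOLDS + openml_fold


# The splits I ran: fold=0, repeat in range(0, 5).
my_splits = [(0, repeat) for repeat in range(5)]

candidates_1 = [flat_index_option_1(f, r) for f, r in my_splits]
candidates_2 = [flat_index_option_2(f, r) for f, r in my_splits]

print(candidates_1)  # [0, 1, 2, 3, 4]
print(candidates_2)  # [0, 3, 6, 9, 12]
```

Depending on the answer, I would then select the matching rows with something like `df[df["fold"].isin(candidates_1)]` or `df[df["fold"].isin(candidates_2)]`.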
