[TabArena] Model Usage Workflow Brainstorming #159

@LennartPurucker

Below are some thoughts on how we could create the optimal model usage workflow. The goal is to make every model implemented as part of TabArena easy for anyone to use on new datasets, just like any other scikit-learn-compatible model.

The current workflow is shown here: https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_tabarena_model.py

My ideal workflow would be something like this:

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Get Data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Import a TabArena model
from tabarena.pipelines import RealMLP # or any other model

# Train TabArena Model
clf = RealMLP()
clf.fit(X=X_train, y=y_train)

# Predict and score (for a binary target, roc_auc_score expects the positive-class column)
prediction_probabilities = clf.predict_proba(X=X_test)
print("ROC AUC:", roc_auc_score(y_test, prediction_probabilities[:, 1]))
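
To make "scikit-learn compatible" concrete, here is a rough sketch of the interface such a class would need to expose. Everything in it is an assumption for illustration: `SketchTabularClassifier` is a hypothetical name, not an existing TabArena class, and a scaled logistic regression stands in for the actual model backend. It only shows the `fit` / `predict` / `predict_proba` contract that the ideal workflow above relies on.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class SketchTabularClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for a TabArena pipeline class (not a real TabArena API)."""

    def __init__(self, random_state=0):
        # scikit-learn convention: __init__ only stores hyperparameters.
        self.random_state = random_state

    def fit(self, X, y):
        # Stand-in backend (assumption): a scaled logistic regression
        # takes the place of the real model and its preprocessing.
        self.model_ = make_pipeline(
            StandardScaler(),
            LogisticRegression(max_iter=1000, random_state=self.random_state),
        )
        self.model_.fit(X, y)
        self.classes_ = self.model_[-1].classes_
        return self

    def predict(self, X):
        return self.model_.predict(X)

    def predict_proba(self, X):
        # Returns an array of shape (n_samples, n_classes), as scikit-learn does.
        return self.model_.predict_proba(X)


# Same data and split as in the ideal workflow above.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

clf = SketchTabularClassifier()
clf.fit(X=X_train, y=y_train)
proba = clf.predict_proba(X=X_test)
auc = roc_auc_score(y_test, proba[:, 1])
print("ROC AUC:", auc)

# Because hyperparameters live in __init__, generic scikit-learn tooling works too.
clf_copy = clone(clf)
```

Following the scikit-learn conventions (hyperparameters stored in `__init__`, fitted state in attributes ending with `_`) is what would let tools such as `clone`, `cross_val_score`, or `Pipeline` work with the models without extra glue code.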

Some requirements that would be nice to have:

Labels: enhancement