Add hyperparameter optimization tutorial

dzoanka · dzoanka · commit 3d362bd5b187 · 2025-07-25T14:59:29.000-07:00
diff --git a/notebooks/hyperparameter-optimization.ipynb b/notebooks/hyperparameter-optimization.ipynb
@@ -0,0 +1,177 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "93eda882-3c30-46cf-a33c-0a6f29ca18b4",
+   "metadata": {},
+   "source": [
+    "# Hyperparameter Optimization with Optuna in cellmaps_vnn\n",
+    "\n",
+    "This tutorial shows how to define a training configuration with search spaces for Optuna, run training using the config file, and then use the resulting optimized parameters for prediction."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4eb7598-d496-4864-ad70-44596bbfcfc8",
+   "metadata": {},
+   "source": [
+    "### Step 1: Define Your Configuration \n",
+    "Below we define a configuration dictionary for training.\n",
+    "\n",
+    "If a parameter is given a list of values, Optuna will treat it as a search space. If it is a single value, it will remain fixed during training."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3fe289d2-2a8b-46e3-a48a-e3d7021736dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define or modify your hyperparameter configuration here\n",
+    "config = {\n",
+    "    # Training settings\n",
+    "    'epoch': 20,\n",
+    "    'cuda': 0,\n",
+    "    'zscore_method': 'auc',\n",
+    "\n",
+    "    # Optimization settings\n",
+    "    'optimize': 1,  # Set to 1 to enable Optuna optimization\n",
+    "    'n_trials': 2,  # Number of trials for Optuna\n",
+    "\n",
+    "    # Parameters (if parameter is given a list of values, it will be considered for optimization)\n",
+    "    'batchsize': [32, 64, 128],\n",
+    "    'lr': [0.1, 0.01, 0.001],\n",
+    "    'wd': [0.0001, 0.001, 0.01],\n",
+    "    'alpha': 0.3,\n",
+    "    'genotype_hiddens': 4,\n",
+    "    'patience': 30,\n",
+    "    'delta': [0.001, 0.002, 0.003],\n",
+    "    'min_dropout_layer': 2,\n",
+    "    'dropout_fraction': 0.3,\n",
+    "\n",
+    "    # Input files\n",
+    "    'training_data': '../examples/training_data.txt',\n",
+    "    'predict_data': '../examples/test_data.txt',\n",
+    "    'gene2id': '../examples/gene2ind.txt',\n",
+    "    'cell2id': '../examples/cell2ind.txt',\n",
+    "    'mutations': '../examples/cell2mutation.txt',\n",
+    "    'cn_deletions': '../examples/cell2cndeletion.txt',\n",
+    "    'cn_amplifications': '../examples/cell2cnamplification.txt'\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "38754843-3c40-4f08-a081-3fa022ada41f",
+   "metadata": {},
+   "source": [
+    "### Step 2: Save Configuration to YAML\n",
+    "We'll save the configuration to a YAML file. The training pipeline will load this file and extract parameter values and ranges."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6ad5102e-441c-4bc3-a95f-793c9580511c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import yaml\n",
+    "\n",
+    "# Save to a YAML config file\n",
+    "config_path = './vnn_config.yaml'\n",
+    "\n",
+    "with open(config_path, 'w') as f:\n",
+    "    yaml.dump(config, f, default_flow_style=False, sort_keys=False)\n",
+    "\n",
+    "print(f'Configuration saved to {config_path}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f00ed72c-33dc-42c8-82b6-df115eefa715",
+   "metadata": {},
+   "source": [
+    "### Step 3: Train the VNN Model with Optuna\n",
+    "Use **cellmaps_vnncmd.py train** and provide the config file via **--config_file**.\n",
+    "This will automatically trigger Optuna-based optimization for any parameter listed with multiple values.\n",
+    "\n",
+    "After training completes, the output folder (out_train_optuna) will contain a config.yaml file — a flattened version of the original config with the best parameters from Optuna."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "28c0456c-96b8-44b3-aa3e-6561d7d92788",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import subprocess\n",
+    "\n",
+    "train_out = 'out_train_optuna'\n",
+    "inputdir = '../examples/'\n",
+    "command = (\n",
+    "    f\"cellmaps_vnncmd.py train {train_out} --inputdir {inputdir} --config_file {config_path}\"\n",
+    ")\n",
+    "subprocess.run(command, shell=True, check=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6f65a487-7fd0-45ca-bd32-f1b6b90466fc",
+   "metadata": {},
+   "source": [
+    "### Step 2: Make predictions with the Optimized Model\n",
+    "Use the saved config.yaml from training (with best parameters) to perform prediction."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "058378cd-8ae9-433b-a19b-917c3ce18993",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "test_out = './out_test'\n",
+    "new_config = f\"{train_out}/config.yaml\"\n",
+    "\n",
+    "\n",
+    "command = (\n",
+    "    f\"cellmaps_vnncmd.py predict {test_out} --inputdir {train_out} \"\n",
+    "    f\"--config_file {new_config}\"\n",
+    ")\n",
+    "subprocess.run(command, shell=True, check=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1fdcac84-fbd2-4446-9a8c-df247e182a87",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}