diff --git a/notebooks/risk_control/theoretical_validity_tests.ipynb b/notebooks/risk_control/theoretical_validity_tests.ipynb
index ef4af5e18..27bc94772 100644
--- a/notebooks/risk_control/theoretical_validity_tests.ipynb
+++ b/notebooks/risk_control/theoretical_validity_tests.ipynb
@@ -1,106 +1,105 @@
 {
  "cells": [
   {
-   "metadata": {},
    "cell_type": "markdown",
-   "source": "# Binary classification risk control - Theoretical tests to validate implementation",
-   "id": "ed592eb3f8989aa8"
+   "id": "0",
+   "metadata": {},
+   "source": [
+    "# Binary classification risk control - Theoretical tests to validate implementation"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "1",
+   "metadata": {},
    "source": [
     "# Protocol description\n",
-    "Testing theoretical guarantees of risk control in binary classification using a random classifier and synthetic data.\n",
+    "We test the theoretical guarantees of risk control in binary classification by using a logistic classifier and synthetic data. The aim is to evaluate the effectiveness of the BinaryClassificationController in maintaining a predefined risk level under different conditions.\n",
     "\n",
-    "Each test case looks at a combination of parameters, for which we repeat the experiment `n_repeat` times. The model is the same for all experiments (basically a random classifier), but the data is different each time.\n",
+    "Each test case corresponds to a unique set of parameters. We repeat the experiment `n_repeat` times for each combination. The model remains the same across experiments, while the data is resampled for each repetition to account for variability.\n",
     "\n",
-    "Each experiment consists of the following:\n",
-    " - We calibrate a BinaryClassificationController. It gives us the list of lambda values that control the risk according to LTT.\n",
-    " - Because we know that the model is random, we know the theoretical risk associated with each lambda value. So we are able to check if the lambda values given by LTT actually control the risk. If not, we count 1 \"error\". Note that *each* lambda value should control the risk, not just one of them.\n",
+    "Each experiment consists of the following steps:  \n",
     "\n",
-    "After n_repeat experiments, we compute the proportion of errors, that should be less than delta (1 - confidence_level).\n",
+    "- **Calibrate the controller**  \n",
+    "  - We use a **BinaryClassificationController**, which provides a list of lambda values intended to control the risk according to **LTT** (Learn Then Test).  \n",
+    "\n",
+    "- **Verify risk control**  \n",
+    "  - Since the model is a fixed, known logistic model, we can estimate the risk associated with each lambda value on a freshly sampled test set.  \n",
+    "  - We then check whether each lambda value from LTT actually controls the risk.  \n",
+    "  - If any lambda fails to meet the risk guarantee, we count **one \"error\"** for that experiment.  \n",
+    "  - **Note:** every lambda value must individually control the risk; it is not enough for only some to succeed.  \n",
+    "\n",
+    "After repeating the experiment `n_repeat` times, we calculate the **proportion of errors**, which should remain below `delta` = 1 - `confidence_level`.\n",
     "\n",
     "# Results\n",
-    "The risk is controlled in all the test cases. Overall, LTT seems very conservative (to achieve a high percentage of errors, we need to lower the confidence level significantly (0.01) and use only one threshold to avoid the Bonferroni effect). But this is likely due to the model being random, and thus having a lot of variance. It would be interesting to see how this evolves with a better model."
-   ],
-   "id": "8c1746b673c148dd"
+    "***TBC***"
+   ]
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-09-15T16:21:19.107147Z",
-     "start_time": "2025-09-15T16:21:19.071278Z"
-    }
-   },
    "cell_type": "code",
+   "execution_count": null,
+   "id": "2",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "%reload_ext autoreload\n",
     "%autoreload 2"
-   ],
-   "id": "9b1422ae620955fd",
-   "outputs": [],
-   "execution_count": 1
+   ]
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-09-15T16:21:20.596927Z",
-     "start_time": "2025-09-15T16:21:19.127705Z"
-    }
-   },
    "cell_type": "code",
+   "execution_count": 1,
+   "id": "3",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "from sklearn.datasets import make_classification\n",
+    "from sklearn.metrics import precision_score, recall_score, accuracy_score\n",
+    "from sklearn.utils import check_random_state\n",
     "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
     "from mapie.risk_control import precision, accuracy, recall, BinaryClassificationController\n",
     "from itertools import product\n",
     "from decimal import Decimal"
-   ],
-   "id": "faeb2f47a92dbf35",
-   "outputs": [],
-   "execution_count": 2
+   ]
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-09-15T16:21:20.802766Z",
-     "start_time": "2025-09-15T16:21:20.652168Z"
-    }
-   },
    "cell_type": "code",
+   "execution_count": 2,
+   "id": "4",
+   "metadata": {},
+   "outputs": [],
    "source": [
-    "# Using sklearn.dummy.DummyClassifier would be cleaner\n",
-    "class RandomClassifier:\n",
-    "    def __init__(self, seed=42, threshold=0.5):\n",
-    "        self.seed = seed\n",
+    "# Define a simple logistic classifier\n",
+    "class LogisticClassifier:\n",
+    "    \"\"\"Deterministic sigmoid-based binary classifier.\"\"\"\n",
+    "\n",
+    "    def __init__(self, scale=1.0, threshold=0.5):\n",
+    "        self.scale = scale\n",
     "        self.threshold = threshold\n",
     "\n",
     "    def _get_prob(self, x):\n",
-    "        local_seed = hash((x, self.seed)) % (2**32)\n",
-    "        rng = np.random.RandomState(local_seed)\n",
-    "        return np.round(rng.rand(), 2)\n",
+    "        \"\"\"Probability of class 1 for input x.\"\"\"\n",
+    "        inf_, sup_ = 0.1, 1.0\n",
+    "        return (sup_ - inf_) / (1 + np.exp(-self.scale * x)) + inf_\n",
     "\n",
     "    def predict_proba(self, X):\n",
+    "        \"\"\"Return probabilities [p(y=0), p(y=1)] for each sample in X.\"\"\"\n",
     "        probs = np.array([self._get_prob(x) for x in X])\n",
     "        return np.vstack([1 - probs, probs]).T\n",
     "\n",
     "    def predict(self, X):\n",
+    "        \"\"\"Return predicted class labels based on threshold.\"\"\"\n",
     "        probs = self.predict_proba(X)[:, 1]\n",
     "        return (probs >= self.threshold).astype(int)"
-   ],
-   "id": "eefafd6d1697fb9c",
-   "outputs": [],
-   "execution_count": 3
+   ]
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-09-15T16:21:54.095452Z",
-     "start_time": "2025-09-15T16:21:20.810388Z"
-    }
-   },
    "cell_type": "code",
+   "execution_count": null,
+   "id": "6",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "N = [100, 5]  # size of the calibration set\n",
     "risk = [\n",
@@ -120,7 +119,7 @@
     "    alpha = float(Decimal(\"1\") - Decimal(str(target_level))) # to avoid floating point issues\n",
     "    delta = float(Decimal(\"1\") - Decimal(str(confidence_level))) # to avoid floating point issues\n",
     "\n",
-    "    clf = RandomClassifier()\n",
+    "    clf = LogisticClassifier(scale=2.0, threshold=0.5)\n",
     "    nb_errors = 0  # number of iterations where the risk is not controlled (i.e., not all the valid thresholds found by LTT are actually valid)\n",
     "    total_nb_valid_params = 0\n",
     "\n",
@@ -147,19 +146,48 @@
     "            confidence_level=confidence_level,\n",
     "        )\n",
     "        controller._predict_params = predict_params\n",
-    "        controller.calibrate(X_calibrate, y_calibrate)\n",
+    "        controller = controller.calibrate(X_calibrate, y_calibrate)\n",
     "        valid_parameters = controller.valid_predict_params\n",
     "        total_nb_valid_params += len(valid_parameters)\n",
     "\n",
     "        # In the following, we check that all the valid thresholds found by LTT actually control the risk.\n",
-    "        # Instead of sampling a large test set, we use the fact that we know the theoretical risk of a random classifier.\n",
-    "        # The calculations here are valid only for a balanced data generator.\n",
-    "        if risk[\"risk\"] == precision or risk[\"risk\"] == accuracy:\n",
-    "            if target_level > 0.5 and len(valid_parameters) >= 1:\n",
-    "                nb_errors += 1\n",
-    "        elif risk[\"risk\"] == recall:\n",
-    "            if any(x > alpha for x in valid_parameters) and len(valid_parameters) >= 1:\n",
-    "                nb_errors += 1\n",
+    "        # We sample a fresh test set and estimate the risk of each valid parameter using the logistic classifier.\n",
+    "        X_test, y_test = make_classification(\n",
+    "            n_samples=100,\n",
+    "            n_features=1,\n",
+    "            n_informative=1,\n",
+    "            n_redundant=0,\n",
+    "            n_repeated=0,\n",
+    "            n_classes=2,\n",
+    "            n_clusters_per_class=1,\n",
+    "            weights=[0.5, 0.5],\n",
+    "            flip_y=0,\n",
+    "            random_state=None\n",
+    "        )\n",
+    "        X_test = X_test.squeeze()\n",
+    "        probs = clf.predict_proba(X_test)[:, 1]\n",
+    "\n",
+    "        # If no valid parameters were found, there is nothing to check and no error is counted\n",
+    "        if len(valid_parameters) >= 1:\n",
+    "            for lambda_ in valid_parameters:\n",
+    "                y_pred = (probs >= lambda_).astype(int)\n",
+    "\n",
+    "                if risk[\"risk\"] == precision:\n",
+    "                    empirical_metric = precision_score(y_test, y_pred, zero_division=0)\n",
+    "                elif risk[\"risk\"] == recall:\n",
+    "                    empirical_metric = recall_score(y_test, y_pred, zero_division=0)\n",
+    "                elif risk[\"risk\"] == accuracy:\n",
+    "                    empirical_metric = accuracy_score(y_test, y_pred)\n",
+    "\n",
+    "                # Check if the risk control fails\n",
+    "                if risk[\"risk\"].higher_is_better:\n",
+    "                    if empirical_metric < target_level:\n",
+    "                        nb_errors += 1\n",
+    "                        break\n",
+    "                else:\n",
+    "                    if empirical_metric > target_level:\n",
+    "                        nb_errors += 1\n",
+    "                        break\n",
     "\n",
     "    print(f\"\\n{N=}, {risk['name']=}, {len(predict_params)=}, {target_level=}, {confidence_level=}\")\n",
     "\n",
@@ -178,337 +206,7 @@
     "    print(\"Some experiments failed.\")\n",
     "else:\n",
     "    print(\"All good!\")"
-   ],
-   "id": "1fdffae392bb7a65",
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=100, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 90\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=100, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 92\n",
-      "Valid experiment\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/Users/vlaurent/code/pro/MAPIE/MAPIE/mapie/risk_control.py:891: UserWarning: No predict parameters were found to control the risk at the given target and confidence levels. Try using a larger calibration set or a better model.\n",
-      "  warnings.warn(\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=100, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=100, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=1, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=1, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=1, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='precision', len(predict_params)=1, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=100, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 74\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=100, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.01\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 78\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=100, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=100, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 3\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=1, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=1, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=1, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='recall', len(predict_params)=1, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=100, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 100\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=100, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 100\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=100, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=100, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=1, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=1, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=1, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=100, risk['name']='accuracy', len(predict_params)=1, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=100, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=100, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=100, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=100, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=1, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=1, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=1, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='precision', len(predict_params)=1, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=100, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=100, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=100, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=100, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=1, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=1, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=1, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='recall', len(predict_params)=1, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=100, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 16\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=100, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 15\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=100, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=100, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=1, target_level=0.1, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=1, target_level=0.1, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 1\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=1, target_level=0.9, confidence_level=0.8\n",
-      "Proportion of times the risk is not controlled: 0.0\n",
-      "Delta: 0.2\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "N=5, risk['name']='accuracy', len(predict_params)=1, target_level=0.9, confidence_level=0.2\n",
-      "Proportion of times the risk is not controlled: 0.02\n",
-      "Delta: 0.8\n",
-      "Mean number of valid thresholds found per iteration: 0\n",
-      "Valid experiment\n",
-      "\n",
-      "\n",
-      "\n",
-      "\n",
-      "All good!\n"
-     ]
-    }
-   ],
-   "execution_count": 4
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-09-15T16:22:49.189292Z",
-     "start_time": "2025-09-15T16:22:48.871875Z"
-    }
-   },
-   "cell_type": "code",
-   "source": "assert not invalid_experiment",
-   "id": "4c3f437f0b2897a1",
-   "outputs": [],
-   "execution_count": 5
+   ]
   }
  ],
  "metadata": {
@@ -516,7 +214,7 @@
    "provenance": []
   },
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": ".venv-dev",
    "language": "python",
    "name": "python3"
   },
@@ -530,7 +228,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.17"
+   "version": "3.10.18"
   }
  },
  "nbformat": 4,