@@ -22,6 +22,36 @@ We will discuss instance hardness in this document and explain how to use the

Instance hardness and average precision
=======================================
Instance hardness is defined as 1 minus the probability of the most probable class:

.. math::

    H(x) = 1 - P(\hat{y}|x)

In this equation :math:`H(x)` is the instance hardness for a sample with features
:math:`x` and :math:`P(\hat{y}|x)` the probability of the predicted label :math:`\hat{y}`
given the features. If the model predicts label 0 and gives a `predict_proba` output
of [0.9, 0.1], the probability of the most probable class (0) is 0.9 and the
instance hardness is 1 - 0.9 = 0.1.

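The definition above can be computed directly from `predict_proba`. Below is a minimal sketch; the dataset and classifier are illustrative placeholders, not the ones used later in this guide.

```python
# Minimal sketch: computing instance hardness from predict_proba.
# The dataset and classifier below are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# H(x) = 1 - P(y_hat | x): one minus the largest predicted class probability
hardness = 1.0 - clf.predict_proba(X).max(axis=1)

# For a binary problem the largest probability is at least 0.5,
# so the hardness values lie in [0, 0.5]
print(hardness.shape, hardness.min(), hardness.max())
```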
Samples with large instance hardness have a significant effect on the area under
the precision-recall curve, or average precision. In particular, samples with
label 0 and large instance hardness (i.e. the model predicts label 1) reduce the
average precision considerably, because these points affect the left side of the
precision-recall curve, where the area is largest: the precision is lowered in
the range of low recall and high thresholds. During cross validation, e.g. for
hyperparameter tuning or recursive feature elimination, a random concentration
of these points in some folds introduces variance in the CV results that
deteriorates the robustness of the cross validation task. The `InstanceHardnessCV`
splitter aims to distribute the samples with large instance hardness evenly over
the folds in order to reduce this undesired variance. Note that one should use this
splitter to make model *selection* tasks robust, like hyperparameter tuning and
feature selection, but not for model *performance estimation*, for which you also
want to know the variance of performance to be expected in production.

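To illustrate the principle of distributing hard samples over folds (this is a sketch of the idea only, not the library's actual implementation), one can sort samples by hardness and deal them out round-robin, so each fold receives a similar share of hard points. The hardness values below are random placeholders.

```python
# Illustrative sketch of distributing hard samples evenly over folds.
# This shows the principle only; it is not the InstanceHardnessCV implementation.
import numpy as np

rng = np.random.default_rng(0)
hardness = rng.random(20)  # placeholder instance-hardness values
n_splits = 4

# Sort from hardest to easiest and deal samples out round-robin over the folds
order = np.argsort(-hardness)
fold_of_sample = np.empty(len(hardness), dtype=int)
fold_of_sample[order] = np.arange(len(hardness)) % n_splits

# Every fold gets 5 samples and a comparable share of the total hardness
totals = [hardness[fold_of_sample == k].sum() for k in range(n_splits)]
print(np.bincount(fold_of_sample), np.round(totals, 2))
```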


Create an imbalanced dataset with samples with large instance hardness
======================================================================

Let’s start by creating a dataset to work with. We create a dataset with 5% class
imbalance using scikit-learn’s `make_blobs` function.
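As a sketch, the 5% imbalance can be obtained by passing per-class sample counts to `make_blobs`; the centers, spread, and random seed below are assumptions for illustration, not the exact values used in the rendered example.

```python
# Minimal sketch of an imbalanced two-blob dataset (assumed parameters).
import numpy as np
from sklearn.datasets import make_blobs

# 950 majority and 50 minority samples -> 5% class imbalance
X, y = make_blobs(
    n_samples=[950, 50],
    centers=[[0.0, 0.0], [3.0, 3.0]],
    cluster_std=1.0,
    random_state=10,
)
print(X.shape, np.bincount(y))
```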
@@ -54,6 +84,9 @@ Now we add some samples with large instance hardness
    :target: ./auto_examples/cross_validation/plot_instance_hardness_cv.html
    :align: center

Assess cross validation performance variance using InstanceHardnessCV splitter
==============================================================================

Then we take a `LogisticRegression` classifier and assess the cross validation
performance using a `StratifiedKFold` cv splitter and the `cross_validate`
function.
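A minimal sketch of this baseline step is shown below; the dataset is a placeholder built with `make_classification` rather than the blobs dataset from above.

```python
# Minimal sketch of the StratifiedKFold baseline; the dataset is a placeholder.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=1000, weights=[0.95], random_state=10)
clf = LogisticRegression(max_iter=1000)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=10)

# Average precision per fold; the spread of these scores is the CV variance
# that the InstanceHardnessCV splitter aims to reduce
skf_result = cross_validate(clf, X, y, cv=skf, scoring="average_precision")
print(skf_result["test_score"])
```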
@@ -78,6 +111,7 @@ the `InstanceHardnessCV` splitter is lower than for the `StratifiedKFold` splitt
    >>> plt.boxplot([skf_result['test_score'], ih_result['test_score']],
    ...             tick_labels=["StratifiedKFold", "InstanceHardnessCV"],
    ...             vert=False)
    >>> plt.xlabel('Average precision')
    >>> plt.tight_layout()

.. image:: ./auto_examples/cross_validation/images/sphx_glr_plot_instance_hardness_cv_003.png