~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthews correlation coefficient (MCC)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compute the Matthews correlation coefficient (MCC).

The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes true and false positives and negatives into account and is generally regarded as a balanced measure that can be used even when the classes are of very different sizes. The MCC is in essence a correlation coefficient between -1 and +1: a coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and -1 an inverse prediction. The statistic is also known as the phi coefficient.

.. code:: python

    from sklearn.metrics import matthews_corrcoef

    # Three positives and one negative; one positive and the negative are misclassified
    y_true = [+1, +1, +1, -1]
    y_pred = [+1, -1, +1, +1]
    matthews_corrcoef(y_true, y_pred)  # approximately -0.33

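As a rough illustration of what the coefficient measures, the sketch below computes the MCC directly from the confusion-matrix counts using the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The helper name ``mcc_from_labels``, the imbalanced toy data, and the choice to return 0 when the denominator vanishes are assumptions made for this example; they are not part of the original text.

```python
import math


def mcc_from_labels(y_true, y_pred, positive=1):
    """Hypothetical helper: binary MCC computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when the denominator is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Same labels as the scikit-learn example above
mcc_small = mcc_from_labels([+1, +1, +1, -1], [+1, -1, +1, +1])

# Imbalanced toy data (made up): 95 negatives, 5 positives, always predict negative
y_true = [-1] * 95 + [+1] * 5
y_pred = [-1] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mcc_imbalanced = mcc_from_labels(y_true, y_pred)

print(f"MCC (small example): {mcc_small:.3f}")
print(f"accuracy: {accuracy:.2f}, MCC: {mcc_imbalanced:.3f}")
```

In the imbalanced case the accuracy looks high even though the classifier never finds a positive, while the MCC stays at zero, which is the sense in which the measure is described as balanced.
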
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Receiver operating characteristics (ROC)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ROC curves are typically used in binary classification to study the output of a classifier. To extend the ROC curve and the ROC area to multi-class or multi-label classification, the output must be binarized. One ROC curve can then be drawn per label, but one can also draw a single ROC curve by treating each element of the label indicator matrix as a binary prediction (micro-averaging).

Another evaluation measure for multi-class classification is macro-averaging, which gives equal weight to the classification of each label. [`source <http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html>`__]

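To make the difference between the two averaging schemes concrete, here is a minimal pure-Python sketch. It traces a ROC curve by sweeping a threshold over the scores, integrates it with the trapezoidal rule, and then compares per-label, macro-averaged, and micro-averaged areas. The helper names (``binarize``, ``roc_points``, ``trapezoid_auc``) and the toy scores are invented for illustration, and the macro-average is taken here as the plain mean of the per-label areas, which is a simplification rather than the procedure of the linked scikit-learn example.

```python
def binarize(labels, classes):
    """One-hot encode labels into a label indicator matrix (list of rows)."""
    return [[1 if y == c else 0 for c in classes] for y in labels]


def roc_points(y_true, y_score):
    """Return (fpr, tpr) obtained by sweeping a threshold from high to low."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Toy data (made up): 4 samples, 3 classes, one row of scores per sample
classes = [0, 1, 2]
labels = [0, 1, 2, 1]
scores = [
    [0.8, 0.1, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.4, 0.4],
    [0.6, 0.3, 0.1],
]
Y = binarize(labels, classes)

# One ROC curve (and area) per label
per_label_auc = []
for k in range(len(classes)):
    fpr, tpr = roc_points([row[k] for row in Y], [row[k] for row in scores])
    per_label_auc.append(trapezoid_auc(fpr, tpr))

# Macro-average: every label contributes equally
macro_auc = sum(per_label_auc) / len(per_label_auc)

# Micro-average: every element of the indicator matrix is one binary prediction
flat_true = [v for row in Y for v in row]
flat_score = [v for row in scores for v in row]
micro_auc = trapezoid_auc(*roc_points(flat_true, flat_score))

print("per-label AUC:", per_label_auc)
print("macro AUC:", macro_auc)
print("micro AUC:", micro_auc)
```

Because the micro-average pools all label/sample pairs, frequent labels influence it more, whereas the macro-average weights each label equally regardless of how many samples it has.
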
.. code:: python

    import numpy as np
    import matplotlib.pyplot as plt
    from itertools import cycle

    from sklearn import svm, datasets
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.multiclass import OneVsRestClassifier
    from numpy import interp  # scipy.interp is deprecated in favor of numpy.interp

    # Import some data to play with
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    # Binarize the output
    y = label_binarize(y, classes=[0, 1, 2])
    n_classes = y.shape[1]

    # Add noisy features to make the problem harder
    random_state = np.random.RandomState(0)
    n_samples, n_features = X.shape
    X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

    # Shuffle and split training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,