
Commit 5778027

Update README.rst
1 parent 08cf42c commit 5778027

1 file changed

Lines changed: 89 additions & 0 deletions

README.rst

@@ -731,10 +731,99 @@ Matthew correlation coefficient (MCC)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compute the Matthews correlation coefficient (MCC)

The Matthews correlation coefficient (MCC) is used in machine learning to measure the quality of binary (two-class) classifications. It takes true and false positives and negatives into account and is generally regarded as a balanced measure that can be used even when the classes are of very different sizes. The MCC is in essence a correlation coefficient between -1 and +1: +1 represents a perfect prediction, 0 a prediction no better than random, and -1 a completely inverted prediction. The statistic is also known as the phi coefficient.

.. code:: python

    from sklearn.metrics import matthews_corrcoef

    y_true = [+1, +1, +1, -1]
    y_pred = [+1, -1, +1, +1]
    # Returns about -0.33: slightly worse than a random prediction
    matthews_corrcoef(y_true, y_pred)

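To make the definition concrete, the coefficient can be computed directly from the four confusion-matrix counts as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal pure-Python sketch, not scikit-learn's implementation. The helper names ``confusion_counts`` and ``mcc`` are hypothetical, and returning 0 when the denominator vanishes is an assumed convention.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


y_true = [+1, +1, +1, -1]
y_pred = [+1, -1, +1, +1]
print(confusion_counts(y_true, y_pred))  # (2, 0, 1, 1)
print(mcc(y_true, y_pred))               # about -0.33
```

With two true positives, one false positive, one false negative and no true negatives, the numerator is -1 and the denominator is 3, which gives the value of about -0.33 for the data used in the README example.
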
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Receiver operating characteristics (ROC)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ROC curves are typically used in binary classification to study the output of a classifier. To extend the ROC curve and the ROC area to multi-class or multi-label classification, the output must first be binarized. One ROC curve can then be drawn per label, or a single curve can be drawn by treating each element of the label indicator matrix as a binary prediction (micro-averaging).

Another evaluation measure for multi-class classification is macro-averaging, which gives equal weight to the classification of each label. [`source <http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html>`__]

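The difference between the two averaging schemes can be illustrated on a toy problem. The sketch below is pure Python and does not use scikit-learn. ``binarize`` and ``rank_auc`` are hypothetical helpers, the scores are made up, and macro-averaging is taken here to mean the unweighted mean of the per-class AUC values (one common definition, assumed for this illustration).

```python
def binarize(labels, classes):
    """Label indicator matrix: one row per sample, one 0/1 column per class."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


def rank_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


classes = [0, 1, 2]
labels = [0, 1, 2, 2]
y = binarize(labels, classes)
# Made-up classifier scores, one column per class
scores = [
    [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

# Macro-average: score each class column separately, then average
per_class = [
    rank_auc([row[k] for row in y], [row[k] for row in scores])
    for k in range(len(classes))
]
macro_auc = sum(per_class) / len(per_class)

# Micro-average: flatten the matrices and treat every cell as one prediction
y_flat = [cell for row in y for cell in row]
scores_flat = [cell for row in scores for cell in row]
micro_auc = rank_auc(y_flat, scores_flat)

print(per_class, macro_auc, micro_auc)
```

Flattening pools all cells before ranking, so frequent classes contribute more pairs to the micro-average, whereas the macro-average weights each class equally regardless of its size.
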
.. code:: python

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn import svm, datasets
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.multiclass import OneVsRestClassifier

    # Import some data to play with
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    # Binarize the output
    y = label_binarize(y, classes=[0, 1, 2])
    n_classes = y.shape[1]

    # Add noisy features to make the problem harder
    random_state = np.random.RandomState(0)
    n_samples, n_features = X.shape
    X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

    # Shuffle and split training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                        random_state=0)

    # Learn to predict each class against the others
    classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                             random_state=random_state))
    y_score = classifier.fit(X_train, y_train).decision_function(X_test)

    # Compute ROC curve and ROC area for each class
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Compute micro-average ROC curve and ROC area
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

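For intuition about what ``roc_curve`` and ``auc`` compute in the block above, the sketch below builds a ROC curve by lowering the decision threshold one distinct score at a time and integrates it with the trapezoidal rule. It is a simplified pure-Python illustration that assumes 0/1 labels, not scikit-learn's implementation; ``roc_points`` and ``trapezoid_area`` are hypothetical names.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists obtained by sweeping the threshold downwards."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Visit samples from the highest score to the lowest
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score,
        # so tied scores move the curve in a single step
        is_last = rank == len(order) - 1
        if is_last or y_score[order[rank + 1]] != y_score[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return sum((x[k] - x[k - 1]) * (y[k] + y[k - 1]) / 2
               for k in range(1, len(x)))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr_toy, tpr_toy = roc_points(y_true, y_score)
area = trapezoid_area(fpr_toy, tpr_toy)
print(fpr_toy)  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr_toy)  # [0.0, 0.5, 0.5, 1.0, 1.0]
print(area)     # 0.75
```

Each positive sample moves the curve up and each negative sample moves it right, so a classifier that ranks every positive above every negative reaches the top-left corner and an area of 1.
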
Plot of a ROC curve for a specific class

.. code:: python

    plt.figure()
    lw = 2
    # ROC curve of class 2, with its area shown in the legend
    plt.plot(fpr[2], tpr[2], color='darkorange',
             lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
    # Diagonal reference line: expected curve of a random classifier
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic example')
    plt.legend(loc="lower right")
    plt.show()

.. image:: /docs/pic/sphx_glr_plot_roc_001.png


~~~~~~~~~~~~~~~~~~~~~~~
Area under curve (AUC)
