---
jupyter:
  jupytext:
    notebook_metadata_filter: all
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.14.1
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
  language_info:
    codemirror_mode:
      name: ipython
      version: 3
    file_extension: .py
    mimetype: text/x-python
    name: python
    nbconvert_exporter: python
    pygments_lexer: ipython3
    version: 3.8.8
  plotly:
    description: Visualize scikit-learn's k-Nearest Neighbors (kNN) classification
      in Python with Plotly.
    display_as: ai_ml
    language: python
    layout: base
    name: kNN Classification
    order: 2
    page_type: u-guide
    permalink: python/knn-classification/
    thumbnail: thumbnail/knn-classification.png
---
## Basic binary classification with kNN
This section shows how to visualize basic binary classification on 2D data. We first show how to display training versus testing data using [various marker styles](https://plot.ly/python/marker-style/), then demonstrate how to evaluate our classifier's performance on the **test split** using a continuous color gradient to indicate the model's predicted score.
We will use [Scikit-learn](https://scikit-learn.org/) for training our model and for loading and splitting data. Scikit-learn is a popular Machine Learning (ML) library that offers various tools for creating and training ML algorithms, feature engineering, data cleaning, and evaluating and testing models. It was designed to be accessible, and to work seamlessly with popular libraries like NumPy and Pandas.
We will train a [k-Nearest Neighbors (kNN)](https://scikit-learn.org/stable/modules/neighbors.html) classifier. First, the model records the label of each training sample. Then, whenever we give it a new sample, it will look at the `k` closest samples from the training set to find the most common label, and assign it to our new sample.
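To make the voting rule described above concrete, here is a minimal sketch in plain NumPy. The helper `knn_predict` and the toy arrays are hypothetical names introduced only for illustration; they are not part of Scikit-learn, whose `KNeighborsClassifier` uses more efficient neighbor searches and its own tie-breaking rules.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: majority vote among the k nearest training samples."""
    # Euclidean distance from the new sample to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data (an assumption for this sketch): three '0' samples near the origin,
# two '1' samples near (1, 1)
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_toy = np.array(['0', '0', '0', '1', '1'])

pred_a = knn_predict(X_toy, y_toy, np.array([0.05, 0.05]), k=3)
pred_b = knn_predict(X_toy, y_toy, np.array([0.95, 1.0]), k=3)
print(pred_a, pred_b)  # prints: 0 1
```

The second query has only two `'1'` neighbors nearby, so with `k=3` one `'0'` sample also votes, but the majority still picks `'1'`.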
### Display training and test splits
Using Scikit-learn, we first generate synthetic data in the shape of two interleaving half-moons. We then split it into a training set and a test set. Finally, we display the ground truth labels using [a scatter plot](https://plotly.com/python/line-and-scatter/).
In the graph, we display all the negative labels as squares and all the positive labels as circles. We differentiate the training and test sets by adding a dot to the center of each test marker.
In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
```python
import plotly.graph_objects as go
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load and split data
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

trace_specs = [
    [X_train, y_train, '0', 'Train', 'square'],
    [X_train, y_train, '1', 'Train', 'circle'],
    [X_test, y_test, '0', 'Test', 'square-dot'],
    [X_test, y_test, '1', 'Test', 'circle-dot']
]

fig = go.Figure(data=[
    go.Scatter(
        x=X[y==label, 0], y=X[y==label, 1],
        name=f'{split} Split, Label {label}',
        mode='markers', marker_symbol=marker
    )
    for X, y, label, split, marker in trace_specs
])
fig.update_traces(
    marker_size=12, marker_line_width=1.5,
    marker_color="lightyellow"
)
fig.show()
```
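As a quick numeric check of the split plotted above, the sketch below prints the array shapes and shows the boolean-mask selection that each `go.Scatter` trace relies on. It assumes the default `n_samples=100` of `make_moons`; the variable `mask` is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

# 100 samples with 2 features, split 75 / 25
print(X.shape, X_train.shape, X_test.shape)

# Labels were cast to strings, so they compare against '0' and '1'
print(np.unique(y_train))

# Boolean mask selecting the training rows with label '0',
# the same indexing pattern as X[y==label, 0] in the figure code
mask = y_train == '0'
print(X_train[mask, 0].shape, X_train[~mask, 0].shape)
```

Casting the labels to strings makes Plotly treat them as discrete categories rather than as a numeric scale.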
### Visualize predictions on test split with [`plotly.express`](https://plotly.com/python/plotly-express/)
Now, we train the kNN model on the same training data displayed in the previous graph. Then, we predict the confidence score of the model for each of the data points in the test set. We will use shapes to denote the true labels, and the color will indicate how confident the model is that the point belongs to the positive class.
In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures. Notice that `px.scatter` requires only one function call to plot both negative and positive labels, and can additionally set a continuous color scale based on the `y_score` output by our kNN model.
```python
import plotly.express as px
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load and split data
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

# Fit the model on training data, predict on test data
clf = KNeighborsClassifier(15)
clf.fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]

fig = px.scatter(
    X_test, x=0, y=1,
    color=y_score, color_continuous_scale='RdBu',
    symbol=y_test, symbol_map={'0': 'square-dot', '1': 'circle-dot'},
    labels={'symbol': 'label', 'color': 'score of <br>first class'}
)
fig.update_traces(marker_size=12, marker_line_width=1.5)
fig.update_layout(legend_orientation='h')
fig.show()
```
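It can help to see where the score used for the color scale comes from. With the default `weights='uniform'`, the value returned by `predict_proba` for a class is the fraction of the `k` nearest training samples that carry that label. The sketch below recomputes that fraction by hand with `kneighbors`; the names `neighbor_idx` and `manual_score` are introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

k = 15
clf = KNeighborsClassifier(k)  # weights='uniform' is the default
clf.fit(X_train, y_train)

# Columns of predict_proba follow the order of clf.classes_,
# so column 1 holds the score for label '1'
print(clf.classes_)
y_score = clf.predict_proba(X_test)[:, 1]

# Recompute the score by hand: fraction of the k neighbors labeled '1'
_, neighbor_idx = clf.kneighbors(X_test)
manual_score = (y_train[neighbor_idx] == '1').mean(axis=1)

print(np.allclose(y_score, manual_score))
```

Because the score is a count divided by `k`, it can only take the values `0/15, 1/15, ..., 15/15` here, which is why the colors in the figure fall on a small number of discrete shades.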
## Probability Estimates with `go.Contour`
Just like the previous example, we will first train our kNN model on the training set.
Instead of predicting the confidence for the test set, we can predict the confidence map for the entire area that wraps around the dimensions of our dataset. To do this, we use [`np.meshgrid`](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html) to create a grid, where the distance between neighboring points is set by the `mesh_size` variable.
Then, for each of those points, we will use our model to give a confidence score, and plot it with a [contour plot](https://plotly.com/python/contour-plots/).
In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
```python
import plotly.graph_objects as go
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

mesh_size = .02
margin = 0.25

# Load and split data
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

# Create a mesh grid on which we will run our model
x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
xrange = np.arange(x_min, x_max, mesh_size)
yrange = np.arange(y_min, y_max, mesh_size)
xx, yy = np.meshgrid(xrange, yrange)

# Create classifier, fit on the training split, run predictions on grid
clf = KNeighborsClassifier(15, weights='uniform')
clf.fit(X_train, y_train)
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)

# Plot the figure
fig = go.Figure(data=[
    go.Contour(
        x=xrange,
        y=yrange,
        z=Z,
        colorscale='RdBu'
    )
])
fig.show()
```
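The grid bookkeeping in the code above is easier to follow on a tiny example. The sketch below uses a deliberately coarse `mesh_size` and a stand-in scoring function (the sum of the two coordinates) in place of `predict_proba`; both are assumptions made only to keep the arrays small enough to inspect.

```python
import numpy as np

mesh_size = 0.5
xrange = np.arange(0.0, 1.5, mesh_size)  # 3 x-coordinates: 0.0, 0.5, 1.0
yrange = np.arange(0.0, 1.0, mesh_size)  # 2 y-coordinates: 0.0, 0.5
xx, yy = np.meshgrid(xrange, yrange)

# meshgrid returns arrays of shape (len(yrange), len(xrange))
print(xx.shape)  # prints: (2, 3)

# Flatten the grid into one (x, y) row per grid point, as a model expects
grid_points = np.c_[xx.ravel(), yy.ravel()]
print(grid_points.shape)  # prints: (6, 2)

# Stand-in for the model's per-point score
scores = grid_points.sum(axis=1)

# Reshape back so that Z[i, j] is the score at (xrange[j], yrange[i])
Z = scores.reshape(xx.shape)
print(Z[1, 2])  # prints: 1.5
```

This row-per-`y`, column-per-`x` layout is what `go.Contour` expects when `x` and `y` are passed as 1D coordinate arrays alongside a 2D `z`.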
Now, let's try to combine our `go.Contour` plot with the first scatter plot of our data points, so that we can visually compare the confidence of our model with the true labels.
```python
import plotly.graph_objects as go
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

mesh_size = .02
margin = 0.25

# Load and split data
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

# Create a mesh grid on which we will run our model
x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
xrange = np.arange(x_min, x_max, mesh_size)
yrange = np.arange(y_min, y_max, mesh_size)
xx, yy = np.meshgrid(xrange, yrange)

# Create classifier, fit on the training split, run predictions on grid
clf = KNeighborsClassifier(15, weights='uniform')
clf.fit(X_train, y_train)
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)

trace_specs = [
    [X_train, y_train, '0', 'Train', 'square'],
    [X_train, y_train, '1', 'Train', 'circle'],
    [X_test, y_test, '0', 'Test', 'square-dot'],
    [X_test, y_test, '1', 'Test', 'circle-dot']
]

fig = go.Figure(data=[
    go.Scatter(
        x=X[y==label, 0], y=X[y==label, 1],
        name=f'{split} Split, Label {label}',
        mode='markers', marker_symbol=marker
    )
    for X, y, label, split, marker in trace_specs
])
fig.update_traces(
    marker_size=12, marker_line_width=1.5,
    marker_color="lightyellow"
)

fig.add_trace(
    go.Contour(
        x=xrange,
        y=yrange,
        z=Z,
        showscale=False,
        colorscale='RdBu',
        opacity=0.4,
        name='Score',
        hoverinfo='skip'
    )
)
fig.show()
```
## kNN classification in Dash
[Dash](https://plotly.com/dash/) is the best way to build analytical apps in Python using Plotly figures. To run the app below, run `pip install dash`, click "Download" to get the code and run `python app.py`.
Get started with [the official Dash docs](https://dash.plotly.com/installation) and **learn how to effortlessly [style](https://plotly.com/dash/design-kit/) & [deploy](https://plotly.com/dash/app-manager/) apps like this with Dash Enterprise.**
```python hide_code=true
from IPython.display import IFrame
snippet_url = 'https://python-docs-dash-snippets.herokuapp.com/python-docs-dash-snippets/'
IFrame(snippet_url + 'knn-classification', width='100%', height=1200)
```