Introduction
This blog post introduces and exemplifies the Python package “OutlierIdentifiers” which provides 1D outlier identifier functions. It follows closely the Wolfram Language (WL) paclet [AAp1], the R package [AAp2], and the Raku package [AAp3].
Remark: Since I use those outlier identifiers in R, Raku, and WL, a lot and often, I have to have that package in order to transfer to Python different statistical or machine learning workflows created in R, Raku, or WL.
Installation
From PyPI.org:
python3 -m pip install OutlierIdentifiers
From GitHub:
python3 -m pip install git+https://github.com/antononcube/Python-packages.git#egg=OutlierIdentifiers\&subdirectory=OutlierIdentifiers
Usage examples
Load packages:
import numpy as np
import plotly.graph_objects as go
from OutlierIdentifiers import *
Generate a vector with random numbers:
np.random.seed(148)
vec = np.random.normal(loc=10, scale=20, size=50)
print(vec)
[-11.72170904 14.55374553 47.75335493 36.87806789 12.69444889
8.2250113 20.83029617 44.23448925 -18.65374135 23.93151423
-3.97345704 -19.05099802 0.87310981 -10.56871239 -29.7677599
48.80181962 45.55051758 3.00608296 12.08663517 80.52839423
-8.21300671 -24.80501442 17.67287628 -14.28033884 5.31536862
23.47504393 39.11579282 18.77033001 41.99179563 18.45360056
33.33802297 6.29308271 6.20961175 13.44694737 -1.2817423
18.23874752 5.91890326 36.85941897 17.55470851 35.89537439
54.16304716 24.50380733 11.14757566 -1.89050164 -11.59280058
26.75050328 12.29007492 -7.9674614 22.91433048 24.18794845]
Plot the vector:
# Create a scatter plot with markers
fig = go.Figure(data=go.Scatter(y=vec, mode='markers', name='data'))
# Add labels and title
fig.update_layout(title='Vector of numbers', xaxis_title='Index', yaxis_title='Value', template = "plotly", width=800, height=600)
# Display the plot
fig.show()

Find outlier positions:
outlier_identifier(vec, identifier=hampel_identifier_parameters)
array([ True, False, True, True, False, False, False, True, True,
False, False, True, False, True, True, True, True, False,
False, True, True, True, False, True, False, False, True,
False, True, False, False, False, False, False, False, False,
False, True, False, True, True, False, False, False, True,
False, False, True, False, False])
Find outlier values:
outlier_identifier(vec, identifier=hampel_identifier_parameters, value = True)
array([-11.72170904, 47.75335493, 36.87806789, 44.23448925,
-18.65374135, -19.05099802, -10.56871239, -29.7677599 ,
48.80181962, 45.55051758, 80.52839423, -8.21300671,
-24.80501442, -14.28033884, 39.11579282, 41.99179563,
36.85941897, 35.89537439, 54.16304716, -11.59280058,
-7.9674614 ])
Find top outlier positions and values:
outlier_identifier(vec, identifier = lambda v: top_outliers(hampel_identifier_parameters(v)))
array([False, False, True, True, False, False, False, True, False,
False, False, False, False, False, False, True, True, False,
False, True, False, False, False, False, False, False, True,
False, True, False, False, False, False, False, False, False,
False, True, False, True, True, False, False, False, False,
False, False, False, False, False])
outlier_identifier(vec, identifier = lambda v: top_outliers(hampel_identifier_parameters(v)), value=True)
array([47.75335493, 36.87806789, 44.23448925, 48.80181962, 45.55051758,
80.52839423, 39.11579282, 41.99179563, 36.85941897, 35.89537439,
54.16304716])
Find bottom outlier positions and values (using quartiles-based identifier):
# R-style positions
pred = outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)))
pred
array([False, False, False, False, False, False, False, False, True,
False, False, True, False, False, True, False, False, False,
False, False, False, True, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False])
# Values
vec_outliers = outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)), value=True)
vec_outliers
array([-18.65374135, -19.05099802, -29.7677599 , -24.80501442,
-14.28033884])
Here is another way to get the outlier values:
vec[pred]
array([-18.65374135, -19.05099802, -29.7677599 , -24.80501442,
-14.28033884])
If position indexes are needed (instead of True/False vector) then outlier_position can be used:
outlier_indexes = outlier_position(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)))
outlier_indexes
array([ 8, 11, 14, 21, 23])
Here is a plot of the data and found outliers:
# Create a scatter plot with markers
fig = go.Figure(data=go.Scatter(y=vec, mode='markers', name='data'))
# Add labels and title
fig.update_layout(title='Vector of numbers and outliers', xaxis_title='Index', yaxis_title='Value', template = "plotly", width=800, height=400)
# Find outliers positions and values
vec_outlier_indexes = outlier_position(vec, identifier=quartile_identifier_parameters)
vec_outlier_values = outlier_identifier(vec, identifier=quartile_identifier_parameters, value = True)
# Add outlier trace
fig.add_trace(go.Scatter(x=vec_outlier_indexes, y=vec_outlier_values, mode="markers", name="outliers"))
# Display the plot
fig.show()

The available outlier parameters functions are:
hampel_identifier_parameterssplus_quartile_identifier_parametersquartile_identifier_parameters
[ f(vec) for f in (hampel_identifier_parameters, splus_quartile_identifier_parameters, quartile_identifier_parameters)]
[(-7.059486458286688, 35.060179357392244),
(-41.140817116725, 66.58661713715973),
(-12.931512113918409, 40.93220501302396)]
References
[AA1] Anton Antonov, “Outlier detection in a list of numbers”, (2013), MathematicaForPrediction at WordPress.
[AAp1] Anton Antonov, OutlierIdentifiers WL paclet, (2023), Wolfram Language Paclet Repository.
[AAp2] Anton Antonov, OutlierIdentifiers R package, (2019), R-packages at GitHub/antononcube.
[AAp3] Anton Antonov, OutlierIdentifiers Raku package, (2022), GitHub/antononcube.