Dumbo-programmer/statify

Statify

A lightweight and versatile statistics library for Rust that provides essential statistical functions for data analysis.

Features

  • Descriptive Statistics: Mean, median, mode, variance, standard deviation (both sample and population)
  • Distribution Metrics: Percentiles, quartiles, interquartile range (IQR)
  • Range Statistics: Min, max, range, sum
  • Correlation Analysis: Pearson correlation coefficient and covariance
  • Normalization: Min-max normalization, standard normalization, custom range scaling
  • Linear Regression: Simple linear regression with slope, intercept, R², and predictions
  • Normal Distribution: Probability density function (PDF) and cumulative distribution function (CDF)
  • Advanced Metrics: Skewness, kurtosis, coefficient of variation, standard error
  • Standardization: Z-scores for individual values or entire datasets
  • Type Support: Works with both f64 and f32 floating-point types
  • Error Handling: Fallible functions return a StatsResult<T> backed by a descriptive StatsError enum

Installation

Add this to your Cargo.toml:

[dependencies]
statify = "0.1.0"

Usage

The Stats trait is implemented for Vec<f64> and Vec<f32>, so statistics are available as methods directly on your data:

use statify::Stats;

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
    
    // Descriptive statistics
    let mean = data.mean().unwrap();
    let median = data.median().unwrap();
    let std_dev = data.std_dev().unwrap();
    
    println!("Mean: {}", mean);
    println!("Median: {}", median);
    println!("Standard Deviation: {}", std_dev);
    
    // Percentiles and quartiles
    let q1 = data.quartile_1().unwrap();
    let q3 = data.quartile_3().unwrap();
    let iqr = data.iqr().unwrap();
    
    println!("Q1: {}, Q3: {}, IQR: {}", q1, q3, iqr);
}
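
The README does not say which percentile method the quartile functions use, and several conventions exist. As a rough illustration only, the sketch below uses linear interpolation between the closest ranks; the names `percentile_sketch` and `iqr_sketch` are hypothetical and are not part of statify.

```rust
/// Percentile by linear interpolation between the two closest ranks.
/// Assumption: this is one common convention, not necessarily statify's.
/// Returns None for an empty slice or for p outside 0..=100.
fn percentile_sketch(data: &[f64], p: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    // Sort a copy so the caller's data is left untouched.
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional position of the percentile within the sorted data.
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + frac * (sorted[hi] - sorted[lo]))
}

/// Interquartile range as Q3 - Q1 under the same convention.
fn iqr_sketch(data: &[f64]) -> Option<f64> {
    Some(percentile_sketch(data, 75.0)? - percentile_sketch(data, 25.0)?)
}
```

Under this convention the ten-value dataset above gives Q1 = 3.25, Q3 = 7.75, and IQR = 4.5; a library using a different convention can legitimately report different quartiles.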

Correlation and Covariance

use statify::{correlation, covariance};

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.0, 6.0, 8.0, 10.0];

let corr = correlation(&x, &y).unwrap();
let cov = covariance(&x, &y).unwrap();

println!("Correlation: {}", corr);
println!("Covariance: {}", cov);
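
For reference, the quantities behind these two functions can be sketched in plain Rust. This is a minimal sketch, not the library's code: it assumes the sample (n - 1) covariance, which the README does not specify, and the names `sample_covariance` and `pearson` are hypothetical.

```rust
/// Arithmetic mean; callers guarantee a non-empty slice.
fn mean_of(v: &[f64]) -> f64 {
    v.iter().sum::<f64>() / v.len() as f64
}

/// Sample covariance with an n - 1 denominator (an assumption here).
/// Returns None if the lengths differ or fewer than two points are given.
fn sample_covariance(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let (mx, my) = (mean_of(x), mean_of(y));
    let sum: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    Some(sum / (x.len() - 1) as f64)
}

/// Pearson correlation: covariance divided by the product of the
/// standard deviations. Returns None if either input is constant.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    let cov = sample_covariance(x, y)?;
    let sx = sample_covariance(x, x)?.sqrt();
    let sy = sample_covariance(y, y)?.sqrt();
    if sx == 0.0 || sy == 0.0 {
        return None;
    }
    Some(cov / (sx * sy))
}
```

With the x and y above, where y is exactly 2x, this sketch yields a correlation of 1 (up to rounding) and a sample covariance of 5. A population (n) covariance would be 4 instead.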

Z-Scores

use statify::{z_score, z_scores};

// Single value z-score
let score = z_score(75.0, 50.0, 10.0).unwrap();
println!("Z-score: {}", score);

// Z-scores for entire dataset
let data = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let scores = z_scores(&data).unwrap();
println!("Z-scores: {:?}", scores);
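
A z-score is the distance of a value from the mean, measured in standard deviations. The sketch below spells that out; it assumes the dataset version uses the sample standard deviation, which the README does not state, and the `_sketch` names are hypothetical rather than statify functions.

```rust
/// z = (value - mean) / std_dev; None when std_dev is zero.
fn z_score_sketch(value: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if std_dev == 0.0 {
        None
    } else {
        Some((value - mean) / std_dev)
    }
}

/// Z-scores for a whole dataset, using the sample (n - 1) standard
/// deviation. That choice is an assumption of this sketch.
fn z_scores_sketch(data: &[f64]) -> Option<Vec<f64>> {
    let n = data.len();
    if n < 2 {
        return None;
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    let var = data.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let sd = var.sqrt();
    // Collecting Option<f64> items yields None if any single score fails.
    data.iter().map(|&v| z_score_sketch(v, mean, sd)).collect()
}
```

For the single-value example above, (75 - 50) / 10 gives 2.5.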

Normalization

use statify::{normalize_min_max, normalize_standard, normalize_range};

let data = vec![10.0, 20.0, 30.0, 40.0, 50.0];

// Min-max normalization (0 to 1)
let normalized = normalize_min_max(&data).unwrap();

// Standard normalization (z-scores)
let standardized = normalize_standard(&data).unwrap();

// Custom range normalization (-1 to 1)
let custom = normalize_range(&data, -1.0, 1.0).unwrap();
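
Min-max and custom-range normalization are the same linear rescaling with different targets. The following is a minimal sketch of that formula under the usual definition; `scale_to_range` is a hypothetical helper, not a statify function, and how the library treats constant data is not documented here.

```rust
/// Linearly maps data from [min, max] onto [new_min, new_max].
/// Returns None for empty input or when all values are equal, since the
/// source span would then be zero.
fn scale_to_range(data: &[f64], new_min: f64, new_max: f64) -> Option<Vec<f64>> {
    if data.is_empty() {
        return None;
    }
    let min = data.iter().copied().fold(f64::INFINITY, f64::min);
    let max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let span = max - min;
    if span == 0.0 {
        return None;
    }
    Some(
        data.iter()
            .map(|v| new_min + (v - min) / span * (new_max - new_min))
            .collect(),
    )
}
```

Applied to the data above, a target of (0, 1) gives [0, 0.25, 0.5, 0.75, 1] and a target of (-1, 1) gives [-1, -0.5, 0, 0.5, 1].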

Linear Regression

use statify::linear_regression;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.1, 3.9, 6.2, 7.8, 10.1];

let result = linear_regression(&x, &y).unwrap();

println!("Slope: {}", result.slope);
println!("Intercept: {}", result.intercept);
println!("R²: {}", result.r_squared);

// Make predictions
let prediction = result.predict(6.0);
println!("Predicted y for x=6: {}", prediction);
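
To show where slope, intercept, and R² come from, here is a minimal ordinary least squares sketch in plain Rust. The `Line` struct and `fit_line` function are hypothetical stand-ins for illustration and may differ from how statify computes its result.

```rust
/// Hypothetical stand-in for a fitted line.
struct Line {
    slope: f64,
    intercept: f64,
    r_squared: f64,
}

impl Line {
    /// Evaluates the fitted line at x.
    fn predict(&self, x: f64) -> f64 {
        self.intercept + self.slope * x
    }
}

/// Ordinary least squares for y = intercept + slope * x.
/// Returns None if the lengths differ, fewer than two points are given,
/// or either variable is constant (slope or R² would be undefined).
fn fit_line(x: &[f64], y: &[f64]) -> Option<Line> {
    let n = x.len();
    if n != y.len() || n < 2 {
        return None;
    }
    let mx = x.iter().sum::<f64>() / n as f64;
    let my = y.iter().sum::<f64>() / n as f64;
    // Sums of squared and cross deviations from the means.
    let sxx: f64 = x.iter().map(|a| (a - mx).powi(2)).sum();
    let syy: f64 = y.iter().map(|b| (b - my).powi(2)).sum();
    let sxy: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    if sxx == 0.0 || syy == 0.0 {
        return None;
    }
    let slope = sxy / sxx;
    let intercept = my - slope * mx;
    // For simple regression, R² equals the squared Pearson correlation.
    let r_squared = sxy * sxy / (sxx * syy);
    Some(Line { slope, intercept, r_squared })
}
```

For the x and y above, this sketch gives a slope of about 1.99 and an intercept of about 0.05, so the prediction at x = 6 is about 11.99.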

Normal Distribution

use statify::{normal_pdf, normal_cdf, standard_normal_pdf, standard_normal_cdf};

// Custom normal distribution (mean=100, std_dev=15)
let pdf = normal_pdf(100.0, 100.0, 15.0).unwrap();
let cdf = normal_cdf(115.0, 100.0, 15.0).unwrap();

// Standard normal distribution (mean=0, std_dev=1)
let std_pdf = standard_normal_pdf(0.0);
let std_cdf = standard_normal_cdf(1.96);

println!("Standard normal CDF at 1.96: {}", std_cdf); // ~0.975
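
Rust's standard library has no error function, so a normal CDF needs some approximation. The sketch below uses the Abramowitz and Stegun formula 7.1.26 (absolute error around 1.5e-7); this is one possible approach and an assumption of the example, not necessarily what statify does. All function names here are hypothetical.

```rust
use std::f64::consts::PI;

/// erf(x) via Abramowitz and Stegun 7.1.26 (max abs error about 1.5e-7).
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of the degree-5 polynomial in t.
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Normal density at x; None unless std_dev is positive.
fn normal_pdf_sketch(x: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if !(std_dev > 0.0) {
        return None;
    }
    let z = (x - mean) / std_dev;
    Some((-0.5 * z * z).exp() / (std_dev * (2.0 * PI).sqrt()))
}

/// Normal CDF at x, using CDF(z) = (1 + erf(z / sqrt(2))) / 2.
fn normal_cdf_sketch(x: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if !(std_dev > 0.0) {
        return None;
    }
    let z = (x - mean) / std_dev;
    Some(0.5 * (1.0 + erf_approx(z / 2.0_f64.sqrt())))
}
```

With this approximation, the CDF at 1.96 for the standard normal comes out at roughly 0.975, matching the comment in the example above, and the CDF at 115 for mean 100 and standard deviation 15 is roughly 0.8413.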

Advanced Metrics

use statify::{skewness, kurtosis, coefficient_of_variation, standard_error};

let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];

let skew = skewness(&data).unwrap();
let kurt = kurtosis(&data).unwrap();
let cv = coefficient_of_variation(&data).unwrap();
let se = standard_error(&data).unwrap();

println!("Skewness: {}", skew);
println!("Kurtosis: {}", kurt);
println!("Coefficient of Variation: {}%", cv);
println!("Standard Error: {}", se);
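
Skewness and kurtosis have several competing definitions (moment-based versus bias-corrected sample estimators), and the README does not say which one statify uses. The sketch below assumes the plain moment-based forms and the sample standard deviation for standard error and CV; every name in it is hypothetical.

```rust
/// k-th central moment with an n denominator; caller ensures non-empty data.
fn central_moment(data: &[f64], k: i32) -> f64 {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    data.iter().map(|v| (v - mean).powi(k)).sum::<f64>() / n
}

/// Moment-based skewness: m3 / m2^(3/2).
fn skewness_sketch(data: &[f64]) -> Option<f64> {
    if data.len() < 2 {
        return None;
    }
    let m2 = central_moment(data, 2);
    if m2 == 0.0 {
        return None;
    }
    Some(central_moment(data, 3) / m2.powf(1.5))
}

/// Moment-based excess kurtosis: m4 / m2^2 - 3 (zero for a normal curve).
fn excess_kurtosis_sketch(data: &[f64]) -> Option<f64> {
    if data.len() < 2 {
        return None;
    }
    let m2 = central_moment(data, 2);
    if m2 == 0.0 {
        return None;
    }
    Some(central_moment(data, 4) / (m2 * m2) - 3.0)
}

/// Sample standard deviation (n - 1 denominator).
fn sample_sd(data: &[f64]) -> Option<f64> {
    let n = data.len();
    if n < 2 {
        return None;
    }
    Some((central_moment(data, 2) * n as f64 / (n - 1) as f64).sqrt())
}

/// Standard error of the mean: s / sqrt(n).
fn standard_error_sketch(data: &[f64]) -> Option<f64> {
    Some(sample_sd(data)? / (data.len() as f64).sqrt())
}

/// Coefficient of variation as a percentage: s / mean * 100.
fn cv_percent_sketch(data: &[f64]) -> Option<f64> {
    let mean = data.iter().sum::<f64>() / data.len() as f64;
    if mean == 0.0 {
        return None;
    }
    Some(sample_sd(data)? / mean * 100.0)
}
```

For the symmetric dataset 1 through 10, these definitions give a skewness of 0 and a negative excess kurtosis (about -1.22), reflecting a flatter-than-normal shape. Bias-corrected estimators would give somewhat different values.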

API Overview

Trait Methods (Stats)

All methods return a StatsResult<T>, so a failure such as an empty dataset is reported as a StatsError rather than a value:

  • mean() - Arithmetic mean
  • median() - Middle value when sorted
  • mode() - Most frequent values
  • variance() - Sample variance
  • std_dev() - Sample standard deviation
  • variance_pop() - Population variance
  • std_dev_pop() - Population standard deviation
  • min() - Minimum value
  • max() - Maximum value
  • range() - Difference between max and min
  • sum() - Sum of all values
  • percentile(p) - Value at the p-th percentile
  • quartile_1() - 25th percentile
  • quartile_3() - 75th percentile
  • iqr() - Interquartile range (Q3 - Q1)
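
The sample and population variants in the list above differ only in the denominator: n - 1 for the sample variance and n for the population variance. A minimal sketch of that distinction follows; the helper `variance_with_ddof` and its `ddof` parameter are hypothetical and not part of the Stats trait.

```rust
/// Variance with a configurable denominator of n - ddof.
/// ddof = 1 gives the sample variance, ddof = 0 the population variance.
/// Returns None when there are not more than ddof data points.
fn variance_with_ddof(data: &[f64], ddof: usize) -> Option<f64> {
    let n = data.len();
    if n <= ddof {
        return None;
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    // Sum of squared deviations from the mean.
    let ss: f64 = data.iter().map(|v| (v - mean).powi(2)).sum();
    Some(ss / (n - ddof) as f64)
}
```

For the values 1 through 10 the sum of squared deviations is 82.5, so the sample variance is 82.5 / 9 (about 9.17) and the population variance is 82.5 / 10 = 8.25.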

Standalone Functions

Correlation & Covariance

  • correlation(x, y) - Pearson correlation coefficient
  • covariance(x, y) - Covariance between two datasets

Normalization

  • normalize_min_max(data) - Min-max normalization (0 to 1)
  • normalize_standard(data) - Standard normalization (z-scores)
  • normalize_range(data, min, max) - Normalize to custom range

Linear Regression

  • linear_regression(x, y) - Returns LinearRegressionResult with:
    • slope - Regression line slope
    • intercept - Y-intercept
    • r_squared - Coefficient of determination
    • predict(x) - Predict y for given x
    • predict_many(x_values) - Predict multiple values

Normal Distribution

  • normal_pdf(x, mean, std_dev) - Probability density function
  • normal_cdf(x, mean, std_dev) - Cumulative distribution function
  • standard_normal_pdf(x) - Standard normal PDF (μ=0, σ=1)
  • standard_normal_cdf(x) - Standard normal CDF (μ=0, σ=1)

Standardization

  • z_score(value, mean, std_dev) - Standard score for a single value
  • z_scores(data) - Standard scores for all values in a dataset

Advanced Metrics

  • standard_error(data) - Standard error of the mean
  • coefficient_of_variation(data) - CV expressed as a percentage
  • skewness(data) - Measure of distribution asymmetry
  • kurtosis(data) - Measure of distribution tailedness (excess kurtosis)

Error Handling

The library uses a custom StatsError enum for error handling:

  • EmptyDataset - Dataset is empty
  • InsufficientData - Not enough data for the operation
  • DivisionByZero - Division by zero would occur

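Fallible statistical functions return StatsResult<T>, which is a Result<T, StatsError>. In the examples above, standard_normal_pdf, standard_normal_cdf, and predict are used without unwrapping, which suggests they return plain values.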
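
The examples in this README call unwrap() for brevity; in application code it is usually better to match on the error. The sketch below re-declares an enum with the three variants named above so that it compiles on its own. These definitions are assumptions reconstructed from the variant names (the crate's real types may carry extra data or differ in detail), and `sample_std_dev`, `z`, and `describe` are hypothetical helpers.

```rust
use std::fmt;

// Stand-in definitions mirroring the variant names listed above.
#[derive(Debug, PartialEq)]
enum StatsError {
    EmptyDataset,
    InsufficientData,
    DivisionByZero,
}

impl fmt::Display for StatsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            StatsError::EmptyDataset => "dataset is empty",
            StatsError::InsufficientData => "not enough data for the operation",
            StatsError::DivisionByZero => "division by zero would occur",
        };
        f.write_str(msg)
    }
}

impl std::error::Error for StatsError {}

type StatsResult<T> = Result<T, StatsError>;

/// Sample standard deviation that reports why it cannot be computed.
fn sample_std_dev(data: &[f64]) -> StatsResult<f64> {
    match data.len() {
        0 => Err(StatsError::EmptyDataset),
        1 => Err(StatsError::InsufficientData),
        n => {
            let mean = data.iter().sum::<f64>() / n as f64;
            let ss: f64 = data.iter().map(|v| (v - mean).powi(2)).sum();
            Ok((ss / (n - 1) as f64).sqrt())
        }
    }
}

/// Z-score of a value within its dataset; `?` propagates earlier errors.
fn z(value: f64, data: &[f64]) -> StatsResult<f64> {
    let sd = sample_std_dev(data)?;
    if sd == 0.0 {
        return Err(StatsError::DivisionByZero);
    }
    let mean = data.iter().sum::<f64>() / data.len() as f64;
    Ok((value - mean) / sd)
}

/// Turns a result into a message instead of panicking via unwrap().
fn describe(data: &[f64]) -> String {
    match sample_std_dev(data) {
        Ok(sd) => format!("std dev = {sd:.3}"),
        Err(e) => format!("cannot compute: {e}"),
    }
}
```

Returning a Result lets callers choose between propagating the error with `?`, substituting a default, or reporting it to the user, which unwrap() does not allow.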

License

MIT

Contributing

Contributions are welcome. Please ensure tests pass before submitting pull requests.
