FABE13-HX is a high-performance C math library that delivers ultra-fast trigonometric functions (sin, cos, sincos) using advanced SIMD vectorization. Powered by the innovative Ψ-Hyperbasis algorithm, it outperforms traditional math libraries by up to 8.4× while maintaining high precision.
FABE13-HX revolutionizes trigonometric computation for:
- Machine Learning & AI Acceleration - Optimize neural network performance
- Scientific Simulations & HPC - Accelerate physics, engineering, and computational modeling
- Real-time Signal Processing - Enhance DSP, audio, and sensor data analysis
- Graphics & Visualization Systems - Improve rendering performance
- Embedded Computing - Efficient performance on resource-constrained systems
- ⚡ Up to 8.4× Faster Than Standard Math Libraries across various platforms and input sizes
- 🔄 Cross-Architecture Optimization with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
- 🎯 High Precision with maximum error ≤ 2e-11 compared to standard libm
- 🧠 Novel Rational-Function Architecture based on Ψ-Hyperbasis instead of traditional polynomials
- 🔢 Extreme-Range Support accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
- 🧩 Unified API for both scalar and vectorized operations
- 🛡️ Robust Error Handling with proper NaN/Inf/0 behavior
Designed for numerical computing, AI acceleration, and scientific simulation, it replaces traditional polynomial approximations with a fused rational + correction model that's more efficient and vectorization-friendly.

    fabe13/                    # Core source
    ├── fabe13.c               # HX implementation
    ├── fabe13.h               # Public API
    ├── benchmark_fabe13.c     # Benchmark main
    tests/
    └── test_fabe13.c          # Optional unit tests
    CMakeLists.txt             # Cross-platform CMake
    Makefile                   # Minimalist legacy build
    build.sh                   # Recommended build script (cross-platform)


Build with the recommended script:

    ./build.sh

This script:
- Cleans and configures the build (Release mode)
- Enables both benchmarking and testing
- Compiles with aggressive `-Ofast`, `-ffast-math`, and `-march=native` flags
- Runs all unit tests and benchmarks automatically

To build manually with CMake:

    mkdir -p build && cd build
    cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
    make
    ./fabe13_test
    ./fabe13_benchmark

Or use the legacy Makefile:

    make all
    make run-benchmark

FABE13-HX delivers consistent speedups over standard libm across platforms and input sizes. These benchmarks highlight its advantage in both cloud-based and local environments.
- 🟨 FABE13-HX: SIMD-accelerated (`AVX2+FMA`, Ψ-core)
- 🔴 libm: Standard C math (`math.h`)
- 🧠 Input size: `N ∈ [10 ... 1,000,000,000]` doubles
- ⚙️ Timing: Full-array `sincos()` throughput
- 📐 Aligned memory: 64 bytes
- 🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)
✅ FABE13-HX is consistently faster than libm — up to 8.4× for large inputs.
- Platform: Replit Linux
- SIMD: AVX2 + FMA
- Compiler: Clang 14 (nix)
- libm: GNU `math.h`
🟨 FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).
- Platform: macOS 14.x (MacBook Pro 16")
- SIMD: AVX2 + FMA
- Compiler: AppleClang 16.0
- libm: macOS system `math.h`
FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes
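
The "Active Implementation" line above reports which SIMD backend is in use and its width in doubles. How the library chooses a backend is not described here; the sketch below shows one plausible approach, compile-time selection through compiler-predefined macros, which fits builds made with `-march=native`. The function names `simd_width_doubles` and `backend_name` are hypothetical.

```c
#include <assert.h>

/* Doubles per SIMD register for the backend selected at compile time.
   The macros are predefined by GCC and Clang for the target in use. */
int simd_width_doubles(void) {
#if defined(__AVX512F__)
    return 8;   /* 512-bit registers */
#elif defined(__AVX2__) && defined(__FMA__)
    return 4;   /* 256-bit registers */
#elif defined(__aarch64__)
    return 2;   /* 128-bit NEON registers */
#else
    return 1;   /* scalar fallback */
#endif
}

/* Human-readable label for the selected backend. */
const char *backend_name(void) {
    int w = simd_width_doubles();
    assert(w == 1 || w == 2 || w == 4 || w == 8);
    switch (w) {
        case 8:  return "AVX512F";
        case 4:  return "AVX2+FMA";
        case 2:  return "NEON (AArch64)";
        default: return "Scalar";
    }
}
```

On an AArch64 build this sketch would report a width of 2, the same width shown in the benchmark header above.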
8.4× throughput improvement for large array processing compared to standard libm
| Array Size | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup |
|---|---|---|---|---|---|
| 10 | 0.0000 | 0.0000 | 50.00 | 50.00 | 1.00x |
| 100 | 0.0000 | 0.0000 | 166.67 | 71.43 | 2.33x |
| 1,000 | 0.0000 | 0.0000 | 185.19 | 72.46 | 2.56x |
| 10,000 | 0.0001 | 0.0001 | 173.01 | 71.02 | 2.44x |
| 100,000 | 0.0006 | 0.0009 | 177.12 | 115.82 | 1.53x |
| 1,000,000 | 0.0016 | 0.0072 | 614.85 | 138.34 | 4.44x |
| 10,000,000 | 0.0164 | 0.0720 | 611.30 | 138.95 | 4.40x |
| 100,000,000 | 0.1673 | 0.7296 | 597.63 | 137.07 | 4.36x |
| 1,000,000,000 | 1.8044 | 10.4989 | 554.19 | 95.25 | 5.82x |
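
The throughput and speedup columns appear to follow directly from the two timing columns, with one "op" counted per array element. The helper names below (`mops_per_sec`, `speedup`) are hypothetical and exist only to make the arithmetic explicit.

```c
#include <assert.h>

/* Throughput in millions of elements processed per second. */
double mops_per_sec(double n_elements, double seconds) {
    assert(seconds > 0.0);
    return n_elements / seconds / 1e6;
}

/* Speedup of FABE13 relative to libm on the same array. */
double speedup(double libm_seconds, double fabe13_seconds) {
    assert(fabe13_seconds > 0.0);
    return libm_seconds / fabe13_seconds;
}
```

For the 1,000,000,000-element row, `mops_per_sec(1e9, 1.8044)` gives about 554.2 and `speedup(10.4989, 1.8044)` gives about 5.82, in line with the table. Rows whose times are printed as 0.0000 cannot be recomputed this way, since the rounded timings carry no usable digits.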

Sample benchmark output for N = 1,000,000:

    FABE13: 0.0016 sec | 614.85 M ops/sec
    libm: 0.0072 sec | 138.34 M ops/sec
    Speedup: 4.44x
    Memory: Allocated 0.04 GB
    Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
    CPU: 100.0% utilization for both implementations
    Max diff vs libm: sin=1.224e-11, cos=1.225e-11

- All test cases maintain acceptable numerical accuracy compared to libm
- Maximum difference observed: ~10⁻¹¹ for both sin and cos operations
- Properly handles edge cases (0, inf, nan) with correct behavior

The Ψ-Hyperbasis model, in outline:

    // Core rational transformation
    Ψ(x) = x / (1 + (3/8)x²)
    // sin(x) approximation
    sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)
    // cos(x) approximation
    cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶

This allows both functions to share a unified base, optimizing performance and memory access.

    #include "fabe13/fabe13.h"

    // Scalar API
    double fabe13_sin(double x);
    double fabe13_cos(double x);
    double fabe13_sinc(double x); // sin(x)/x
    double fabe13_tan(double x);
    double fabe13_cot(double x);
    double fabe13_atan(double x);
    double fabe13_asin(double x); // [-1, 1]
    double fabe13_acos(double x); // [-1, 1]

    // SIMD vector API
    void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);

- ✅ Branchless Quadrant Correction
- ✅ NaN/Inf/0-safe logic
- ✅ Prefetch-friendly & unrolled scalar fallback
- ✅ SIMD-ready backend design (NEON / AVX2 / AVX512)
- ✅ Precision-preserving range reduction
- Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
- Additional functions: `cosm1`, `expm1`, `log1p` with Ψ-Hyperbasis optimization
- Single-precision `float32` support (`fabe13_sinf`, etc.)
- Ultra-fast LUT-based variants for performance-critical applications
- Language bindings for Python, Rust, and C++
- Documentation and examples for common use cases
MIT License © 2025 Faruk Alpay
See LICENSE
Faruk Alpay
https://Frontier2075.com
https://lightcap.ai
FABE13-HX is part of the Lightcap Initiative — building the most precise and elegant math primitives in open source.

