# SochDB Competitive Benchmarks

This directory contains comprehensive benchmarks comparing SochDB against major vector database competitors.

## Quick Start

```bash
# Install dependencies
pip install chromadb qdrant-client lancedb faiss-cpu python-dotenv openai

# Run the ultimate showdown
python benchmarks/ultimate_showdown.py

# Run real embedding demo (requires Azure OpenAI)
python benchmarks/real_search_demo.py
```

## Benchmark Scripts

### 1. `ultimate_showdown.py` - Comprehensive Comparison
Tests SochDB against all available competitors:
- **ChromaDB** - Python-based, simple embedded database
- **Qdrant** - Rust-based with excellent filtering
- **FAISS** - Facebook's C++ library (no persistence)
- **LanceDB** - Columnar embedded database

Dimensions tested: 384 (MiniLM), 768 (BERT), 1536 (OpenAI)

### 2. `real_search_demo.py` - Real Embedding Demo
Demonstrates semantic search using actual Azure OpenAI embeddings. Requires `.env` with:
```
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
```
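
A minimal sketch of the flow this demo relies on: load the keys above, then request embeddings through the `openai` package's `AzureOpenAI` client. The deployment name `text-embedding-3-small`, the helper names, and the hand-rolled `.env` parser are illustrative assumptions, not the script's actual code (in practice `python-dotenv` handles the parsing):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file (skips blanks and comments)."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def embed(texts, deployment="text-embedding-3-small"):
    # Lazy import so the .env helper above works without the openai package.
    from openai import AzureOpenAI
    cfg = {**load_env(), **os.environ}
    client = AzureOpenAI(
        api_key=cfg["AZURE_OPENAI_API_KEY"],
        azure_endpoint=cfg["AZURE_OPENAI_ENDPOINT"],
        api_version=cfg.get("AZURE_OPENAI_API_VERSION", "2024-12-01-preview"),
    )
    resp = client.embeddings.create(model=deployment, input=list(texts))
    return [d.embedding for d in resp.data]
```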

### 3. `competitive_benchmark.py` - Full Competitive Suite
Extensive benchmark with real embeddings across multiple test sizes.

### 4. `rag_benchmark.py` - RAG-Realistic Workloads
Simulates actual RAG (Retrieval-Augmented Generation) workloads:
- Document ingestion
- Semantic search
- Batch queries (concurrent users)
- Filtered search
- Memory usage

### 5. `feature_benchmark.py` - Feature Differentiators
Tests SochDB's unique features:
- All commercial embedding dimensions (128-3072)
- Concurrent read/write access
- Batch operation efficiency
- Real embedding performance

## Expected Results

Based on testing, SochDB provides:

| Metric | SochDB | ChromaDB | Qdrant | FAISS | LanceDB |
|--------|--------|----------|--------|-------|---------|
| Insert (vec/s) | 2,000-10,000 | 3,000-5,000 | 5,000-10,000 | 50,000+ | 15,000+ |
| Search p50 | 0.3-0.5ms | 1-2ms | 0.5-1ms | 0.1-0.2ms | 5-10ms |
| Filtering | ✅ | ✅ | ✅ | ❌ | ✅ |
| Embedded | ✅ | ✅ | ❌ | ✅ | ✅ |
| SQL Interface | ✅ | ❌ | ❌ | ❌ | ❌ |

## SochDB Advantages

1. **🚀 Rust-Native Performance** - SIMD-accelerated distance calculations (NEON/AVX2)
2. **📦 Truly Embedded** - No server required, like SQLite for vectors
3. **🔢 All Dimensions** - Supports 128-3072 (MiniLM to OpenAI text-embedding-3-large)
4. **💾 SQL Interface** - Query vectors with familiar SQL syntax
5. **🔒 MVCC Transactions** - Safe concurrent reads and writes
6. **🕸️ Graph + Vector** - Hybrid knowledge graph + semantic search
7. **🐍 Python Simplicity** - Native Python bindings via FFI

## Competitors Overview

| Database | Type | Best For | Limitations |
|----------|------|----------|-------------|
| **Pinecone** | Cloud | Managed simplicity | Cloud-only, cost |
| **Weaviate** | Server | Hybrid search | Requires server |
| **Milvus** | Distributed | Large scale | Complexity |
| **Qdrant** | Server | Filtering | Requires server |
| **ChromaDB** | Embedded | Simple Python | Slower performance |
| **FAISS** | Library | Raw speed | No persistence |
| **LanceDB** | Embedded | Analytics | Slower search |
| **pgvector** | Extension | PostgreSQL users | Limited scale |
| **SochDB** | Embedded | AI/ML apps | Feature-rich |

## Running Benchmarks

```bash
# Full competitive analysis
cd sochdb-python-sdk
python benchmarks/ultimate_showdown.py

# Real embeddings (requires Azure OpenAI)
python benchmarks/real_search_demo.py

# RAG-realistic workloads
python benchmarks/rag_benchmark.py

# Feature tests
python benchmarks/feature_benchmark.py
```

## Environment Setup

For real embedding benchmarks, create `.env` in the project root:

```env
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-12-01-preview
```

## Results

Results are saved to:
- `showdown_results.json` - Ultimate showdown results
- `benchmark_results.json` - Competitive benchmark results
- `rag_benchmark_results.json` - RAG benchmark results
- `feature_benchmark_results.json` - Feature benchmark results

---

## 📊 Industry-Standard Performance Metrics

Based on **ANN-Benchmarks** (ann-benchmarks.com), **VectorDBBench** (Zilliz), and **Qdrant Benchmarks**:

### Primary Metrics (Required for Credible Benchmarks)

| Metric | Definition | Why It Matters |
|--------|------------|----------------|
| **Recall@k** | Fraction of true k-nearest neighbors found | Measures search accuracy - the most critical metric |
| **QPS (Queries Per Second)** | Number of queries processed per second | Raw throughput for parallel workloads |
| **Latency p50/p95/p99** | Response time percentiles | User-perceived performance |
| **Index Build Time** | Time to construct the HNSW index | Critical for data ingestion pipelines |
| **Index Size (Memory)** | RAM required for the index | Cost and scalability factor |

### Recall vs QPS Tradeoff (The Gold Standard)

> **"The speed of vector databases should only be compared if they achieve the same precision."**
> — Qdrant Benchmarks

ANN search is fundamentally about trading **precision for speed**. Any benchmark comparing two systems must use the **same recall threshold** (typically 0.95 or 0.99).

```
Recall@10 = (# of true neighbors in results) / 10
```
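
In code, this is a one-liner; here is a minimal sketch, with the example ID lists invented for illustration:

```python
def recall_at_k(approx_ids, true_ids, k=10):
    """Fraction of the true k nearest neighbors present in the approximate result."""
    return len(set(approx_ids[:k]) & set(true_ids[:k])) / k

# Ground truth from exact (brute-force) search vs. an ANN result
# that found 9 of the true top-10:
truth  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
print(recall_at_k(approx, truth))  # 0.9
```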

### Standard Benchmark Datasets (ANN-Benchmarks)

| Dataset | Vectors | Dimensions | Distance | Use Case |
|---------|---------|------------|----------|----------|
| **SIFT-1M** | 1,000,000 | 128 | Euclidean | Classic image descriptors |
| **GloVe-100** | 1,200,000 | 100 | Cosine | Word embeddings |
| **Fashion-MNIST** | 60,000 | 784 | Euclidean | Image classification |
| **GIST-960** | 1,000,000 | 960 | Euclidean | Scene recognition |
| **DBpedia-OpenAI-1M** | 1,000,000 | 1536 | Cosine | Real OpenAI embeddings |
| **Deep-Image-96** | 10,000,000 | 96 | Cosine | Large-scale images |

### VectorDBBench Scenarios

VectorDBBench (github.com/zilliztech/VectorDBBench) tests:

| Case Type | Vectors | Dimensions | Purpose |
|-----------|---------|------------|---------|
| Performance768D1M | 1M | 768 | BERT-class embeddings |
| Performance768D10M | 10M | 768 | Scale test |
| Performance1536D500K | 500K | 1536 | OpenAI embeddings |
| Performance1536D5M | 5M | 1536 | Large OpenAI scale |
| CapacityDim128 | Max | 128 | Stress test (SIFT) |
| CapacityDim960 | Max | 960 | Stress test (GIST) |

### Latency Percentiles Explained

| Percentile | Meaning | Target |
|------------|---------|--------|
| **p50 (median)** | Half of requests faster than this | < 1ms |
| **p95** | 95% of requests faster than this | < 5ms |
| **p99** | 99% of requests faster than this | < 10ms |
| **p999** | 99.9% (tail latency) | < 50ms |

High p99/p999 indicates **tail latency issues** that affect user experience.

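A small stdlib-only sketch of computing these percentiles from raw per-query latencies, using the nearest-rank method (the sample numbers are invented; one slow outlier is enough to drag p99 far above the median):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample >= p% of all samples."""
    s = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[idx]

latencies_ms = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 2.0, 9.5]
p50 = percentile(latencies_ms, 50)  # median-class latency
p99 = percentile(latencies_ms, 99)  # tail latency, dominated by the 9.5 ms outlier
```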
### HNSW Index Parameters

| Parameter | Effect on Recall | Effect on Speed | Effect on Memory |
|-----------|------------------|-----------------|------------------|
| **M** (connections) | ↑ M = ↑ Recall | ↑ M = ↓ Speed | ↑ M = ↑ Memory |
| **ef_construction** | ↑ ef = ↑ Recall | ↑ ef = ↓ Build | No effect |
| **ef_search** | ↑ ef = ↑ Recall | ↑ ef = ↓ QPS | No effect |

Typical configurations:
- **High Recall (0.99+)**: M=32, ef_construction=256, ef_search=256
- **Balanced (0.95-0.98)**: M=16, ef_construction=128, ef_search=100
- **High Speed (0.90-0.95)**: M=8, ef_construction=64, ef_search=50

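For sizing, a rough back-of-envelope: HNSW memory is approximately the raw float32 vectors plus ~2·M layer-0 links of 4 bytes each per element (upper layers add a few percent more and are ignored here). This is a generic estimate, not SochDB's actual allocator behavior:

```python
def hnsw_memory_mb(n_vectors, dim, M, bytes_per_float=4, bytes_per_link=4):
    """Approximate HNSW index RAM: vector data + layer-0 graph links."""
    per_element = dim * bytes_per_float + 2 * M * bytes_per_link
    return n_vectors * per_element / (1024 ** 2)

# 1M x 768-dim float32, M=16:
# raw vectors alone are 768*4 = 3072 B each (~2930 MB);
# the graph adds 2*16*4 = 128 B each, for ~3052 MB total.
estimate = hnsw_memory_mb(1_000_000, 768, 16)
```

Doubling M roughly adds another 128 bytes per element at this dimension, which is why the "High Recall" configuration above costs noticeably more RAM.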
### Benchmark Methodology (Best Practices)

1. **Same Hardware**: All systems must run on identical hardware
2. **Same Dataset**: Use standard datasets (SIFT, GloVe, DBpedia)
3. **Same Recall**: Only compare at equivalent precision thresholds
4. **Warm Cache**: Run warmup queries before measurement
5. **Multiple Runs**: Report median of 5+ runs
6. **Separate Client/Server**: Use different machines for client and server (if applicable)

### Reference Hardware (VectorDBBench Standard)

```
Client: 8 vCPUs, 16 GB RAM (Azure Standard D8ls v5)
Server: 8 vCPUs, 32 GB RAM (Azure Standard D8s v3)
CPU: Intel Xeon Platinum 8375C @ 2.90GHz
Memory Limit: 25 GB (to ensure fairness)
```

### How to Interpret Results

#### Good Benchmark Report Shows:
✅ Recall@k vs QPS curves (the gold standard chart)
✅ Multiple precision thresholds (0.90, 0.95, 0.99)
✅ Latency percentiles (p50, p95, p99)
✅ Index build time and memory usage
✅ Dataset and hardware specifications

#### Red Flags in Benchmarks:
❌ No recall measurement (speed without accuracy is meaningless)
❌ Single data point (no precision/speed tradeoff shown)
❌ Unknown or unreproducible hardware
❌ Proprietary datasets

---

## 🏆 SochDB Performance Targets

Based on industry benchmarks, SochDB targets:

| Metric | Target | Compared To |
|--------|--------|-------------|
| Recall@10 | ≥ 0.95 | Standard ANN threshold |
| QPS (single-thread) | ≥ 1,000 | ChromaDB baseline |
| Latency p50 | < 1ms | Qdrant/Milvus class |
| Latency p99 | < 10ms | Production-ready |
| Index Build | < 60s/1M vectors | Competitive |
| Memory | < 2x raw vector size | Efficient |

### Distance Metrics Supported

| Metric | Formula | Use Case |
|--------|---------|----------|
| **Cosine** | 1 - (a·b / \|a\|\|b\|) | Text embeddings (default) |
| **Euclidean (L2)** | √Σ(aᵢ-bᵢ)² | Image features |
| **Dot Product** | -a·b | Pre-normalized vectors |

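The three formulas above, written out as a plain-Python sketch (a real engine computes these with SIMD over float32 arrays; the dot product is negated so that smaller always means closer, matching the other two metrics):

```python
import math

def cosine_distance(a, b):
    """1 - (a.b / |a||b|); 0 for identical directions, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """L2 distance: sqrt of summed squared component differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_distance(a, b):
    """Negated dot product, for pre-normalized vectors."""
    return -sum(x * y for x, y in zip(a, b))
```

For unit-length vectors, cosine distance and negated dot product rank neighbors identically, which is why pre-normalizing and using dot product is a common optimization.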
---

## 📚 References

- **ANN-Benchmarks**: https://ann-benchmarks.com/
- **VectorDBBench**: https://github.com/zilliztech/VectorDBBench
- **Qdrant Benchmarks**: https://qdrant.tech/benchmarks/
- **Zilliz Leaderboard**: https://zilliz.com/benchmark
- **Erik Bernhardsson's ANN Benchmarks**: https://github.com/erikbern/ann-benchmarks