# LLM Cost Profiler

Track, visualize, and optimize your OpenAI and Anthropic API spending.
Two lines of Python. Zero config. Instant cost visibility.
```
LLM Cost Report — Last 7 Days
========================================
Total: $847.32 | 2.4M tokens | 12,847 calls

By Feature:
  summarizer   $412.80  (48.7%)  ████████████████████
  chatbot      $203.11  (24.0%)  ████████████
  classifier    $89.40  (10.5%)  █████
  content_gen   $78.22   (9.2%)  ████
  extraction    $41.50   (4.9%)  ██
  untagged      $22.29   (2.6%)  █

Warnings:
  ⚠ summarizer: 34% of calls are retries ($140.15 wasted)
  ⚠ chatbot: avg 3,200 input tokens but only 180 output tokens (context bloat)
  ⚠ classifier: using gpt-4o but output is always <10 tokens (cheaper model works)
```
I ran this on my own project and found $1,240/month in waste — duplicate calls that should be cached, an expensive model doing a job a cheap one handles fine, and retry loops burning money on failures. All fixable in an afternoon.
If you're building with GPT-4, GPT-4o, Claude, or any LLM API, costs add up fast — and they're invisible until the bill arrives. Most teams discover they're overspending only after it's too late.
LLM Cost Profiler gives you real-time cost tracking per feature, per model, per line of code — without changing how you write code. It detects the five most common sources of LLM waste:
- Duplicate calls that should be cached (often 30-60% of total spend)
- Retry loops burning money on repeated failures
- Expensive models doing jobs that cheaper models handle identically
- Context bloat from unbounded conversation history
- Sequential calls that could be batched
Works with OpenAI (GPT-4, GPT-4o, GPT-4o-mini, o1, o3) and Anthropic (Claude Opus, Sonnet, Haiku). Supports sync and async clients. Zero dependencies.
## Contents

- Quick Start
- CLI Commands
- Features
- How It Works
- API Reference
- Uninstall
- Requirements
- Contributing
- License
## Quick Start

```bash
pip install llm-spend-profiler
```

```python
from openai import OpenAI
from llm_cost_profiler import wrap

client = wrap(OpenAI())  # that's it — every call is tracked now
```

Your code works exactly as before. Every API call is silently logged to a local SQLite database. If logging ever fails, it fails silently — your app is never affected.

Anthropic works the same way:

```python
from anthropic import Anthropic

client = wrap(Anthropic())
```

So do async clients:

```python
from openai import AsyncOpenAI

client = wrap(AsyncOpenAI())
```

Then view your report:

```bash
llmcost report
```

That's it. You're tracking.
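The fail-silent guarantee amounts to never letting an exception from the logging path reach application code. A minimal sketch of that idea (an illustration, not the library's actual logger — `log_safely` and `sink` are hypothetical names):

```python
def log_safely(record, sink):
    """Never let a logging failure propagate into application code.
    Illustrative only: shows the fail-silent idea, not the real logger."""
    try:
        sink(record)
    except Exception:
        pass  # tracking is best-effort; the API response already went through

seen = []
log_safely({"tokens": 42}, seen.append)    # normal path: record is stored
log_safely({"tokens": 7}, lambda r: 1 / 0)  # failing sink: error swallowed
```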
## CLI Commands

All commands work out of the box once you've wrapped a client and made some API calls.
| Command | What it does |
|---|---|
| `llmcost report` | Spending breakdown by feature and model |
| `llmcost hotspots` | Top cost hotspots by code location |
| `llmcost compare` | Period-over-period cost comparison |
| `llmcost optimize` | Actionable savings with estimated dollar amounts |
| `llmcost latency` | Latency percentiles by model and call site |
| `llmcost dashboard` | Local web dashboard at http://127.0.0.1:8177 |
### `llmcost report`

```bash
llmcost report            # last 7 days (default)
llmcost report --days 30  # last 30 days
```

Shows total spend, breakdown by feature and model, and automatic warnings for retry waste, context bloat, and overpriced model usage.
### `llmcost hotspots`

```bash
llmcost hotspots           # top 10 (default)
llmcost hotspots --top 20  # top 20
```

```
Top Cost Hotspots:
  1. features/summarizer.py:47   summarize_doc()   $412.80/week  4,201 calls  ████████████████████
  2. api/chat.py:123             handle_message()  $203.11/week  3,892 calls  ██████████
  3. pipeline/classify.py:34     classify_text()    $89.40/week  2,847 calls  ████
```

Auto-detected from the call stack. No manual annotation needed.
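The call-stack lookup can be sketched roughly like this — a simplified illustration of the idea, not the library's actual logic (`find_call_site` and the prefix check are hypothetical):

```python
import os
import sys

def find_call_site(skip_prefix):
    """Walk up the stack via sys._getframe and return 'file:line' for the
    first frame whose source file is outside the profiler's own code
    (identified here by a path prefix -- a simplification)."""
    frame = sys._getframe(1)
    while frame is not None:
        filename = frame.f_code.co_filename
        if not filename.startswith(skip_prefix):
            return f"{os.path.basename(filename)}:{frame.f_lineno}"
        frame = frame.f_back
    return "<unknown>"

# Any frame outside the (imaginary) profiler directory is the call site:
site = find_call_site(skip_prefix="/imaginary/profiler/dir")
```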
### `llmcost compare`

```bash
llmcost compare            # week-over-week (default)
llmcost compare --days 30  # month-over-month
```

```
Week-over-Week Comparison:
Total: $847.32 → was $623.10 (+36% ⚠)

Biggest increases:
  summarizer: +$180 (+77%)
  chatbot:    +$44  (+28%)
```
### `llmcost optimize`

```bash
llmcost optimize           # last 30 days (default)
llmcost optimize --days 90 # last 90 days
```

```
LLM Cost Optimization Report
========================================
Current monthly spend (projected): $2,847
Potential savings found: $1,240/month (43.5%)

#1 CACHE — classifier.py:34 [SAVE $310/mo]
   85% of calls are exact duplicates (723 of 847/week)
   → Add @cache decorator
   Confidence: HIGH

#2 RETRY FIX — content_gen.py:112 [SAVE $180/mo]
   28% retry rate from JSON parse errors
   → Fix prompt to return raw JSON
   Confidence: HIGH

#3 MODEL DOWNGRADE — classifier.py:34 [SAVE $71/mo]
   Output is always <10 tokens, one of 5 fixed labels
   → Switch gpt-4o to gpt-4o-mini
   Confidence: MEDIUM
```

Five analyses: cache detection, retry waste, model downgrade, context bloat, and batching opportunities.
### `llmcost latency`

```bash
llmcost latency           # last 7 days (default)
llmcost latency --days 30 # last 30 days
```

```
LLM Latency Report — Last 7 Days
========================================
Overall: p50 320ms | p95 1,240ms | p99 3,100ms | 12,847 calls

By Model:
  gpt-4o       p50 450ms  p95 1,800ms  p99 4,200ms  4,201 calls
  gpt-4o-mini  p50 180ms  p95 520ms    p99 1,100ms  3,892 calls

Slowest Call Sites:
  1. features/summarizer.py:47  p95 3,200ms  4,201 calls  ████████████████████
  2. api/chat.py:123            p95 1,800ms  3,892 calls  ███████████
```

Shows p50, p95, and p99 latency percentiles — overall, per model, and per call site. Warns when p95 exceeds 3 seconds.
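A p95 of 1,240ms means 95% of calls finished in 1,240ms or less. One common way to compute such percentiles from raw per-call latencies is the nearest-rank method, sketched below (the tool's exact method may differ):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p% of samples are <= it. One standard convention; not
    necessarily the one the tool uses."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

latencies_ms = [120, 180, 250, 320, 400, 900, 1240, 3100]
p50 = percentile(latencies_ms, 50)  # → 320
p95 = percentile(latencies_ms, 95)  # → 3100 (small sample: p95 hits the max)
```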
### `llmcost dashboard`

```bash
llmcost dashboard             # default port 8177
llmcost dashboard --port 9000
```

Dark-themed local web dashboard with cost cards, feature treemap, spend timeline, model breakdown, hotspots table, and optimization waterfall. Auto-refreshes every 30 seconds. Single HTML file — no npm, no build step.
## Features

### Cost tagging

Group costs by feature, customer, environment — whatever matters to you:
```python
from llm_cost_profiler import tag

with tag(feature="summarizer", customer="acme_corp"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this document..."}],
    )
```

Tags nest naturally. Inner tags merge with outer tags:
```python
with tag(feature="pipeline"):
    with tag(step="extract"):
        # tagged as feature=pipeline, step=extract
        client.chat.completions.create(...)
    with tag(step="transform"):
        # tagged as feature=pipeline, step=transform
        client.chat.completions.create(...)
```

### Response caching

Stop paying for duplicate calls:
```python
from llm_cost_profiler import cache

@cache(ttl=3600)  # cache for 1 hour
def classify_text(text):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify: {text}"}],
    )

classify_text("hello")  # API call → cached
classify_text("hello")  # instant, free
```

Works with both sync and async functions. Cache is stored in the same local SQLite database.
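Conceptually, a TTL cache like this keys entries on the function's arguments and evicts them after `ttl` seconds. A minimal in-memory sketch of the idea (the real decorator persists to SQLite and is not this code):

```python
import functools
import hashlib
import pickle
import time

def ttl_cache(ttl):
    """Illustrative in-memory TTL cache; the library's @cache persists
    entries to SQLite instead of a dict."""
    def decorator(fn):
        store = {}
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Key on the serialized arguments so identical calls collide
            key = hashlib.sha256(
                pickle.dumps((args, sorted(kwargs.items())))
            ).hexdigest()
            hit = store.get(key)
            if hit is not None and time.monotonic() - hit[0] < ttl:
                return hit[1]  # fresh entry: skip the expensive call
            result = fn(*args, **kwargs)
            store[key] = (time.monotonic(), result)
            return result
        return wrapper
    return decorator

calls = []

@ttl_cache(ttl=3600)
def classify(text):
    calls.append(text)  # stands in for a paid API call
    return f"label-for-{text}"

classify("hello")
classify("hello")  # served from cache; the "API" runs only once
```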
### Prompt storage

Enable prompt storage for deeper optimization analysis:

```python
client = wrap(OpenAI(), store_prompts=True)
```

Disabled by default for privacy. When enabled, the optimizer can detect near-duplicate prompts and analyze what causes retry failures.
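Near-duplicate detection can be as simple as fingerprinting a normalized form of each prompt. The sketch below is only meant to convey the idea (the library's actual detection may be fuzzier; `normalized_fingerprint` is a hypothetical name):

```python
import hashlib
import re

def normalized_fingerprint(prompt):
    """Collapse whitespace, lowercase, and mask digits so prompts that
    differ only in numbers or formatting hash identically. Illustrative
    only -- a real optimizer could use smarter similarity measures."""
    canonical = re.sub(r"\s+", " ", prompt.lower()).strip()
    canonical = re.sub(r"\d+", "#", canonical)
    return hashlib.sha256(canonical.encode()).hexdigest()

# These two prompts differ only in an order ID and spacing:
a = normalized_fingerprint("Summarize order 12345, please")
b = normalized_fingerprint("summarize   order 99, please")
# a == b, so the pair would be flagged as near-duplicates
```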
## How It Works

```
Your code                   LLM Cost Profiler                  OpenAI / Anthropic
─────────                   ─────────────────                  ──────────────────
client.chat.completions  →  ClientProxy → ResourceProxy chain
.create(...)             →  intercepts create()
                            ├─ captures call site (sys._getframe)
                            ├─ reads active tags (contextvars)
                            ├─ calls real SDK method ──────────→  API call happens
                            ├─ extracts tokens from response
                            ├─ looks up cost from pricing table
                            ├─ logs to SQLite (async-safe)
                            └─ returns original response ←──────  response comes back
```
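The interception step can be illustrated with a bare-bones attribute proxy — a sketch of the pattern, not the library's actual classes (`TrackingProxy`, `FakeClient`, and the hook are all hypothetical):

```python
class TrackingProxy:
    """Wraps any object: attribute access returns proxied sub-objects,
    and calling a method runs a hook around the real call. Illustrative
    only -- the real ClientProxy/ResourceProxy chain is more involved."""
    def __init__(self, target, on_call):
        self._target = target
        self._on_call = on_call

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr):
            def intercepted(*args, **kwargs):
                result = attr(*args, **kwargs)  # real SDK call happens here
                self._on_call(name, result)     # log after the fact
                return result
            return intercepted
        return TrackingProxy(attr, self._on_call)  # proxy nested resources

# A tiny fake SDK to show the wrapping is transparent:
class FakeCompletions:
    def create(self, **kwargs):
        return {"usage": {"total_tokens": 42}}

class FakeClient:
    def __init__(self):
        self.completions = FakeCompletions()

logged = []
client = TrackingProxy(
    FakeClient(),
    lambda name, res: logged.append(res["usage"]["total_tokens"]),
)
response = client.completions.create(model="gpt-4o")
# The caller gets the original response; token usage was logged on the side.
```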
- Proxy pattern — wraps the SDK client transparently. No monkey-patching, no subclassing. Your client object behaves identically.
- SQLite + WAL mode — all data stored locally at `~/.llmcost/data.db`. Thread-safe writes, concurrent reads. No external database needed.
- Built-in pricing — covers OpenAI and Anthropic models. Prefix matching handles versioned model names (e.g., `gpt-4o-2024-08-06` matches `gpt-4o`).
- Call site detection — walks the Python stack via `sys._getframe()` to find the exact file and line that triggered each API call. No decorators or annotations required.
- Zero dependencies — only uses the Python standard library. The OpenAI/Anthropic SDKs are detected at runtime, not required at install.
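Prefix matching against a pricing table can be sketched like this. The prices below are placeholders, not the library's real table, and `lookup_pricing` is a hypothetical name:

```python
# Hypothetical per-million-token prices; NOT the library's actual table.
PRICING = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def lookup_pricing(model):
    """Longest-prefix match: 'gpt-4o-2024-08-06' resolves to 'gpt-4o',
    while 'gpt-4o-mini-2024-07-18' still resolves to 'gpt-4o-mini'
    rather than the shorter 'gpt-4o' prefix."""
    best = None
    for prefix, prices in PRICING.items():
        if model.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, prices)
    return best[1] if best else None

price = lookup_pricing("gpt-4o-2024-08-06")  # resolves to the gpt-4o entry
```

Longest-prefix (rather than first-match) ordering matters precisely because some model names are prefixes of others.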
## API Reference

### `wrap()`

Wraps an OpenAI or Anthropic client. Returns a transparent proxy that tracks all API calls.

```python
from llm_cost_profiler import wrap

client = wrap(OpenAI())                      # basic tracking
client = wrap(OpenAI(), store_prompts=True)  # also store prompt content
```

### `tag()`

Context manager that attaches metadata to all API calls within its scope.
```python
from llm_cost_profiler import tag

with tag(feature="search", env="production"):
    # all calls here are tagged
    ...
```

### `cache()`

Decorator that caches function results in SQLite. Identical arguments return cached responses.
```python
from llm_cost_profiler import cache

@cache(ttl=3600)
def my_function(text):
    ...
```

### `get_current_tags()`

Returns the currently active tags as a dictionary. Useful for debugging.
```python
from llm_cost_profiler import get_current_tags

with tag(feature="search"):
    print(get_current_tags())  # {"feature": "search"}
```

## Uninstall

```bash
pip uninstall llm-spend-profiler
```

To also remove stored data:
```
# macOS / Linux
rm -rf ~/.llmcost

# Windows
rmdir /s /q %USERPROFILE%\.llmcost
```

## Requirements

- Python 3.9+
- No required dependencies
- Optional: `openai` and/or `anthropic` SDKs
## Contributing

Contributions are welcome. To set up the dev environment:
```bash
git clone https://github.com/buildwithabid/llm-cost-profiler.git
cd llm-cost-profiler
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -e ".[dev]"
pytest
```

All 50 tests should pass. If you're adding a new feature, please include tests.
## License

MIT. See LICENSE for details.
Keywords: LLM cost tracking, OpenAI cost monitoring, Anthropic API costs, GPT-4 cost optimizer, Claude API spending, LLM token usage tracker, AI API cost management, Python LLM profiler, reduce OpenAI bill, LLM spend analytics, GPT cost per feature, AI cost optimization tool, LLM API budget monitor, token cost calculator, ChatGPT cost tracker