# LLM Cost Profiler

Track, visualize, and optimize your OpenAI and Anthropic API spending.
Two lines of Python. Zero config. Instant cost visibility.
```
LLM Cost Report — Last 7 Days
========================================
Total: $847.32 | 2.4M tokens | 12,847 calls

By Feature:
  summarizer   $412.80  (48.7%)  ████████████████████
  chatbot      $203.11  (24.0%)  ████████████
  classifier    $89.40  (10.5%)  █████
  content_gen   $78.22   (9.2%)  ████
  extraction    $41.50   (4.9%)  ██
  untagged      $22.29   (2.6%)  █

Warnings:
  ⚠ summarizer: 34% of calls are retries ($140.15 wasted)
  ⚠ chatbot: avg 3,200 input tokens but only 180 output tokens (context bloat)
  ⚠ classifier: using gpt-4o but output is always <10 tokens (cheaper model works)
```
I ran this on my own project and found $1,240/month in waste — duplicate calls that should be cached, an expensive model doing a job a cheap one handles fine, and retry loops burning money on failures. All fixable in an afternoon.
If you're building with GPT-4, GPT-4o, Claude, or any LLM API, costs add up fast — and they're invisible until the bill arrives. Most teams discover they're overspending only after it's too late.
LLM Cost Profiler gives you real-time cost tracking per feature, per model, per line of code — without changing how you write code. It detects the five most common sources of LLM waste:
- Duplicate calls that should be cached (often 30-60% of total spend)
- Retry loops burning money on repeated failures
- Expensive models doing jobs that cheaper models handle identically
- Context bloat from unbounded conversation history
- Sequential calls that could be batched
Works with OpenAI (GPT-4, GPT-4o, GPT-4o-mini, o1, o3) and Anthropic (Claude Opus, Sonnet, Haiku). Supports sync and async clients. Zero dependencies.
## Contents

- Quick Start
- CLI Commands
- Features
- How It Works
- API Reference
- Uninstall
- Requirements
- Contributing
- License
## Quick Start

```bash
pip install llm-spend-profiler
```

```python
from openai import OpenAI
from llm_cost_profiler import wrap

client = wrap(OpenAI())  # that's it — every call is tracked now
```

Your code works exactly as before. Every API call is silently logged to a local SQLite database. If logging ever fails, it fails silently — your app is never affected.

Anthropic works the same way:

```python
from anthropic import Anthropic

client = wrap(Anthropic())
```

So do async clients:

```python
from openai import AsyncOpenAI

client = wrap(AsyncOpenAI())
```

Then view your report:

```bash
llmcost report
```

That's it. You're tracking.
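The fail-silent guarantee amounts to never letting an exception from the logging path reach application code. A minimal sketch of that idea (an illustration, not the library's actual logger — `log_safely` and `sink` are hypothetical names):

```python
def log_safely(record, sink):
    """Never let a logging failure propagate into application code.
    Illustrative only: shows the fail-silent idea, not the real logger."""
    try:
        sink(record)
    except Exception:
        pass  # tracking is best-effort; the API response already went through

seen = []
log_safely({"tokens": 42}, seen.append)    # normal path: record is stored
log_safely({"tokens": 7}, lambda r: 1 / 0)  # failing sink: error swallowed
```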
## CLI Commands

All commands work out of the box once you've wrapped a client and made some API calls.
| Command | What it does |
|---|---|
| `llmcost report` | Spending breakdown by feature and model |
| `llmcost hotspots` | Top cost hotspots by code location |
| `llmcost compare` | Period-over-period cost comparison |
| `llmcost optimize` | Actionable savings with estimated dollar amounts |
| `llmcost latency` | Latency percentiles by model and call site |
| `llmcost dashboard` | Local web dashboard at http://127.0.0.1:8177 |
### `llmcost report`

```bash
llmcost report            # last 7 days (default)
llmcost report --days 30  # last 30 days
```

Shows total spend, breakdown by feature and model, and automatic warnings for retry waste, context bloat, and overpriced model usage.
### `llmcost hotspots`

```bash
llmcost hotspots           # top 10 (default)
llmcost hotspots --top 20  # top 20
```

```
Top Cost Hotspots:
  1. features/summarizer.py:47   summarize_doc()   $412.80/week  4,201 calls  ████████████████████
  2. api/chat.py:123             handle_message()  $203.11/week  3,892 calls  ██████████
  3. pipeline/classify.py:34     classify_text()    $89.40/week  2,847 calls  ████
```

Auto-detected from the call stack. No manual annotation needed.
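The call-stack lookup can be sketched roughly like this — a simplified illustration of the idea, not the library's actual logic (`find_call_site` and the prefix check are hypothetical):

```python
import os
import sys

def find_call_site(skip_prefix):
    """Walk up the stack via sys._getframe and return 'file:line' for the
    first frame whose source file is outside the profiler's own code
    (identified here by a path prefix -- a simplification)."""
    frame = sys._getframe(1)
    while frame is not None:
        filename = frame.f_code.co_filename
        if not filename.startswith(skip_prefix):
            return f"{os.path.basename(filename)}:{frame.f_lineno}"
        frame = frame.f_back
    return "<unknown>"

# Any frame outside the (imaginary) profiler directory is the call site:
site = find_call_site(skip_prefix="/imaginary/profiler/dir")
```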
### `llmcost compare`

```bash
llmcost compare            # week-over-week (default)
llmcost compare --days 30  # month-over-month
```

```
Week-over-Week Comparison:
Total: $847.32 → was $623.10 (+36% ⚠)

Biggest increases:
  summarizer: +$180 (+77%)
  chatbot:    +$44  (+28%)
```
### `llmcost optimize`

```bash
llmcost optimize           # last 30 days (default)
llmcost optimize --days 90 # last 90 days
```

```
LLM Cost Optimization Report
========================================
Current monthly spend (projected): $2,847
Potential savings found: $1,240/month (43.5%)

#1 CACHE — classifier.py:34 [SAVE $310/mo]
   85% of calls are exact duplicates (723 of 847/week)
   → Add @cache decorator
   Confidence: HIGH

#2 RETRY FIX — content_gen.py:112 [SAVE $180/mo]
   28% retry rate from JSON parse errors
   → Fix prompt to return raw JSON
   Confidence: HIGH

#3 MODEL DOWNGRADE — classifier.py:34 [SAVE $71/mo]
   Output is always <10 tokens, one of 5 fixed labels
   → Switch gpt-4o to gpt-4o-mini
   Confidence: MEDIUM
```

Five analyses: cache detection, retry waste, model downgrade, context bloat, and batching opportunities.
### `llmcost latency`

```bash
llmcost latency           # last 7 days (default)
llmcost latency --days 30 # last 30 days
```

```
LLM Latency Report — Last 7 Days
========================================
Overall: p50 320ms | p95 1,240ms | p99 3,100ms | 12,847 calls

By Model:
  gpt-4o       p50 450ms  p95 1,800ms  p99 4,200ms  4,201 calls
  gpt-4o-mini  p50 180ms  p95 520ms    p99 1,100ms  3,892 calls

Slowest Call Sites:
  1. features/summarizer.py:47  p95 3,200ms  4,201 calls  ████████████████████
  2. api/chat.py:123            p95 1,800ms  3,892 calls  ███████████
```

Shows p50, p95, and p99 latency percentiles — overall, per model, and per call site. Warns when p95 exceeds 3 seconds.
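A p95 of 1,240ms means 95% of calls finished in 1,240ms or less. One common way to compute such percentiles from raw per-call latencies is the nearest-rank method, sketched below (the tool's exact method may differ):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p% of samples are <= it. One standard convention; not
    necessarily the one the tool uses."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

latencies_ms = [120, 180, 250, 320, 400, 900, 1240, 3100]
p50 = percentile(latencies_ms, 50)  # → 320
p95 = percentile(latencies_ms, 95)  # → 3100 (small sample: p95 hits the max)
```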
### `llmcost dashboard`

```bash
llmcost dashboard             # default port 8177
llmcost dashboard --port 9000
```

Dark-themed local web dashboard with cost cards, feature treemap, spend timeline, model breakdown, hotspots table, and optimization waterfall. Auto-refreshes every 30 seconds. Single HTML file — no npm, no build step.
## Features

### Cost tagging

Group costs by feature, customer, environment — whatever matters to you:
```python
from llm_cost_profiler import tag

with tag(feature="summarizer", customer="acme_corp"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this document..."}],
    )
```

Tags nest naturally. Inner tags merge with outer tags:
```python
with tag(feature="pipeline"):
    with tag(step="extract"):
        # tagged as feature=pipeline, step=extract
        client.chat.completions.create(...)
    with tag(step="transform"):
        # tagged as feature=pipeline, step=transform
        client.chat.completions.create(...)
```

### Response caching

Stop paying for duplicate calls:
```python
from llm_cost_profiler import cache

@cache(ttl=3600)  # cache for 1 hour
def classify_text(text):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify: {text}"}],
    )

classify_text("hello")  # API call → cached
classify_text("hello")  # instant, free
```

Works with both sync and async functions. Cache is stored in the same local SQLite database.
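Conceptually, a TTL cache like this keys entries on the function's arguments and evicts them after `ttl` seconds. A minimal in-memory sketch of the idea (the real decorator persists to SQLite and is not this code):

```python
import functools
import hashlib
import pickle
import time

def ttl_cache(ttl):
    """Illustrative in-memory TTL cache; the library's @cache persists
    entries to SQLite instead of a dict."""
    def decorator(fn):
        store = {}
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Key on the serialized arguments so identical calls collide
            key = hashlib.sha256(
                pickle.dumps((args, sorted(kwargs.items())))
            ).hexdigest()
            hit = store.get(key)
            if hit is not None and time.monotonic() - hit[0] < ttl:
                return hit[1]  # fresh entry: skip the expensive call
            result = fn(*args, **kwargs)
            store[key] = (time.monotonic(), result)
            return result
        return wrapper
    return decorator

calls = []

@ttl_cache(ttl=3600)
def classify(text):
    calls.append(text)  # stands in for a paid API call
    return f"label-for-{text}"

classify("hello")
classify("hello")  # served from cache; the "API" runs only once
```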
### Prompt storage

Enable prompt storage for deeper optimization analysis:

```python
client = wrap(OpenAI(), store_prompts=True)
```

Disabled by default for privacy. When enabled, the optimizer can detect near-duplicate prompts and analyze what causes retry failures.
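Near-duplicate detection can be as simple as fingerprinting a normalized form of each prompt. The sketch below is only meant to convey the idea (the library's actual detection may be fuzzier; `normalized_fingerprint` is a hypothetical name):

```python
import hashlib
import re

def normalized_fingerprint(prompt):
    """Collapse whitespace, lowercase, and mask digits so prompts that
    differ only in numbers or formatting hash identically. Illustrative
    only -- a real optimizer could use smarter similarity measures."""
    canonical = re.sub(r"\s+", " ", prompt.lower()).strip()
    canonical = re.sub(r"\d+", "#", canonical)
    return hashlib.sha256(canonical.encode()).hexdigest()

# These two prompts differ only in an order ID and spacing:
a = normalized_fingerprint("Summarize order 12345, please")
b = normalized_fingerprint("summarize   order 99, please")
# a == b, so the pair would be flagged as near-duplicates
```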
## How It Works

```
Your code                   LLM Cost Profiler                  OpenAI / Anthropic
─────────                   ─────────────────                  ──────────────────
client.chat.completions  →  ClientProxy → ResourceProxy chain
.create(...)             →  intercepts create()
                            ├─ captures call site (sys._getframe)
                            ├─ reads active tags (contextvars)
                            ├─ calls real SDK method ──────────→  API call happens
                            ├─ extracts tokens from response
                            ├─ looks up cost from pricing table
                            ├─ logs to SQLite (async-safe)
                            └─ returns original response ←──────  response comes back
```
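The interception step can be illustrated with a bare-bones attribute proxy — a sketch of the pattern, not the library's actual classes (`TrackingProxy`, `FakeClient`, and the hook are all hypothetical):

```python
class TrackingProxy:
    """Wraps any object: attribute access returns proxied sub-objects,
    and calling a method runs a hook around the real call. Illustrative
    only -- the real ClientProxy/ResourceProxy chain is more involved."""
    def __init__(self, target, on_call):
        self._target = target
        self._on_call = on_call

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr):
            def intercepted(*args, **kwargs):
                result = attr(*args, **kwargs)  # real SDK call happens here
                self._on_call(name, result)     # log after the fact
                return result
            return intercepted
        return TrackingProxy(attr, self._on_call)  # proxy nested resources

# A tiny fake SDK to show the wrapping is transparent:
class FakeCompletions:
    def create(self, **kwargs):
        return {"usage": {"total_tokens": 42}}

class FakeClient:
    def __init__(self):
        self.completions = FakeCompletions()

logged = []
client = TrackingProxy(
    FakeClient(),
    lambda name, res: logged.append(res["usage"]["total_tokens"]),
)
response = client.completions.create(model="gpt-4o")
# The caller gets the original response; token usage was logged on the side.
```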
- Proxy pattern — wraps the SDK client transparently. No monkey-patching, no subclassing. Your client object behaves identically.
- SQLite + WAL mode — all data stored locally at `~/.llmcost/data.db`. Thread-safe writes, concurrent reads. No external database needed.
- Built-in pricing — covers OpenAI and Anthropic models. Prefix matching handles versioned model names (e.g., `gpt-4o-2024-08-06` matches `gpt-4o`).
- Call site detection — walks the Python stack via `sys._getframe()` to find the exact file and line that triggered each API call. No decorators or annotations required.
- Zero dependencies — only uses the Python standard library. The OpenAI/Anthropic SDKs are detected at runtime, not required at install.
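Prefix matching against a pricing table can be sketched like this. The prices below are placeholders, not the library's real table, and `lookup_pricing` is a hypothetical name:

```python
# Hypothetical per-million-token prices; NOT the library's actual table.
PRICING = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def lookup_pricing(model):
    """Longest-prefix match: 'gpt-4o-2024-08-06' resolves to 'gpt-4o',
    while 'gpt-4o-mini-2024-07-18' still resolves to 'gpt-4o-mini'
    rather than the shorter 'gpt-4o' prefix."""
    best = None
    for prefix, prices in PRICING.items():
        if model.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, prices)
    return best[1] if best else None

price = lookup_pricing("gpt-4o-2024-08-06")  # resolves to the gpt-4o entry
```

Longest-prefix (rather than first-match) ordering matters precisely because some model names are prefixes of others.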
## API Reference

### `wrap()`

Wraps an OpenAI or Anthropic client. Returns a transparent proxy that tracks all API calls.

```python
from llm_cost_profiler import wrap

client = wrap(OpenAI())                      # basic tracking
client = wrap(OpenAI(), store_prompts=True)  # also store prompt content
```

### `tag()`

Context manager that attaches metadata to all API calls within its scope.
```python
from llm_cost_profiler import tag

with tag(feature="search", env="production"):
    # all calls here are tagged
    ...
```

### `cache()`

Decorator that caches function results in SQLite. Identical arguments return cached responses.
```python
from llm_cost_profiler import cache

@cache(ttl=3600)
def my_function(text):
    ...
```

### `get_current_tags()`

Returns the currently active tags as a dictionary. Useful for debugging.
```python
from llm_cost_profiler import get_current_tags

with tag(feature="search"):
    print(get_current_tags())  # {"feature": "search"}
```

## Uninstall

```bash
pip uninstall llm-spend-profiler
```

To also remove stored data:
```
# macOS / Linux
rm -rf ~/.llmcost

# Windows
rmdir /s /q %USERPROFILE%\.llmcost
```

## Requirements

- Python 3.9+
- No required dependencies
- Optional: `openai` and/or `anthropic` SDKs
## Contributing

Contributions are welcome. To set up the dev environment:
```bash
git clone https://github.com/buildwithabid/llm-cost-profiler.git
cd llm-cost-profiler
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -e ".[dev]"
pytest
```

All 50 tests should pass. If you're adding a new feature, please include tests.
## License

MIT. See LICENSE for details.
Keywords: LLM cost tracking, OpenAI cost monitoring, Anthropic API costs, GPT-4 cost optimizer, Claude API spending, LLM token usage tracker, AI API cost management, Python LLM profiler, reduce OpenAI bill, LLM spend analytics, GPT cost per feature, AI cost optimization tool, LLM API budget monitor, token cost calculator, ChatGPT cost tracker