DeepConf is a confidence-aware reasoning approach for large language models that uses model-internal confidence signals to dynamically filter out low-quality reasoning traces during generation, improving both efficiency and accuracy.
Based on the paper "Deep Think with Confidence" by Fu et al. (2025), DeepConf implements:
- Token-level confidence scoring using entropy and log-probability metrics
- Online mode with early termination to save computational resources
- Warmup phase for threshold calibration
- Consensus-based stopping when high agreement is reached
- Weighted majority voting for final answer selection
- ✅ Local models only - Works with OptiLLM's local inference engine
- ✅ Two variants: `low` (aggressive, top 10%) and `high` (conservative, top 90%)
- ✅ Configurable parameters for different use cases
- ✅ Early termination to reduce token usage by 50-70%
- ✅ Automatic quality control without external evaluation
Set up OptiLLM for local inference:
```bash
export OPTILLM_API_KEY=optillm
python optillm.py --model your-local-model
```
Then make a request with DeepConf decoding:
```python
import openai

client = openai.OpenAI(
    api_key="optillm",
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "user", "content": "Solve this math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?"}
    ],
    extra_body={
        "decoding": "deepconf",
        "variant": "low",            # "low" or "high"
        "warmup_samples": 16,        # Number of calibration traces
        "max_traces": 64,            # Maximum total traces
        "consensus_threshold": 0.95  # Stop when consensus reached
    }
)

print(response.choices[0].message.content)
```
| Parameter | Default | Description |
|---|---|---|
| `variant` | `"low"` | Filtering strategy: `"low"` (top 10%, aggressive) or `"high"` (top 90%, conservative) |
| `warmup_samples` | `16` | Number of initial traces for threshold calibration |
| `consensus_threshold` | `0.95` | Stop generation when this level of agreement is reached |
| `max_traces` | `128` | Maximum number of traces to generate |
| `window_size` | `2048` | Sliding window size for group confidence calculation |
| `top_k` | `5` | Number of top tokens for confidence calculation |
| `min_trace_length` | `100` | Minimum tokens before allowing early termination |
| `max_tokens_per_trace` | `4096` | Maximum tokens per individual trace |
| `confidence_metric` | `"average_confidence"` | Metric used for threshold calculation |
| `include_stats` | `false` | Include processing statistics in response |
Include statistics in the response for debugging:
```python
response = client.chat.completions.create(
    model="your-model",
    messages=[...],
    extra_body={
        "decoding": "deepconf",
        "variant": "high",
        "include_stats": True,
        "warmup_samples": 8,
        "max_traces": 32
    }
)
```
1. Warmup Phase: Generate initial traces to calibrate the confidence threshold
2. Online Generation: Generate traces with early termination based on confidence
3. Consensus Check: Stop when sufficient agreement is reached
4. Final Selection: Use weighted majority voting to select the best answer
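The four stages above can be sketched in plain Python. This is an illustrative outline, not OptiLLM's internal implementation: `generate_trace` is a stand-in for sampling one reasoning trace and is assumed to return an `(answer, trace_confidence)` pair.

```python
from collections import Counter

def deepconf_pipeline(generate_trace, warmup_samples=16, max_traces=64,
                      consensus_threshold=0.95, variant="low"):
    """Illustrative sketch of the DeepConf loop."""
    # 1. Warmup: calibrate the confidence threshold from initial traces.
    warmup = [generate_trace() for _ in range(warmup_samples)]
    confs = sorted(c for _, c in warmup)
    pct = 0.90 if variant == "low" else 0.10  # low keeps top 10%, high keeps top 90%
    threshold = confs[min(int(pct * len(confs)), len(confs) - 1)]

    kept = [(a, c) for a, c in warmup if c >= threshold]

    # 2. Online generation: keep sampling, discarding low-confidence traces.
    for _ in range(max_traces - warmup_samples):
        answer, conf = generate_trace()
        if conf < threshold:
            continue  # drop the low-quality trace
        kept.append((answer, conf))

        # 3. Consensus check: stop once one answer dominates the confidence mass.
        weights = Counter()
        for a, c in kept:
            weights[a] += c
        if weights.most_common(1)[0][1] / sum(weights.values()) >= consensus_threshold:
            break

    # 4. Final selection: confidence-weighted majority voting.
    weights = Counter()
    for a, c in kept:
        weights[a] += c
    return weights.most_common(1)[0][0]
```

In the real online mode, low-confidence traces are cut off mid-generation (saving their remaining tokens) rather than discarded after completion as in this simplified sketch.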
- Token Entropy: `H = -∑ P(j) log P(j)`
- Token Confidence: `C = -(1/k) ∑ log P(j)` over the top-k tokens
- Group Confidence: sliding-window averages over token confidences
- Trace Confidence: average confidence across all tokens in a trace
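As a minimal sketch (function names are illustrative, not OptiLLM's API), these metrics can be computed directly from per-token probabilities and top-k log-probabilities:

```python
import math

def token_entropy(probs):
    """H = -sum_j P(j) * log P(j) over the token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def token_confidence(top_k_logprobs):
    """C = -(1/k) * sum of log P(j) over the top-k tokens."""
    return -sum(top_k_logprobs) / len(top_k_logprobs)

def group_confidence(token_confidences, window_size=2048):
    """Sliding-window averages of per-token confidences."""
    n = len(token_confidences)
    return [
        sum(token_confidences[i:i + window_size]) / len(token_confidences[i:i + window_size])
        for i in range(max(n - window_size + 1, 1))
    ]

def trace_confidence(token_confidences):
    """Average confidence across all tokens in a trace."""
    return sum(token_confidences) / len(token_confidences)
```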
- DeepConf-low: uses the 90th-percentile threshold (keeps the top 10% of traces) - more aggressive filtering
- DeepConf-high: uses the 10th-percentile threshold (keeps the top 90% of traces) - more conservative filtering
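A hedged sketch of how the two variants map onto percentile cutoffs over the warmup traces (the function name is illustrative):

```python
def confidence_threshold(warmup_confidences, variant="low"):
    """DeepConf-low: 90th-percentile cutoff (keeps top 10%).
    DeepConf-high: 10th-percentile cutoff (keeps top 90%)."""
    sorted_conf = sorted(warmup_confidences)
    pct = 0.90 if variant == "low" else 0.10
    idx = min(int(pct * len(sorted_conf)), len(sorted_conf) - 1)
    return sorted_conf[idx]
```

Traces whose confidence falls below the returned threshold are filtered out (or terminated early) during online generation.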
DeepConf typically achieves:
- 50-70% reduction in token usage compared to standard sampling
- Maintained or improved accuracy through confidence-based filtering
- Automatic quality control without requiring external evaluation models
- Local model inference (PyTorch)
- OptiLLM with `OPTILLM_API_KEY=optillm`
- Compatible with transformer models that provide logits access
- Paper: "Deep Think with Confidence" by Fu et al. (2025)
- arXiv: https://arxiv.org/abs/2508.15260
- Authors: Yichao Fu (UCSD), Xuewei Wang (Meta AI), Yuandong Tian (Meta AI), Jiawei Zhao (Meta AI)