
Proposal: Capability-Aware Tool Presentation for Heterogeneous Model Sizes #2470

@spranab

Description


Summary

Tools defined via MCP are presented identically to all models regardless of capability. A 1.5B model on a Raspberry Pi receives the same tool schemas as Claude Opus. This wastes tokens, degrades accuracy, and makes small models unusable with large tool registries.

I propose adding optional capability hints to the MCP tool schema, enabling servers to declare tier-aware descriptions and parameters.

The Problem

Empirical data across 1,000+ native tool-calling inference calls (4 models, 80 tools, 50 prompts):

| Model | Parameters | Accuracy with 80 tools |
|---|---|---|
| qwen2.5:1.5b | 1.5B | 50% |
| qwen3.5:9b | 9B | 80% |
| gpt-oss:20b | 20B | 80% |
| qwen3.5:35b | 35B | 88% |

Small models can't navigate large tool sets. But a decomposition reveals the bottleneck is tool discovery, not tool selection:

P(correct tool) = P(correct family) × P(correct tool | correct family)

| Model | P(correct family) | P(correct tool \| family) |
|---|---|---|
| 1.5B | 56% | 89% |
| 35B | 90% | 98% |

Even a 1.5B model picks the right tool 89% of the time when shown the right tool neighborhood.
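As a sanity check, multiplying the two measured factors reproduces the end-to-end accuracy from the first table (a quick arithmetic sketch using the numbers above):

```python
# P(correct tool) = P(correct family) * P(correct tool | family)
# Factors taken from the measurements above.
factors = {
    "1.5B": (0.56, 0.89),  # -> ~0.50, matching the 50% baseline
    "35B": (0.90, 0.98),   # -> ~0.88, matching the 88% baseline
}

for model, (p_family, p_tool_given_family) in factors.items():
    p_tool = p_family * p_tool_given_family
    print(f"{model}: {p_tool:.2f}")
```

This is why narrowing the tool neighborhood (improving P(correct family)) is the highest-leverage intervention for small models.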

Proposed Schema Addition

Add an optional `capabilityHints` field to the Tool definition:

```json
{
  "name": "file_read",
  "description": "Read file contents with line numbers, offset, and encoding control",
  "inputSchema": { ... },
  "capabilityHints": {
    "tiers": {
      "small": {
        "description": "Read file",
        "inputSchema": {
          "type": "object",
          "properties": {
            "path": { "type": "string" }
          },
          "required": ["path"]
        }
      },
      "medium": {
        "description": "Read a file from disk",
        "inputSchema": {
          "type": "object",
          "properties": {
            "path": { "type": "string" },
            "encoding": { "type": "string" }
          },
          "required": ["path"]
        }
      }
    },
    "category": "filesystem",
    "priority": 0.8
  }
}
```

Backwards compatible: clients that don't understand `capabilityHints` ignore it and use the existing `description`/`inputSchema` (which becomes the "large" tier default). No breaking changes.
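The fallback rule is mechanical enough to sketch. Below is a minimal client-side resolution routine; `resolve_tool_presentation` and the tier names are illustrative assumptions, not part of any existing MCP SDK:

```python
def resolve_tool_presentation(tool: dict, tier: str) -> dict:
    """Pick the description/inputSchema for the given tier, falling back
    to the top-level fields (the implicit "large" tier default)."""
    hints = tool.get("capabilityHints", {})
    tier_def = hints.get("tiers", {}).get(tier, {})
    return {
        "name": tool["name"],
        "description": tier_def.get("description", tool["description"]),
        "inputSchema": tier_def.get("inputSchema", tool["inputSchema"]),
    }
```

With the `file_read` example above, resolving the "small" tier yields the one-parameter schema, while an undeclared tier (or a tool with no hints at all) falls back to the full top-level definition.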

What This Enables

  1. MCP clients (Claude Code, Cursor, Windsurf) can detect the connected model's capability and present the appropriate tier's description/schema
  2. MCP servers declare once, serve all model sizes — no need for separate server configs
  3. Tool builders think about small model usability upfront (shorter descriptions, fewer params)
  4. Token savings of 83-92% for filtered/adapted strategies, translating directly to cost and latency reduction
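Client-side tier detection could be as simple as mapping the connected model's declared parameter count to a tier. The cutoffs below are assumptions for illustration only (see Design Considerations; they are open to discussion):

```python
def detect_tier(parameter_count_billions: float) -> str:
    """Map a model's parameter count to a capability tier.
    Hypothetical thresholds, chosen only to separate the
    benchmarked models (1.5B / 9B / 20B / 35B)."""
    if parameter_count_billions < 4:
        return "small"
    if parameter_count_billions < 15:
        return "medium"
    return "large"
```

Clients with richer model metadata (context window, measured tool-calling accuracy) could of course use a more nuanced mapping.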

Benchmark Results

With tier-adapted presentation:

| Strategy | 1.5B | 20B | Token savings |
|---|---|---|---|
| Baseline (all tools) | 50% | 80% | — |
| Hybrid (8 detailed + rest name-only) | 60% | 76% | 47% |
| Semantic reorder + hint | 54% | 88% | 0% (pure accuracy gain) |
| Family oracle (upper bound) | 70% | 84% | 83% |


Design Considerations

  1. Tier names: small / medium / large (or parameter-range based) — open to discussion
  2. Tier detection: Client-side based on model metadata, not server-side
  3. Fallback: If a tier isn't declared, client uses the top-level description/inputSchema
  4. Category field: Enables family-based routing, which improves accuracy by 4–20 percentage points
  5. Priority field: Hints for ordering when presenting multiple tools
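Combining the `category` and `priority` fields, a client could group tools by family and order each group before presentation. A sketch, under the assumption that a missing priority defaults to 0.5:

```python
from collections import defaultdict


def group_and_order(tools: list[dict]) -> dict[str, list[dict]]:
    """Group tools by capabilityHints.category and sort each group
    by descending priority (hypothetical default: 0.5)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for tool in tools:
        hints = tool.get("capabilityHints", {})
        groups[hints.get("category", "uncategorized")].append(tool)
    for members in groups.values():
        members.sort(
            key=lambda t: t.get("capabilityHints", {}).get("priority", 0.5),
            reverse=True,
        )
    return dict(groups)
```

This is the family-based routing step that the decomposition above identifies as the bottleneck: present the small model with one family's neighborhood, ordered by priority, rather than the full registry.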

Relationship to Existing Work

  • AgentFlux (arXiv:2510.00229): Decouples tool selection from argument generation but doesn't address presentation adaptation
  • TinyLLM (arXiv:2511.22138): Evaluates small models on tool use but doesn't propose solutions
  • This proposal provides the missing protocol-level solution

Happy to discuss implementation details or provide additional benchmark data.
