
Proposal: Capability-Aware Tool Presentation for Heterogeneous Model Sizes #2470

@spranab

Description


Summary

Tools defined via MCP are presented identically to all models regardless of capability. A 1.5B model on a Raspberry Pi receives the same tool schemas as Claude Opus. This wastes tokens, degrades accuracy, and makes small models unusable with large tool registries.

I propose adding optional capability hints to the MCP tool schema, enabling servers to declare tier-aware descriptions and parameters.

The Problem

Empirical data across 1,000+ native tool-calling inference calls (4 models, 80 tools, 50 prompts):

| Model | Parameters | Accuracy with 80 tools |
|---|---|---|
| qwen2.5:1.5b | 1.5B | 50% |
| qwen3.5:9b | 9B | 80% |
| gpt-oss:20b | 20B | 80% |
| qwen3.5:35b | 35B | 88% |

Small models can't navigate large tool sets. But a decomposition reveals the bottleneck is tool discovery, not tool selection:

P(correct tool) = P(correct family) × P(correct tool | correct family)

| Model | P(correct family) | P(correct tool \| family) |
|---|---|---|
| 1.5B | 56% | 89% |
| 35B | 90% | 98% |

Even a 1.5B model picks the right tool 89% of the time when shown the right tool neighborhood.
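As a sanity check, multiplying the two measured factors reproduces the end-to-end accuracy from the first table (a quick arithmetic sketch using the numbers above):

```python
# P(correct tool) = P(correct family) * P(correct tool | family)
# Factors taken from the measurements above.
factors = {
    "1.5B": (0.56, 0.89),  # -> ~0.50, matching the 50% baseline
    "35B": (0.90, 0.98),   # -> ~0.88, matching the 88% baseline
}

for model, (p_family, p_tool_given_family) in factors.items():
    p_tool = p_family * p_tool_given_family
    print(f"{model}: {p_tool:.2f}")
```

This is why narrowing the tool neighborhood (improving P(correct family)) is the highest-leverage intervention for small models.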

Proposed Schema Addition

Add an optional `capabilityHints` field to the Tool definition:

```json
{
  "name": "file_read",
  "description": "Read file contents with line numbers, offset, and encoding control",
  "inputSchema": { ... },
  "capabilityHints": {
    "tiers": {
      "small": {
        "description": "Read file",
        "inputSchema": {
          "type": "object",
          "properties": {
            "path": { "type": "string" }
          },
          "required": ["path"]
        }
      },
      "medium": {
        "description": "Read a file from disk",
        "inputSchema": {
          "type": "object",
          "properties": {
            "path": { "type": "string" },
            "encoding": { "type": "string" }
          },
          "required": ["path"]
        }
      }
    },
    "category": "filesystem",
    "priority": 0.8
  }
}
```

Backwards compatible: clients that don't understand `capabilityHints` ignore it and use the existing `description`/`inputSchema` (which becomes the "large" tier default). No breaking changes.
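The fallback rule is mechanical enough to sketch. Below is a minimal client-side resolution routine; `resolve_tool_presentation` and the tier names are illustrative assumptions, not part of any existing MCP SDK:

```python
def resolve_tool_presentation(tool: dict, tier: str) -> dict:
    """Pick the description/inputSchema for the given tier, falling back
    to the top-level fields (the implicit "large" tier default)."""
    hints = tool.get("capabilityHints", {})
    tier_def = hints.get("tiers", {}).get(tier, {})
    return {
        "name": tool["name"],
        "description": tier_def.get("description", tool["description"]),
        "inputSchema": tier_def.get("inputSchema", tool["inputSchema"]),
    }
```

With the `file_read` example above, resolving the "small" tier yields the one-parameter schema, while an undeclared tier (or a tool with no hints at all) falls back to the full top-level definition.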

What This Enables

  1. MCP clients (Claude Code, Cursor, Windsurf) can detect the connected model's capability and present the appropriate tier's description/schema
  2. MCP servers declare once, serve all model sizes — no need for separate server configs
  3. Tool builders think about small model usability upfront (shorter descriptions, fewer params)
  4. Token savings of 83-92% for filtered/adapted strategies, translating directly to cost and latency reduction
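Client-side tier detection could be as simple as mapping the connected model's declared parameter count to a tier. The cutoffs below are assumptions for illustration only (see Design Considerations; they are open to discussion):

```python
def detect_tier(parameter_count_billions: float) -> str:
    """Map a model's parameter count to a capability tier.
    Hypothetical thresholds, chosen only to separate the
    benchmarked models (1.5B / 9B / 20B / 35B)."""
    if parameter_count_billions < 4:
        return "small"
    if parameter_count_billions < 15:
        return "medium"
    return "large"
```

Clients with richer model metadata (context window, measured tool-calling accuracy) could of course use a more nuanced mapping.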

Benchmark Results

With tier-adapted presentation:

| Strategy | 1.5B | 20B | Token savings |
|---|---|---|---|
| Baseline (all tools) | 50% | 80% | — |
| Hybrid (8 detailed + rest name-only) | 60% | 76% | 47% |
| Semantic reorder + hint | 54% | 88% | 0% (pure accuracy gain) |
| Family oracle (upper bound) | 70% | 84% | 83% |


Design Considerations

  1. Tier names: small / medium / large (or parameter-range based) — open to discussion
  2. Tier detection: Client-side based on model metadata, not server-side
  3. Fallback: If a tier isn't declared, client uses the top-level description/inputSchema
  4. Category field: Enables family-based routing, which improves accuracy by 4–20 percentage points
  5. Priority field: Hints for ordering when presenting multiple tools
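Combining the `category` and `priority` fields, a client could group tools by family and order each group before presentation. A sketch, under the assumption that a missing priority defaults to 0.5:

```python
from collections import defaultdict


def group_and_order(tools: list[dict]) -> dict[str, list[dict]]:
    """Group tools by capabilityHints.category and sort each group
    by descending priority (hypothetical default: 0.5)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for tool in tools:
        hints = tool.get("capabilityHints", {})
        groups[hints.get("category", "uncategorized")].append(tool)
    for members in groups.values():
        members.sort(
            key=lambda t: t.get("capabilityHints", {}).get("priority", 0.5),
            reverse=True,
        )
    return dict(groups)
```

This is the family-based routing step that the decomposition above identifies as the bottleneck: present the small model with one family's neighborhood, ordered by priority, rather than the full registry.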

Relationship to Existing Work

  • AgentFlux (arXiv:2510.00229): Decouples tool selection from argument generation but doesn't address presentation adaptation
  • TinyLLM (arXiv:2511.22138): Evaluates small models on tool use but doesn't propose solutions
  • This proposal provides the missing protocol-level solution

Happy to discuss implementation details or provide additional benchmark data.
