## Summary
Tools defined via MCP are presented identically to all models regardless of capability. A 1.5B model on a Raspberry Pi receives the same tool schemas as Claude Opus. This wastes tokens, degrades accuracy, and makes small models unusable with large tool registries.
I propose adding optional capability hints to the MCP tool schema, enabling servers to declare tier-aware descriptions and parameters.
## The Problem

Empirical data from 1,000+ native tool-calling inference calls (4 models, 80 tools, 50 prompts):
| Model | Parameters | Accuracy with 80 tools |
|---|---|---|
| qwen2.5:1.5b | 1.5B | 50% |
| qwen3.5:9b | 9B | 80% |
| gpt-oss:20b | 20B | 80% |
| qwen3.5:35b | 35B | 88% |
Small models can't navigate large tool sets. But decomposing accuracy shows the bottleneck is discovering the right tool family, not selecting a tool within it:

`P(correct tool) = P(correct family) × P(correct tool | correct family)`
| Model | P(correct family) | P(correct tool \| family) |
|---|---|---|
| 1.5B | 56% | 89% |
| 35B | 90% | 98% |
Even a 1.5B model picks the right tool 89% of the time when shown the right tool neighborhood.
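As a sanity check, the decomposition reproduces the end-to-end numbers from the first table to within rounding:

```typescript
// P(correct tool) = P(correct family) * P(correct tool | family).
// Rates below are the measured values from the tables above.
const models = [
  { name: "1.5B", pFamily: 0.56, pToolGivenFamily: 0.89, endToEnd: 0.50 },
  { name: "35B", pFamily: 0.90, pToolGivenFamily: 0.98, endToEnd: 0.88 },
];

for (const m of models) {
  const predicted = m.pFamily * m.pToolGivenFamily;
  // 0.56 * 0.89 ≈ 0.50 and 0.90 * 0.98 ≈ 0.88, matching observed accuracy.
  console.log(`${m.name}: predicted ${predicted.toFixed(2)}, observed ${m.endToEnd.toFixed(2)}`);
}
```

This is why the proposal targets presentation (narrowing the visible tool neighborhood) rather than model-side fixes.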
## Proposed Schema Addition

Add an optional `capabilityHints` field to the `Tool` definition:
```json
{
  "name": "file_read",
  "description": "Read file contents with line numbers, offset, and encoding control",
  "inputSchema": { ... },
  "capabilityHints": {
    "tiers": {
      "small": {
        "description": "Read file",
        "inputSchema": {
          "type": "object",
          "properties": {
            "path": { "type": "string" }
          },
          "required": ["path"]
        }
      },
      "medium": {
        "description": "Read a file from disk",
        "inputSchema": {
          "type": "object",
          "properties": {
            "path": { "type": "string" },
            "encoding": { "type": "string" }
          },
          "required": ["path"]
        }
      }
    },
    "category": "filesystem",
    "priority": 0.8
  }
}
```
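Client-side handling is straightforward. The sketch below shows how a client might resolve the presentation for a detected tier, including the fallback rule; the `Tool` type and `presentTool` helper are illustrative names, not part of any MCP SDK:

```typescript
// Illustrative shapes; field names follow the proposed schema, but this is
// a sketch, not SDK code.
type TierHint = { description?: string; inputSchema?: object };
type Tool = {
  name: string;
  description: string;
  inputSchema: object;
  capabilityHints?: {
    tiers?: Record<string, TierHint>;
    category?: string;
    priority?: number;
  };
};

// Resolve the presentation for a tier, falling back to the top-level
// description/inputSchema (the implicit "large" tier) when the tier is
// absent or when the client ignores capabilityHints entirely.
function presentTool(tool: Tool, tier: string): { description: string; inputSchema: object } {
  const hint = tool.capabilityHints?.tiers?.[tier];
  return {
    description: hint?.description ?? tool.description,
    inputSchema: hint?.inputSchema ?? tool.inputSchema,
  };
}
```

With the `file_read` example above, `presentTool(tool, "small")` yields the one-line description and single-parameter schema, while an undeclared tier such as `"large"` falls back to the full top-level definition.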
**Backwards compatible:** clients that don't understand `capabilityHints` ignore it and use the existing `description`/`inputSchema` (which becomes the "large" tier default). No breaking changes.
## What This Enables
- MCP clients (Claude Code, Cursor, Windsurf) can detect the connected model's capability and present the appropriate tier's description/schema
- MCP servers declare once, serve all model sizes — no need for separate server configs
- Tool builders think about small model usability upfront (shorter descriptions, fewer params)
- Token savings of 83-92% for filtered/adapted strategies, translating directly to cost and latency reduction
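To make the savings mechanism concrete, here is a rough illustration comparing the serialized size of a full `file_read` definition against its "small" tier. The full schema's parameters (`offset`, `limit`, `encoding`) are reconstructed from the tool's description purely for illustration, and character count is only a crude proxy for tokens; the 83-92% figure above comes from the benchmark's actual token counts, not this sketch:

```typescript
// Hypothetical full definition, reconstructed from the file_read
// description for illustration only.
const fullDefinition = {
  description: "Read file contents with line numbers, offset, and encoding control",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string" },
      offset: { type: "number" },
      limit: { type: "number" },
      encoding: { type: "string" },
    },
    required: ["path"],
  },
};

// The "small" tier from the proposed schema example.
const smallTier = {
  description: "Read file",
  inputSchema: {
    type: "object",
    properties: { path: { type: "string" } },
    required: ["path"],
  },
};

const fullLen = JSON.stringify(fullDefinition).length;
const smallLen = JSON.stringify(smallTier).length;
console.log(`small tier is ${Math.round((1 - smallLen / fullLen) * 100)}% smaller`);
```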
## Benchmark Results
With tier-adapted presentation:
| Strategy | Accuracy (1.5B) | Accuracy (20B) | Token savings |
|---|---|---|---|
| Baseline (all tools) | 50% | 80% | — |
| Hybrid (8 detailed + rest name-only) | 60% | 76% | 47% |
| Semantic reorder + hint | 54% | 88% | 0% (pure accuracy gain) |
| Family oracle (upper bound) | 70% | 84% | 83% |
## Design Considerations
- Tier names: `small` / `medium` / `large` (or parameter-range based); open to discussion
- Tier detection: client-side, based on model metadata, not server-side
- Fallback: if a tier isn't declared, the client uses the top-level `description`/`inputSchema`
- `category` field: enables family-based routing, which improves accuracy by +4-20pp
- `priority` field: hints for ordering when presenting multiple tools
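The parameter-range idea for tier detection could look like the following sketch; the thresholds here are illustrative assumptions, not part of the proposal:

```typescript
// Client-side tier detection from model metadata. Thresholds are
// hypothetical and open to discussion, per the design considerations above.
function detectTier(parameterCount: number): "small" | "medium" | "large" {
  if (parameterCount < 4e9) return "small";   // e.g. qwen2.5:1.5b
  if (parameterCount < 15e9) return "medium"; // e.g. a 9B model
  return "large";                             // 20B and up
}
```

Keeping detection client-side means servers never need to know which model is connected; they declare the tiers once and the client maps its model onto them.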
## Relationship to Existing Work
- AgentFlux (arXiv:2510.00229): Decouples tool selection from argument generation but doesn't address presentation adaptation
- TinyLLM (arXiv:2511.22138): Evaluates small models on tool use but doesn't propose solutions
- This proposal provides the missing protocol-level solution
Happy to discuss implementation details or provide additional benchmark data.