
Field Projection for Tool Output Schema to Optimize Token Usage and Privacy #1704

@pratikjadhav2726

Is your feature request related to a problem? Please describe.

Currently, tools in MCP return full output objects conforming to their outputSchema, regardless of how many fields are actually needed by the client or LLM. This results in several critical problems:

  1. Excessive Token Consumption: Large structuredContent objects consume unnecessary tokens even when only a few fields are needed (often 85-95% waste)
  2. Reduced Model Performance: Unnecessary data in context reduces model clarity and focus
  3. No Client Control: Clients cannot express their actual data requirements to servers

Example: A get_user_profile tool returns ~800 tokens of comprehensive data (name, email, phone, address, preferences, activity history, metadata), but for a simple greeting task, only the name (~20 tokens) is needed - a 97.5% token waste.

Describe the solution you'd like

Add protocol-level support for field projection allowing clients/LLMs to request specific subsets of fields from tool outputs:

1. Capability Declaration

Servers declare projection support during initialization:

{
  "capabilities": {
    "tools": {
      "projection": {
        "supported": true,
        "modes": ["include", "exclude"],
        "maxDepth": 5
      }
    }
  }
}
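
A client would need to read this declaration before sending any projection. The following is a minimal sketch of that check; the `ProjectionCapability` and `ServerCapabilities` types and the `canProject` helper are hypothetical names that mirror the JSON above, not part of the current MCP schema or SDK.

```typescript
// Hypothetical shape of the proposed capability (mirrors the JSON above).
type ProjectionMode = "include" | "exclude" | "view";

interface ProjectionCapability {
  supported: boolean;
  modes?: ProjectionMode[];
  maxDepth?: number;
}

interface ServerCapabilities {
  tools?: { projection?: ProjectionCapability };
}

// Returns true only when the server advertises projection and the given mode.
function canProject(caps: ServerCapabilities, mode: ProjectionMode): boolean {
  const projection = caps.tools?.projection;
  if (!projection || !projection.supported) return false;
  // Assumption: a missing `modes` list means only "include" is available.
  const modes: ProjectionMode[] = projection.modes ?? ["include"];
  return modes.includes(mode);
}

const caps: ServerCapabilities = {
  tools: { projection: { supported: true, modes: ["include", "exclude"], maxDepth: 5 } },
};

console.log(canProject(caps, "include")); // true
console.log(canProject(caps, "view"));    // false: not in the advertised modes
```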

2. Tool Annotations

Tools indicate projection support and recommended views:

{
  "annotations": {
    "projectionHint": {
      "supported": true,
      "recommendedViews": {
        "minimal": ["id", "name"],
        "contact": ["name", "email", "phone"]
      }
    }
  }
}
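
One way a client might turn a named view into a concrete field list is sketched here. `ProjectionHint` and `resolveView` are hypothetical names introduced for illustration; only the `supported` and `recommendedViews` keys come from the annotation above.

```typescript
// Hypothetical shape of the proposed `projectionHint` annotation.
interface ProjectionHint {
  supported: boolean;
  recommendedViews?: Record<string, string[]>;
}

// Returns the field list for a named view, or undefined when the tool
// does not support projection or does not define that view.
function resolveView(hint: ProjectionHint | undefined, view: string): string[] | undefined {
  if (!hint || !hint.supported || !hint.recommendedViews) return undefined;
  const views = hint.recommendedViews;
  // Own-property check so names like "toString" never resolve to inherited members.
  if (!Object.prototype.hasOwnProperty.call(views, view)) return undefined;
  const fields = views[view];
  return fields ? [...fields] : undefined;
}

const hint: ProjectionHint = {
  supported: true,
  recommendedViews: { minimal: ["id", "name"], contact: ["name", "email", "phone"] },
};

console.log(resolveView(hint, "contact")); // [ 'name', 'email', 'phone' ]
```

Resolving views on the client keeps the request itself in plain include mode, which may be simpler for servers that only implement include and exclude.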

3. Projection in Tool Calls

Clients specify needed fields in _meta:

{
  "method": "tools/call",
  "params": {
    "name": "get_user_profile",
    "arguments": { "userId": "user123" },
    "_meta": {
      "projection": {
        "mode": "include",
        "fields": ["name", "email"]
      }
    }
  }
}
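
Because `_meta` arrives as untrusted input, a server would probably validate the projection before acting on it. The sketch below shows one possible validation; `ProjectionRequest` and `parseProjection` are hypothetical names, and it handles only the include and exclude modes used in the request above.

```typescript
// Hypothetical parsed form of `_meta.projection`.
interface ProjectionRequest {
  mode: "include" | "exclude";
  fields: string[];
}

// Returns a validated projection, or undefined when `_meta` carries no usable one.
function parseProjection(meta: unknown): ProjectionRequest | undefined {
  if (typeof meta !== "object" || meta === null) return undefined;
  const raw = (meta as Record<string, unknown>)["projection"];
  if (typeof raw !== "object" || raw === null) return undefined;
  const candidate = raw as Record<string, unknown>;

  const rawMode = candidate["mode"];
  let mode: "include" | "exclude";
  if (rawMode === "include") mode = "include";
  else if (rawMode === "exclude") mode = "exclude";
  else return undefined;

  const rawFields = candidate["fields"];
  if (!Array.isArray(rawFields) || rawFields.length === 0) return undefined;
  const fields: string[] = [];
  for (const field of rawFields) {
    // Reject the whole projection if any entry is not a non-empty string.
    if (typeof field !== "string" || field.length === 0) return undefined;
    fields.push(field);
  }
  return { mode, fields };
}

console.log(parseProjection({ projection: { mode: "include", fields: ["name", "email"] } }));
```

Returning undefined for a malformed projection lets the server fall back to the full output rather than fail the call; whether an error would be the better response is a design choice this sketch leaves open.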

4. Projected Response

Server returns only requested fields with metadata:

{
  "result": {
    "structuredContent": { "name": "John Doe", "email": "john@example.com" },
    "content": [{ "type": "text", "text": "Name: John Doe\nEmail: john@example.com" }],
    "_meta": {
      "projection": {
        "applied": true,
        "mode": "include",
        "fields": ["name", "email"],
        "projectedSchema": { /* derived subset schema for validation */ }
      }
    }
  }
}
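
A server-side sketch of how such a response could be assembled for top-level fields follows. The helper names `projectTopLevel` and `buildProjectedResult` and the sample phone number are illustrative assumptions; the sketch omits `projectedSchema` and derives the text content from the projected object so the two stay consistent.

```typescript
type StructuredContent = Record<string, unknown>;
type Mode = "include" | "exclude";

// Keeps listed keys (include) or drops them (exclude); top-level keys only.
function projectTopLevel(data: StructuredContent, mode: Mode, fields: string[]): StructuredContent {
  const listed = new Set(fields);
  const out: StructuredContent = {};
  for (const [key, value] of Object.entries(data)) {
    const keep = mode === "include" ? listed.has(key) : !listed.has(key);
    if (keep) out[key] = value;
  }
  return out;
}

// Builds a tool result in the shape proposed above.
function buildProjectedResult(data: StructuredContent, mode: Mode, fields: string[]) {
  const structuredContent = projectTopLevel(data, mode, fields);
  return {
    structuredContent,
    // Text content is serialized from the projected data, not the full object.
    content: [{ type: "text" as const, text: JSON.stringify(structuredContent) }],
    _meta: { projection: { applied: true, mode, fields } },
  };
}

// Sample data for illustration only.
const fullProfile = { name: "John Doe", email: "john@example.com", phone: "555-0100" };
const projected = buildProjectedResult(fullProfile, "include", ["name", "email"]);
console.log(JSON.stringify(projected.structuredContent));
// {"name":"John Doe","email":"john@example.com"}
```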

Key Features:

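  • Three modes: include (whitelist), exclude (blacklist), and view (named presets such as the recommendedViews in the tool annotations; the examples here show only include and exclude)
  • Field path syntax: dot-separated paths for nested fields (address.city), plus array and wildcard selectors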
  • Partial schema validation: Servers provide derived schema for projected outputs
  • Graceful degradation: Falls back to full output if projection not supported
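
The field path feature could be implemented in several ways; the sketch below shows one possible include-mode projection. The proposal does not fix a wildcard syntax, so treating `*` as "every array element or object key" is an assumption, as are the names `buildTree`, `project`, and `includePaths` and the sample data.

```typescript
type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

interface PathNode {
  whole: boolean;                  // true when a path ends here: keep the entire value
  children: Map<string, PathNode>; // next path segments, keyed by field name or "*"
}

// Turns paths such as "address.city" into a tree, rejecting paths deeper than maxDepth.
function buildTree(paths: string[], maxDepth: number): PathNode {
  const root: PathNode = { whole: false, children: new Map() };
  for (const path of paths) {
    const segments = path.split(".");
    if (segments.length > maxDepth) throw new Error(`Path "${path}" exceeds maxDepth ${maxDepth}`);
    let node = root;
    for (const segment of segments) {
      let next = node.children.get(segment);
      if (!next) {
        next = { whole: false, children: new Map() };
        node.children.set(segment, next);
      }
      node = next;
    }
    node.whole = true;
  }
  return root;
}

// Returns the part of `value` selected by `node`, or undefined when nothing matches.
function project(value: JsonValue, node: PathNode): JsonValue | undefined {
  if (node.whole) return value;
  if (Array.isArray(value)) {
    const star = node.children.get("*"); // arrays are only traversed through "*"
    if (!star) return undefined;
    const items: JsonValue[] = [];
    for (const element of value) {
      const kept = project(element, star);
      if (kept !== undefined) items.push(kept);
    }
    return items;
  }
  if (typeof value === "object" && value !== null) {
    const out: { [key: string]: JsonValue } = {};
    for (const [key, child] of Object.entries(value)) {
      const next = node.children.get(key) ?? node.children.get("*");
      if (!next) continue;
      const kept = project(child, next);
      if (kept !== undefined) out[key] = kept;
    }
    return Object.keys(out).length > 0 ? out : undefined;
  }
  return undefined; // the path continues below a primitive, so nothing matches
}

function includePaths(data: { [key: string]: JsonValue }, paths: string[], maxDepth = 5): JsonValue {
  return project(data, buildTree(paths, maxDepth)) ?? {};
}

// Sample data for illustration only.
const sample: { [key: string]: JsonValue } = {
  name: "John Doe",
  email: "john@example.com",
  address: { city: "Springfield", zip: "00000" },
  orders: [{ id: 1, total: 10 }, { id: 2, total: 20 }],
};

console.log(JSON.stringify(includePaths(sample, ["name", "address.city", "orders.*.id"])));
// {"name":"John Doe","address":{"city":"Springfield"},"orders":[{"id":1},{"id":2}]}
```

The same tree could drive derivation of a projected schema by walking `outputSchema.properties` instead of the data, though that part is not shown here.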

Describe alternatives you've considered

  1. Client-side filtering (post-processing)

    • ❌ No token savings - data still transferred
    • ❌ No privacy benefits - all data exposed to LLM
    • ❌ Defeats the entire purpose
  2. Separate summary tools (e.g., get_user_profile_minimal, get_user_profile_contact)

    • ❌ Tool proliferation - maintenance burden
    • ❌ Doesn't scale with schema complexity
    • ❌ Less flexible than dynamic projection
    • ✅ Can complement projection as "recommended views"
  3. Schema views (tools declare multiple named output schemas)

    • ❌ Less flexible - requires predefined views
    • ✅ Partially adopted as recommendedViews in annotations

Additional context

Performance & Cost Impact

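  • Token savings: an estimated 85-95% reduction in typical scenarios
  • Cost example: at 10,000 tool calls/day, saving 800 tokens/call removes about 2.92 billion tokens/year from model input. With Claude 3.5 Sonnet at $3 per million input tokens, that is roughly $8,760/year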
  • Use case savings:
    • User greeting (name only): 97.5%
    • Email notification (name + email): 95%
    • Status check (single field): 97.5%
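
The arithmetic behind estimates like these can be reproduced with a small calculator. This is a sketch under stated assumptions: the function names are hypothetical, the price per million tokens is a parameter you should replace with your provider's current input-token rate, and the $3 value is only an assumed example rate.

```typescript
// Fraction of tokens saved when a projected result replaces the full one.
function savedFraction(fullTokens: number, projectedTokens: number): number {
  return (fullTokens - projectedTokens) / fullTokens;
}

// Yearly saving in USD for a given call volume and per-call token saving.
function annualSavingsUsd(callsPerDay: number, tokensSavedPerCall: number, usdPerMillionTokens: number): number {
  const tokensPerYear = callsPerDay * tokensSavedPerCall * 365;
  return (tokensPerYear / 1000000) * usdPerMillionTokens;
}

console.log(savedFraction(800, 20));           // 0.975
console.log(annualSavingsUsd(10000, 800, 3));  // 8760 (assumed rate: $3 per million tokens)
```

Tool results enter the model as input, so the input-token rate is usually the relevant price.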

Security & Privacy Benefits

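  • Data minimization: supports the GDPR Article 5(1)(c) principle by returning only the fields a task needs
  • Purpose limitation: only the data needed for the specific task reaches the client and the model
  • Audit trail: response metadata records which fields were requested and which were returned
  • Access control: projection is applied together with field-level permissions, so a request cannot widen what a caller is allowed to see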
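
How projection and field-level permissions might combine is sketched below. The `authorizeProjection` helper and the permission set are hypothetical; the proposal does not define how permissions are stored or how denied fields should be reported.

```typescript
type Mode = "include" | "exclude";

// Splits the requested fields into those the caller may see and those it may not.
// `schemaFields` lists every top-level field of the tool's output schema.
function authorizeProjection(
  schemaFields: string[],
  permitted: Set<string>,
  mode: Mode,
  fields: string[],
): { granted: string[]; denied: string[] } {
  // In exclude mode the request is "everything except the listed fields".
  const requested = mode === "include" ? fields : schemaFields.filter((f) => !fields.includes(f));
  return {
    granted: requested.filter((f) => permitted.has(f)),
    denied: requested.filter((f) => !permitted.has(f)),
  };
}

// Illustrative schema and permissions only.
const schemaFields = ["name", "email", "phone", "activityHistory"];
const permitted = new Set(["name", "email", "phone"]);

console.log(authorizeProjection(schemaFields, permitted, "include", ["name", "activityHistory"]));
// { granted: [ 'name' ], denied: [ 'activityHistory' ] }
```

Returning the denied list separately would let a server log it for the audit trail without exposing the underlying values.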

Backwards Compatibility

Fully backwards compatible:

  • Optional capability via negotiation
  • Graceful degradation if not supported
  • No breaking changes to existing tools
  • Works with current content/structuredContent semantics
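
On the client side, graceful degradation could look like the following sketch. `buildCallParams` and `ensureProjected` are hypothetical helpers, and filtering locally when the server ignores the projection is an optional choice rather than part of the proposal; it saves no transfer but still keeps unneeded fields out of the model context.

```typescript
interface ToolResult {
  structuredContent?: Record<string, unknown>;
  _meta?: { projection?: { applied?: boolean } };
}

interface CallParams {
  name: string;
  arguments: Record<string, unknown>;
  _meta?: { projection: { mode: "include"; fields: string[] } };
}

// Attaches a projection only when the server advertised support for it.
function buildCallParams(
  serverSupportsProjection: boolean,
  name: string,
  args: Record<string, unknown>,
  fields: string[],
): CallParams {
  const params: CallParams = { name, arguments: args };
  if (serverSupportsProjection) params._meta = { projection: { mode: "include", fields } };
  return params;
}

// Uses the server's projection when applied; otherwise filters the full output locally.
function ensureProjected(result: ToolResult, fields: string[]): Record<string, unknown> {
  const data = result.structuredContent ?? {};
  if (result._meta?.projection?.applied) return data;
  const out: Record<string, unknown> = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(data, field)) out[field] = data[field];
  }
  return out;
}

// A server without projection support returns the full object.
const legacyResult: ToolResult = {
  structuredContent: { name: "John Doe", email: "john@example.com", phone: "555-0100" },
};
console.log(JSON.stringify(ensureProjected(legacyResult, ["name"]))); // {"name":"John Doe"}
```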

Architecture Alignment

  • Uses _meta field (consistent with progressToken)
  • Capability negotiation (like sampling, roots, elicitation)
  • Tool annotations (like readOnlyHint, idempotentHint)
  • Works within existing outputSchema validation framework

Related Work

Open Questions for Maintainers

  1. Should projection be in _meta or top-level params? (Proposal: _meta for consistency)
  2. Is the dot-separated field path syntax (address.city) sufficient, or should full JSONPath be supported?
  3. Should schema validation be MUST or SHOULD for servers?
  4. Is server-level + tool-level capability granularity sufficient?
  5. How should projection interact with progress tokens?

Implementation Example

// Server: advertise projection support in its capabilities
const capabilities = {
  tools: { projection: { supported: true, modes: ["include", "exclude"] } }
};

// Client: request only the needed fields
// (assumes `client` is an already-connected MCP client)
const result = await client.callTool({
  name: "get_user_profile",
  arguments: { userId: "123" },
  _meta: { projection: { mode: "include", fields: ["name", "email"] } }
});

// Client: check whether the server actually applied the projection
if (result._meta?.projection?.applied) {
  console.log("Projection applied; fields returned:", result._meta.projection.fields);
} else {
  console.log("Projection not applied; full output returned");
}

This feature would enable more efficient, privacy-preserving, and cost-effective MCP tool integrations while maintaining full backwards compatibility.

Labels: enhancement (New feature or request)
