SEP: Tool Risk Metadata #2793

Open
walbis wants to merge 3 commits into modelcontextprotocol:main from walbis:sep/tool-risk-metadata
walbis commented May 26, 2026

SEP: Tool Risk Metadata

This SEP extends ToolAnnotations with structured, machine-readable risk
metadata so MCP clients can make consistent allowlist / approval decisions
across tools without each consumer rebuilding a bespoke per-tool catalogue.

What it adds (all optional, additive)

ToolAnnotations already carries four boolean hints: readOnlyHint,
destructiveHint, idempotentHint, and openWorldHint. This proposal adds
graded fields:

  • riskLevel: "low" | "medium" | "high" | "critical"
  • category: "read" | "observe" | "mutate" | "delete" | "destroy" | "utility"
  • blastRadius: "item" | "namespace" | "cluster" | "organization" | "global"
  • reversibility: "auto" | "manual" | "none"
  • sideEffects: string[] (open vocab)
  • approvalRecommendation: "none" | "single" | "multi"
  • minTrustLevel: number (1–5 advisory scale)

Pure addition; no breaking change. Servers and clients that don't know
these fields work exactly as today.
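To make the field list concrete, here is a minimal sketch of how a client might validate the proposed fields. The enum values are copied from the list above; the function name `validate_risk_annotations` and the choice to report problems as strings are assumptions for illustration, not part of the SEP.

```python
from typing import Any, Dict, List

# Allowed values copied from the field list in this proposal.
ENUMS: Dict[str, tuple] = {
    "riskLevel": ("low", "medium", "high", "critical"),
    "category": ("read", "observe", "mutate", "delete", "destroy", "utility"),
    "blastRadius": ("item", "namespace", "cluster", "organization", "global"),
    "reversibility": ("auto", "manual", "none"),
    "approvalRecommendation": ("none", "single", "multi"),
}


def validate_risk_annotations(annotations: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means acceptable).

    Absent fields are never an error because every field is optional, and
    keys this function does not know about are ignored.
    """
    problems: List[str] = []
    for field, allowed in ENUMS.items():
        if field in annotations and annotations[field] not in allowed:
            problems.append(f"{field}: {annotations[field]!r} not in {allowed}")
    effects = annotations.get("sideEffects")
    if effects is not None and not (
        isinstance(effects, list) and all(isinstance(e, str) for e in effects)
    ):
        problems.append("sideEffects: expected a list of strings")
    trust = annotations.get("minTrustLevel")
    if trust is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(trust, (int, float)) and not isinstance(trust, bool)
        if not is_number or not 1 <= trust <= 5:
            problems.append("minTrustLevel: expected a number from 1 to 5")
    return problems
```

A tool that declares only today's hints, such as `{"readOnlyHint": True}`, passes with no problems, which is the backward-compatible behaviour the paragraph above describes.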

Motivation

Every MCP-consuming agent platform reinvents the same vocabulary. KARAI's
config/tool_policies.yaml (~30 K8s tools) is one example; Claude Desktop,
Cursor, Cline, Continue, OpenDevin ship analogous configs. destructiveHint
is a binary yes/no, so a tool that deletes one row and a tool that drops a
whole namespace get the same flag.
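The gap can be sketched with a small client-side policy. Everything here is hypothetical: the function `needs_approval`, its `max_auto_radius` parameter, and the assumption that the blastRadius enum is listed narrowest first are illustrative choices, not behaviour the SEP mandates.

```python
from typing import Any, Dict

# Assumes the blastRadius enum in this proposal is ordered narrowest first.
BLAST_ORDER = ["item", "namespace", "cluster", "organization", "global"]


def needs_approval(annotations: Dict[str, Any], max_auto_radius: str = "item") -> bool:
    """Hypothetical policy: prompt when a destructive tool reaches beyond
    `max_auto_radius`. Without blastRadius, only destructiveHint is available."""
    # A missing hint is treated conservatively as destructive.
    if not annotations.get("destructiveHint", True):
        return False
    radius = annotations.get("blastRadius")
    if radius not in BLAST_ORDER:
        return True  # no graded data: every destructive tool looks the same
    return BLAST_ORDER.index(radius) > BLAST_ORDER.index(max_auto_radius)


delete_row = {"destructiveHint": True, "blastRadius": "item"}
drop_namespace = {"destructiveHint": True, "blastRadius": "namespace"}
legacy_tool = {"destructiveHint": True}  # declares no graded fields
```

With only `destructiveHint`, `delete_row` and `drop_namespace` are indistinguishable; with `blastRadius`, this policy lets the first run unprompted and asks for approval on the second.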

Reference implementation

File

seps/9999-tool-risk-metadata.md — happy to rename to the allocated SEP
number on request.

Looking for sponsor + feedback on

  1. Enum vocabularies for category / blastRadius / reversibility.
  2. Whether minTrustLevel should be numeric or string-with-convention.
  3. Whether the same vocabulary should apply to Resources / Prompts in a
    follow-up SEP.

walbis and others added 3 commits May 27, 2026 00:18
Extend ToolAnnotations with optional graded risk metadata
(riskLevel, category, blastRadius, reversibility, sideEffects,
approvalRecommendation, minTrustLevel) so MCP clients can make
consistent allowlist + approval decisions across tools without
each consumer rebuilding a bespoke per-tool catalogue.

Pure addition; fully backward compatible.

Reference impl: walbis/karai config/tool_policies.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rename seps/9999-tool-risk-metadata.md -> seps/2793-tool-risk-metadata.md
to match the allocated PR/SEP number. Updates the title heading, SEP
Number cell, Issue link, and PR link in the metadata table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reference implementation is no longer hypothetical: a heuristic classifier,
CLI, and 19 tests are shipping at https://github.com/walbis/mcp-risk-inferrer.

- Prototype row now lists the inferrer alongside the KARAI catalogue.
- Inference paragraph updates the heuristic description to match the
  implementation (scope-hint params for blastRadius, four risky-flag
  classes) and downgrades the LLM augmenter to "planned".
- Reference-implementations section gives the inferrer a real bullet
  with status and v0.2/v0.3 roadmap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

walbis commented May 26, 2026

Reference implementation now exists — pushed v0.1 of the inferrer as a follow-up commit on this PR's branch and as a standalone repo:

  • https://github.com/walbis/mcp-risk-inferrer (Python, MIT)
  • Heuristic classifier: ordered verb regex → category + riskLevel; risky-flag bump (force/recursive/cascade/all_namespaces); blast-radius inference from scope-hint params (namespace/cluster/org); derivation tables for reversibility/approvalRecommendation/minTrustLevel.
  • Pydantic models matching this SEP's vocabulary exactly (RiskLevel/Category/BlastRadius/Reversibility/ApprovalRec/ToolRiskManifest).
  • CLI: mcp-risk-inferrer classify tools.json --format yaml|json.
  • 19 real failing-first tests against an 11-tool K8s fixture (kubectl_get → read/low; kubectl_apply → mutate/medium/namespace; kubectl_delete → delete/high + state_loss; cleanup_pods → destroy/critical; force bumps high → critical; etc.). No tautologies.
  • v0.2 planned: optional LLM augmenter. v0.3: live MCP connector.
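The classifier steps listed above can be sketched roughly as follows. This is not the code in mcp-risk-inferrer: the regex patterns, the `classify` function, the "item" default for blastRadius, and the choice to trigger the bump whenever a risky flag appears among the tool's `inputSchema` properties are all assumptions made for illustration. Only the flag names, the scope hints, and the overall order of steps come from the description.

```python
import re
from typing import Any, Dict, List, Tuple

RISK_ORDER = ["low", "medium", "high", "critical"]

# Ordered rules: the first match wins, so more severe verbs are listed first.
# The patterns are illustrative guesses, not the inferrer's actual tables.
VERB_RULES: List[Tuple[str, str, str]] = [
    (r"(cleanup|purge|wipe|drop)", "destroy", "critical"),
    (r"(delete|remove)", "delete", "high"),
    (r"(apply|create|update|patch|scale)", "mutate", "medium"),
    (r"(watch|logs|describe)", "observe", "low"),
    (r"(get|list|read)", "read", "low"),
]
RISKY_FLAGS = {"force", "recursive", "cascade", "all_namespaces"}
# Scope-hint parameter name -> blastRadius, widest scope checked first.
SCOPE_HINTS = [("org", "organization"), ("cluster", "cluster"), ("namespace", "namespace")]


def classify(tool: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical heuristic: infer category, riskLevel, and blastRadius."""
    name = tool["name"].lower()
    params = set(tool.get("inputSchema", {}).get("properties", {}))
    category, risk = "utility", "low"  # fallback when no verb matches
    for pattern, cat, level in VERB_RULES:
        if re.search(pattern, name):
            category, risk = cat, level
            break
    # Risky-flag bump: one step up the scale, capped at "critical".
    if params & RISKY_FLAGS:
        risk = RISK_ORDER[min(RISK_ORDER.index(risk) + 1, len(RISK_ORDER) - 1)]
    radius = "item"  # assumed default when no scope hint is present
    for hint, value in SCOPE_HINTS:
        if hint in params:
            radius = value
            break
    return {"category": category, "riskLevel": risk, "blastRadius": radius}
```

Under these assumptions, a tool named `kubectl_apply` with a `namespace` parameter comes out as mutate/medium/namespace, and adding a `force` parameter to `kubectl_delete` raises it from high to critical, matching the fixture expectations quoted above.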

The goal is that consumers can bootstrap risk metadata for any existing MCP server today, without waiting for every server to declare the new fields. Happy to iterate on the vocabulary mapping if the spec moves during review.
