SEP: Tool Risk Metadata by walbis · Pull Request #2793 · modelcontextprotocol/modelcontextprotocol

walbis · 2026-05-26T21:18:52Z

SEP: Tool Risk Metadata

This SEP extends ToolAnnotations with structured, machine-readable risk
metadata so MCP clients can make consistent allowlist / approval decisions
across tools without each consumer rebuilding a bespoke per-tool catalogue.

What it adds (all optional, additive)

ToolAnnotations already carries readOnlyHint, destructiveHint,
idempotentHint, openWorldHint — yes/no hints. This proposal adds
graded fields:

riskLevel: "low" | "medium" | "high" | "critical"
category: "read" | "observe" | "mutate" | "delete" | "destroy" | "utility"
blastRadius: "item" | "namespace" | "cluster" | "organization" | "global"
reversibility: "auto" | "manual" | "none"
sideEffects: string[] (open vocab)
approvalRecommendation: "none" | "single" | "multi"
minTrustLevel: number (1–5 advisory scale)

Pure addition; no breaking change. Servers and clients that don't know
these fields work exactly as today.
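
For readers who want to see the field list as types, here is a minimal consumer-side sketch in Python. It is not the SEP's schema: the `ToolRiskAnnotations` name and the `validate_risk_annotations` helper are hypothetical, and the enum values are copied from the list above. Unknown keys are ignored, which mirrors the "pure addition" claim.

```python
from typing import Any, Literal, TypedDict, get_args

RiskLevel = Literal["low", "medium", "high", "critical"]
Category = Literal["read", "observe", "mutate", "delete", "destroy", "utility"]
BlastRadius = Literal["item", "namespace", "cluster", "organization", "global"]
Reversibility = Literal["auto", "manual", "none"]
ApprovalRecommendation = Literal["none", "single", "multi"]


class ToolRiskAnnotations(TypedDict, total=False):
    """Hypothetical shape for the proposed fields; every key is optional."""
    riskLevel: RiskLevel
    category: Category
    blastRadius: BlastRadius
    reversibility: Reversibility
    sideEffects: list[str]  # open vocabulary
    approvalRecommendation: ApprovalRecommendation
    minTrustLevel: float  # advisory scale, 1 to 5


# Allowed values per enum field, derived from the Literal types above.
_ENUMS = {
    "riskLevel": get_args(RiskLevel),
    "category": get_args(Category),
    "blastRadius": get_args(BlastRadius),
    "reversibility": get_args(Reversibility),
    "approvalRecommendation": get_args(ApprovalRecommendation),
}


def validate_risk_annotations(annotations: dict[str, Any]) -> list[str]:
    """Return a list of problems; keys outside the proposal are ignored."""
    problems = []
    for key, allowed in _ENUMS.items():
        if key in annotations and annotations[key] not in allowed:
            problems.append(f"{key}: unexpected value {annotations[key]!r}")

    level = annotations.get("minTrustLevel")
    if level is not None:
        is_number = isinstance(level, (int, float)) and not isinstance(level, bool)
        if not (is_number and 1 <= level <= 5):
            problems.append("minTrustLevel: expected a number from 1 to 5")

    side_effects = annotations.get("sideEffects")
    if side_effects is not None:
        if not (isinstance(side_effects, list)
                and all(isinstance(s, str) for s in side_effects)):
            problems.append("sideEffects: expected a list of strings")
    return problems
```

A client that does not know the new fields can skip validation entirely; one that does can log the returned problems and fall back to its own defaults.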

Motivation

Every MCP-consuming agent platform reinvents the same vocabulary. KARAI's
config/tool_policies.yaml (~30 K8s tools) is one example; Claude Desktop,
Cursor, Cline, Continue, OpenDevin ship analogous configs. destructiveHint
gives a binary yes/no, but a tool deleting one row and a tool dropping a
whole namespace get the same flag.

Reference implementation

KARAI catalogue (manual, proof of need):
https://github.com/walbis/karai/blob/master/config/tool_policies.yaml
Planned: mcp-risk-inferrer OSS service that derives these values for
servers that haven't declared them yet (verb + schema heuristic + optional
LLM augment).

File

seps/9999-tool-risk-metadata.md — happy to rename to the allocated SEP
number on request.

Looking for sponsor + feedback on

Enum vocabularies for category / blastRadius / reversibility.
Whether minTrustLevel should be numeric or string-with-convention.
Whether the same vocabulary should apply to Resources / Prompts in a
follow-up SEP.

Extend ToolAnnotations with optional graded risk metadata (riskLevel, category, blastRadius, reversibility, sideEffects, approvalRecommendation, minTrustLevel) so MCP clients can make consistent allowlist + approval decisions across tools without each consumer rebuilding a bespoke per-tool catalogue. Pure addition; fully backward compatible. Reference impl: walbis/karai config/tool_policies.yaml.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rename seps/9999-tool-risk-metadata.md -> seps/2793-tool-risk-metadata.md to match the allocated PR/SEP number. Updates the title heading, SEP Number cell, Issue link, and PR link in the metadata table.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reference implementation is no longer hypothetical: heuristic classifier + CLI + 19 tests are shipping at https://github.com/walbis/mcp-risk-inferrer.
- Prototype row now lists the inferrer alongside the KARAI catalogue.
- Inference paragraph updates the heuristic description to match the implementation (scope-hint params for blastRadius, four risky-flag classes) and downgrades the LLM augmenter to "planned".
- Reference-implementations section gives the inferrer a real bullet with status and v0.2/v0.3 roadmap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

walbis · 2026-05-26T21:33:32Z

Reference implementation now exists — pushed v0.1 of the inferrer as a follow-up commit on this PR's branch and as a standalone repo:

https://github.com/walbis/mcp-risk-inferrer (Python, MIT)
Heuristic classifier: ordered verb regex → category + riskLevel; risky-flag bump (force/recursive/cascade/all_namespaces); blast-radius inference from scope-hint params (namespace/cluster/org); derivation tables for reversibility/approvalRecommendation/minTrustLevel.
Pydantic models matching this SEP's vocabulary exactly (RiskLevel/Category/BlastRadius/Reversibility/ApprovalRec/ToolRiskManifest).
CLI: mcp-risk-inferrer classify tools.json --format yaml|json.
19 real failing-first tests against an 11-tool K8s fixture (kubectl_get → read/low; kubectl_apply → mutate/medium/namespace; kubectl_delete → delete/high + state_loss; cleanup_pods → destroy/critical; force bumps high → critical; etc.). No tautologies.
v0.2 planned: optional LLM augmenter. v0.3: live MCP connector.
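
The classifier steps listed above can be sketched in a few lines of Python. This is an illustration of the described technique only, not the code in mcp-risk-inferrer: the verb lists, the one-step bump rule, the "item" default, and the `classify` function name are all assumptions made for the example.

```python
import re


def _verbs(*words: str) -> re.Pattern:
    # Match a verb only as a whole underscore-delimited token of the tool name.
    return re.compile(r"(?:^|_)(?:" + "|".join(words) + r")(?:_|$)")


# Ordered rules: the first match wins, so the most dangerous verbs come first.
# The verb lists are illustrative assumptions, not the project's real tables.
_VERB_RULES = [
    (_verbs("cleanup", "purge", "wipe", "destroy"), "destroy", "critical"),
    (_verbs("delete", "remove"), "delete", "high"),
    (_verbs("apply", "create", "update", "patch"), "mutate", "medium"),
    (_verbs("get", "list", "describe"), "read", "low"),
]
_RISKY_FLAGS = {"force", "recursive", "cascade", "all_namespaces"}
_LEVELS = ["low", "medium", "high", "critical"]
# Scope-hint parameters, widest scope first.
_SCOPE_HINTS = [("org", "organization"), ("cluster", "cluster"), ("namespace", "namespace")]


def classify(tool: dict) -> dict:
    """Infer category, riskLevel and blastRadius from a tool's name and input schema."""
    name = tool["name"].lower()
    params = set(tool.get("inputSchema", {}).get("properties", {}))

    category, risk = "utility", "low"  # assumed fallback when no verb matches
    for pattern, rule_category, rule_risk in _VERB_RULES:
        if pattern.search(name):
            category, risk = rule_category, rule_risk
            break

    # Risky-flag bump: any risky parameter raises the level by one step (capped).
    if params & _RISKY_FLAGS:
        risk = _LEVELS[min(_LEVELS.index(risk) + 1, len(_LEVELS) - 1)]

    # Blast radius: the widest scope-hint parameter present, else a single item.
    blast_radius = "item"
    for param, radius in _SCOPE_HINTS:
        if param in params:
            blast_radius = radius
            break

    return {"category": category, "riskLevel": risk, "blastRadius": blast_radius}
```

With these assumed tables, a `kubectl_apply` tool that takes a `namespace` parameter classifies as mutate/medium/namespace, and adding a `force` parameter to `kubectl_delete` moves it from high to critical, matching the fixture outcomes quoted above.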

Goal is that consumers can bootstrap risk metadata for any existing MCP server today, without waiting for every server to declare the new fields. Happy to iterate the vocabulary mapping if the spec moves during review.

b-gutman · 2026-06-08T02:14:38Z

Strong +1 from the multi-server-host side. We run Pipeworx (a hosted MCP gateway — ~755 packs / 3,300+ tools behind one origin), and the "every platform reinvents the same vocabulary" problem is just as real on the consuming side.

One concrete data point that argues for this SEP: clients already diverge on the existing binary hints. The spec makes destructiveHint meaningful only when readOnlyHint=false, but at least one major client's validator flags it as "missing" on every tool regardless — so we set it unconditionally across all 3k+ tools just to pass review. Graded, machine-readable metadata helps, but only if the SEP also pins down required-ness (when a field is expected vs. optional per client), not just the value vocabulary — otherwise the same divergence reappears one layer up.

Two field notes on the proposed fields: (1) blastRadius maps cleanly onto how we'd want to gate tools by auth tier — genuinely useful. (2) reversibility: "none" + category: "destroy" overlap with the legacy destructiveHint; worth documenting precedence so consumers don't double-count. Happy to share how the binary hints play out across 3k+ real tools if it'd help ground the design.
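
One possible reading of the precedence question, sketched in Python so the double-counting concern is concrete. It relies on the defaults documented for the existing hints (readOnlyHint defaults to false, destructiveHint defaults to true). The rule that graded fields win when present is an assumption for illustration, not something the SEP has settled, and `is_destructive` is a hypothetical helper.

```python
def is_destructive(annotations: dict) -> bool:
    """Collapse legacy and graded signals into one answer so neither is counted twice."""
    # Assumed precedence: if the server declared graded fields, use only those.
    if "category" in annotations or "reversibility" in annotations:
        return (
            annotations.get("category") in {"delete", "destroy"}
            or annotations.get("reversibility") == "none"
        )

    # Legacy path: destructiveHint is only meaningful when readOnlyHint is false.
    if annotations.get("readOnlyHint", False):
        return False
    return annotations.get("destructiveHint", True)
```

Under this rule a tool that sets destructiveHint unconditionally, as described above, is still judged by its graded fields once it declares them.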

AgentGymLeader · 2026-06-08T06:52:23Z

@walbis @b-gutman the graded direction makes sense, and Bruce's required-ness point is the one I'd put front and center: a value vocabulary without a per-field optional/expected contract just pushes the divergence up a layer, the way he saw with destructiveHint.

One framing that might keep this clean: treat each field as a producer-asserted hint the consumer can interpret, override, or ignore — not an authority signal. A server says reversibility: "none"; the host still decides what to gate or prompt on. If the SEP spells that out (who asserts each field, and that consumers may override), the reversibility/category vs legacy destructiveHint precedence question mostly answers itself — you're reconciling hints, not resolving authority.

I'd scope approvalRecommendation as advisory-only for the same reason.
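
To make the hint-not-authority framing concrete, here is a minimal Python sketch of a host that treats approvalRecommendation as one input among several. `HostPolicy`, its fields, and `decide_approval` are hypothetical names invented for this example; nothing here is proposed spec text.

```python
from dataclasses import dataclass, field

_APPROVAL_ORDER = ["none", "single", "multi"]  # least to most strict


@dataclass
class HostPolicy:
    """Hypothetical host-side policy; the host, not the server, has the last word."""
    overrides: dict[str, str] = field(default_factory=dict)  # tool name -> approval
    floor_by_blast_radius: dict[str, str] = field(default_factory=dict)
    default: str = "single"  # used when the server asserts nothing usable


def decide_approval(tool_name: str, annotations: dict, policy: HostPolicy) -> str:
    """Return the approval mode the host will enforce for this tool."""
    # An explicit host override wins in either direction.
    if tool_name in policy.overrides:
        return policy.overrides[tool_name]

    # Otherwise start from the server's hint, falling back to the host default.
    hinted = annotations.get("approvalRecommendation", policy.default)
    if hinted not in _APPROVAL_ORDER:
        hinted = policy.default

    # The host may also set a floor per blast radius; take the stricter value.
    floor = policy.floor_by_blast_radius.get(annotations.get("blastRadius", ""), "none")
    return max(hinted, floor, key=_APPROVAL_ORDER.index)
```

The server's value never binds the host here: it is used when the host has no opinion, tightened when the host sets a floor, and replaced when the host overrides a specific tool.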

ppcvote · 2026-06-24T13:06:31Z

@walbis @b-gutman @AgentGymLeader — +1 from the static-analysis side, and a concrete addition that builds on Bruce's required-ness point.

Why the divergence reappears, and where SARIF helps

Bruce's framing is correct: a value vocabulary without per-field required-ness pushes divergence up a layer. The same problem hits the consumer-of-the-consumer — static-analysis tools that audit MCP catalogues (mcp-scanner, prompt-defense-audit, ultraprobe) want to ingest and emit this vocabulary, and right now there's no bridge between "server-declared riskLevel" and "SARIF finding emitted by a scanner". Each scanner reinvents the severity mapping.

A normative riskLevel → SARIF level table in this SEP closes that gap:

| `riskLevel` | `reversibility` | SARIF `level` | Consumer treatment (default) |
| --- | --- | --- | --- |
| `critical` | `none` | `error` | MUST fail CI / block auto-approval |
| `high` | any | `error` | MUST surface for approval |
| `medium` | any | `warning` | SHOULD surface; consumer may tighten |
| `low` | any | `note` | informational |
| absent | n/a | `note` | informational, mark as inferred |

Spec language could be: "Consumers MAY tighten severity (escalate warning→error), but MUST NOT loosen below the producer-asserted level." That keeps Andrew's hint-not-authority framing for the value, while giving SARIF emitters a deterministic mapping that doesn't require per-client guessing.
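
A small Python sketch of how a scanner might apply that mapping and the tighten-only rule. It keys on riskLevel alone and assumes `critical` maps to `error` whatever the reversibility, since the table only lists the `none` case; `sarif_level` and `apply_consumer_level` are hypothetical helper names.

```python
from typing import Optional

# SARIF 2.1.0 result levels, least to most severe.
_SARIF_ORDER = ["none", "note", "warning", "error"]

# riskLevel -> SARIF level, taken from the table above.
_RISK_TO_SARIF = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
}


def sarif_level(risk_level: Optional[str]) -> str:
    """Map a producer-asserted riskLevel to a SARIF level; absent maps to note."""
    if risk_level is None:
        return "note"
    if risk_level not in _RISK_TO_SARIF:
        raise ValueError(f"unknown riskLevel: {risk_level!r}")
    return _RISK_TO_SARIF[risk_level]


def apply_consumer_level(producer_level: str, consumer_level: str) -> str:
    """Consumers may tighten but not loosen: return the stricter of the two levels."""
    return max(producer_level, consumer_level, key=_SARIF_ORDER.index)
```

For example, a consumer that wants `medium` tools to fail CI can pass `"error"` as its own level, while a consumer that passes `"note"` for a `high` tool still gets `"error"` back.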

One non-overlap clarification building on Bruce's reversibility:"none" + destructiveHint point

When a tool's server-declared risk and its statically-inferred risk diverge, that divergence is the test signal: the scanner emits a SARIF result such as mcp-risk-mismatch (server: low, inferred: critical). Consumers gate or audit on the delta. This works whether the producer asserts the hint, the consumer overrides it, or a third-party scanner re-computes it.
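
As a rough illustration of the mismatch signal, the Python sketch below builds a SARIF-style result object (`ruleId`, `level`, `message.text`, and a `properties` bag) when the inferred level is higher than the declared one. Only the `mcp-risk-mismatch` rule ID comes from this comment; the function name, the choice to report under-declaration only, the level rule, and the property names are assumptions.

```python
from typing import Optional

_LEVELS = ["low", "medium", "high", "critical"]


def risk_mismatch_result(tool_name: str, declared: str, inferred: str) -> Optional[dict]:
    """Return a SARIF-style result dict if the server under-declares risk, else None."""
    delta = _LEVELS.index(inferred) - _LEVELS.index(declared)
    if delta <= 0:
        return None  # assumed: only under-declaration is reported

    return {
        "ruleId": "mcp-risk-mismatch",
        # Assumed rule: follow the inferred level, error for high or critical.
        "level": "error" if inferred in ("high", "critical") else "warning",
        "message": {
            "text": f"{tool_name}: server declared {declared}, scanner inferred {inferred}"
        },
        # Property bag carrying the delta so consumers can gate or audit on it.
        "properties": {"declared": declared, "inferred": inferred, "delta": delta},
    }
```

A consumer could then gate on `properties["delta"]`, for instance blocking auto-approval when the gap is two levels or more.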

Offer

Happy to draft seps/NNNN-tool-risk-sarif-mapping.md as a scoped follow-up SEP once #2793 lands, focused on the CI/CD + SAST consumer slice (out of scope for the current SEP). Doesn't change anything you've proposed; complements it on the audit side.

(Reference shapes I'd carry forward: SARIF 2.1.0 §3.27.10 for level semantics, ATR convergence on rule-ID federation in agent-threat-rules PR #54, Cisco mcp-scanner #146 merged 2026-04 for the static-side precedent.)

walbis and others added 3 commits May 27, 2026 00:18

localden added the SEP and draft (SEP proposal with a sponsor) labels Jun 8, 2026

github-project-automation Bot added this to SEP Review Pipeline Jun 8, 2026

localden added the proposal (SEP proposal without a sponsor) label and removed the draft (SEP proposal with a sponsor) label Jun 8, 2026

ppcvote mentioned this pull request Jun 24, 2026

Are external skill contributions welcome? Proposing critical-qa (self-adversarial review before "done") anthropics/skills#1328

Open

github-actions Bot mentioned this pull request Jun 29, 2026

[MCP: SEP Tracker] Week of 2026-06-29 jeffhandley/dotnet-vitals-servicing-release-manager#27

Open

walbis wants to merge 3 commits into modelcontextprotocol:main from walbis:sep/tool-risk-metadata
