SEP: Tool Risk Metadata #2793
Open
walbis wants to merge 3 commits into
Extend ToolAnnotations with optional graded risk metadata (riskLevel, category, blastRadius, reversibility, sideEffects, approvalRecommendation, minTrustLevel) so MCP clients can make consistent allowlist + approval decisions across tools without each consumer rebuilding a bespoke per-tool catalogue. Pure addition; fully backward compatible. Reference impl: walbis/karai config/tool_policies.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename seps/9999-tool-risk-metadata.md -> seps/2793-tool-risk-metadata.md to match the allocated PR/SEP number. Updates the title heading, SEP Number cell, Issue link, and PR link in the metadata table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reference implementation is no longer hypothetical: heuristic classifier + CLI + 19 tests are shipping at https://github.com/walbis/mcp-risk-inferrer.
- Prototype row now lists the inferrer alongside the KARAI catalogue.
- Inference paragraph updates the heuristic description to match the implementation (scope-hint params for blastRadius, four risky-flag classes) and downgrades the LLM augmenter to "planned".
- Reference-implementations section gives the inferrer a real bullet with status and v0.2/v0.3 roadmap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Reference implementation now exists — pushed v0.1 of the inferrer as a follow-up commit on this PR's branch and as a standalone repo:
Goal is that consumers can bootstrap risk metadata for any existing MCP server today, without waiting for every server to declare the new fields. Happy to iterate on the vocabulary mapping if the spec moves during review.
SEP: Tool Risk Metadata
This SEP extends `ToolAnnotations` with structured, machine-readable risk metadata so MCP clients can make consistent allowlist / approval decisions across tools without each consumer rebuilding a bespoke per-tool catalogue.
What it adds (all optional, additive)
`ToolAnnotations` already carries `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint` (yes/no hints). This proposal adds graded fields:

- `riskLevel: "low" | "medium" | "high" | "critical"`
- `category: "read" | "observe" | "mutate" | "delete" | "destroy" | "utility"`
- `blastRadius: "item" | "namespace" | "cluster" | "organization" | "global"`
- `reversibility: "auto" | "manual" | "none"`
- `sideEffects: string[]` (open vocab)
- `approvalRecommendation: "none" | "single" | "multi"`
- `minTrustLevel: number` (1–5 advisory scale)

Pure addition; no breaking change. Servers and clients that don't know these fields work exactly as today.
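As a rough illustration of the shape being proposed, the fields above could be typed as follows. This is a sketch only: the interface names `BaseToolAnnotations` and `RiskToolAnnotations`, the example tool, and the `"data-loss"` side-effect string are assumptions made for this example, and the base interface lists only the hints named in this description.

```typescript
// Existing boolean hints, as named in this SEP.
interface BaseToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

type RiskLevel = "low" | "medium" | "high" | "critical";
type Category = "read" | "observe" | "mutate" | "delete" | "destroy" | "utility";
type BlastRadius = "item" | "namespace" | "cluster" | "organization" | "global";
type Reversibility = "auto" | "manual" | "none";
type ApprovalRecommendation = "none" | "single" | "multi";

// Proposed graded fields. All optional, so existing annotations stay valid.
interface RiskToolAnnotations extends BaseToolAnnotations {
  riskLevel?: RiskLevel;
  category?: Category;
  blastRadius?: BlastRadius;
  reversibility?: Reversibility;
  sideEffects?: string[]; // open vocabulary
  approvalRecommendation?: ApprovalRecommendation;
  minTrustLevel?: number; // advisory scale, 1 to 5
}

// Hypothetical tool that removes a whole Kubernetes namespace.
const deleteNamespace: RiskToolAnnotations = {
  destructiveHint: true,
  riskLevel: "critical",
  category: "destroy",
  blastRadius: "namespace",
  reversibility: "none",
  sideEffects: ["data-loss"], // illustrative value, not a defined vocabulary entry
  approvalRecommendation: "multi",
  minTrustLevel: 5,
};

// An annotation object written before this SEP still type-checks unchanged.
const legacy: RiskToolAnnotations = { readOnlyHint: true };
```

Because every new field is optional, a client reading `legacy.riskLevel` simply gets `undefined` and can fall back to whatever it does today.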
Motivation
Every MCP-consuming agent platform reinvents the same vocabulary. KARAI's `config/tool_policies.yaml` (~30 K8s tools) is one example; Claude Desktop, Cursor, Cline, Continue, OpenDevin ship analogous configs.
`destructiveHint` gives a binary yes/no, but a tool deleting one row and a tool dropping a whole namespace get the same flag.
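To make the one-row versus whole-namespace contrast concrete, here is a minimal sketch of a client-side approval gate. The policy itself is an assumption for illustration and not something this SEP prescribes: `approvalsNeeded`, the ordering in `RADIUS_ORDER`, and the rule that an irreversible tool with namespace-or-wider reach needs multiple approvers are all hypothetical.

```typescript
type BlastRadius = "item" | "namespace" | "cluster" | "organization" | "global";
type Approval = "none" | "single" | "multi";

interface Annotations {
  destructiveHint?: boolean;
  blastRadius?: BlastRadius;
  reversibility?: "auto" | "manual" | "none";
  approvalRecommendation?: Approval;
}

// Narrowest to widest, in the order the SEP lists the values.
const RADIUS_ORDER: BlastRadius[] = ["item", "namespace", "cluster", "organization", "global"];

// Hypothetical client policy: how many approvers to ask for before a call.
function approvalsNeeded(a: Annotations): Approval {
  // Honour an explicit server recommendation when present.
  if (a.approvalRecommendation) return a.approvalRecommendation;
  if (!a.destructiveHint) return "none";
  // No graded data: the binary hint is all we have.
  if (a.blastRadius === undefined) return "single";
  const wide = RADIUS_ORDER.indexOf(a.blastRadius) >= RADIUS_ORDER.indexOf("namespace");
  return wide && a.reversibility === "none" ? "multi" : "single";
}

const deleteRow: Annotations = { destructiveHint: true, blastRadius: "item", reversibility: "manual" };
const dropNamespace: Annotations = { destructiveHint: true, blastRadius: "namespace", reversibility: "none" };
const hintOnly: Annotations = { destructiveHint: true };
```

With only `destructiveHint`, both tools collapse into the `hintOnly` case; with the graded fields, this particular policy asks for one approver for `deleteRow` and several for `dropNamespace`.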
Reference implementation
- https://github.com/walbis/karai/blob/master/config/tool_policies.yaml
- `mcp-risk-inferrer`, an OSS service that derives these values for servers that haven't declared them yet (verb + schema heuristic + optional LLM augment).
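For readers unfamiliar with the idea, a verb + schema heuristic can be sketched roughly as below. This is not the `mcp-risk-inferrer` code: the verb table, the parameter names treated as scope hints, the `ToolDecl` shape, and the `"utility"` and `"cluster"` fallbacks are all assumptions chosen for illustration.

```typescript
type Category = "read" | "mutate" | "delete" | "destroy" | "utility";
type BlastRadius = "item" | "namespace" | "cluster";

// Hypothetical minimal view of a tool declaration.
interface ToolDecl {
  name: string;
  inputSchema: { properties?: Record<string, unknown> };
}

// Hypothetical verb table; the first matching prefix wins.
const VERB_CATEGORIES: Array<[RegExp, Category]> = [
  [/^(get|list|read|describe)/, "read"],
  [/^(drop|destroy|purge)/, "destroy"],
  [/^(delete|remove)/, "delete"],
  [/^(create|update|patch|set)/, "mutate"],
];

// Verb heuristic: classify by the leading verb of the tool name.
function inferCategory(name: string): Category {
  const lower = name.toLowerCase();
  for (const [pattern, category] of VERB_CATEGORIES) {
    if (pattern.test(lower)) return category;
  }
  return "utility"; // assumed fallback when no verb matches
}

// Schema heuristic: guess the reach of a call from scope-like parameters.
function inferBlastRadius(tool: ToolDecl): BlastRadius {
  const props = tool.inputSchema.properties ?? {};
  // A tool that takes an item identifier is assumed to act on one item.
  if ("name" in props || "id" in props) return "item";
  // Only a namespace selector: assumed to act across that namespace.
  if ("namespace" in props) return "namespace";
  // No scoping parameter at all: assume the widest reach modelled here.
  return "cluster";
}

const deletePod: ToolDecl = {
  name: "delete_pod",
  inputSchema: { properties: { namespace: {}, name: {} } },
};
const listPods: ToolDecl = {
  name: "list_pods",
  inputSchema: { properties: { namespace: {} } },
};
```

Under these assumptions `deletePod` is classified as `delete` with `item` reach, and `listPods` as `read` with `namespace` reach. Heuristics like this misfire easily (a tool named `delete_namespace` that takes `name` would be scored as `item`), which is one reason server-declared values are preferable when they exist.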
File
`seps/9999-tool-risk-metadata.md`; happy to rename to the allocated SEP number on request.
Looking for sponsor + feedback on
- `category` / `blastRadius` / `reversibility`.
- Whether `minTrustLevel` should be numeric or string-with-convention.
- follow-up SEP.