docs(security-ig): shared tool-definition drift (rug-pull) test corpus by eeee2345 · Pull Request #2924 · modelcontextprotocol/modelcontextprotocol

eeee2345 · 2026-06-16T06:31:15Z

Adds a small, labeled test corpus for post-approval tool-definition drift (the rug-pull case: a server passes admission, then changes a tool on a later session). It came out of a discussion in #security-ig.

It covers two complementary drift signals, each labeled with verdicts from a real engine (not asserted):

content-injection: schema and annotations stay identical, the attack is smuggled into the description text. Verified against the open ATR engine: 4/4 malicious fire, 3/3 benign quiet.
capability-surface: the declared surface escalates after approval (annotations readOnly -> destructive, declared effects, data-access, external-reach, auth-scope). Verified against the Interlock drift engine: 6/6 malicious fire, 5/5 benign quiet, plus 3 undeclared/hidden cases surfaced for review rather than auto-blocked.

Each case is baseline -> twin with an expected verdict, so a detector can be pointed at it and checked: does it catch the malicious change without firing on benign evolution? The benign controls are the point.

Capability-surface cases contributed by Maaz (Interlock); content-injection by me (ATR). Labels are real engine output, not assertions.

On location: I couldn't find an existing home for security test corpora in the repo, so I put this under docs/community/security-ig/ as Interest Group material. Not attached to it living here, happy to move it wherever fits, flagging @pcarleton on placement.

AI disclosure (per CONTRIBUTING): I used AI assistance (Claude Code) to draft this PR and assemble the corpus, and I reviewed it. The detection labels are real engine output, not generated: the content-injection half was run through the ATR engine, the capability-surface half through Interlock's. I understand the contents and can speak to any individual case.

…orpus Two complementary drift signals, content-injection and capability-surface, each labeled with real engine verdicts. Came out of #security-ig discussion. Capability-surface cases contributed by Maaz (Interlock). Signed-off-by: Adam Lin <adam@agentthreatrule.org>

eeee2345 requested review from a team as code owners June 16, 2026 06:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(security-ig): shared tool-definition drift (rug-pull) test corpus#2924

docs(security-ig): shared tool-definition drift (rug-pull) test corpus#2924
eeee2345 wants to merge 1 commit into
modelcontextprotocol:mainfrom
eeee2345:docs/security-ig-tool-drift-corpus

eeee2345 commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

eeee2345 commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant