
Commit c3ee041

chore(ai): Add check-code-attribution skill (JAVA-499) (#5449)
Adds a `check-code-attribution` skill that validates license headers and `THIRD_PARTY_NOTICES.md` entries for code copied or adapted from third parties. It also verifies license compatibility against Sentry's licensing policy. The check is limited to the branch diff. It reports any issues found via PR comments (when run on CI) or to the terminal (when run locally).

To run it in Claude Code:

```
/check-code-attribution
```

Runs on CI automatically via [Warden](https://warden.sentry.dev/).

- Purely advisory / does not block merge.
- Generates PR comments with code suggestions for all discovered issues.
- Automatically manages removing stale comments as PRs are updated.

Current Warden configs:

| Setting | Value | Effect |
| --- | --- | --- |
| model | anthropic/claude-sonnet-4-6 | Model used for analysis |
| maxTurns | 30 | Max tool calls per chunk |
| skill | check-code-attribution | Per-file vendored code attribution check |
| failOn | off | Do not fail workflow if attribution issues found |
| reportOn | medium | Show findings at >= medium severity via PR comment |
| requestChanges | false | Never post REQUEST_CHANGES comments on PRs |
| failCheck | false | No red X on workflow in GitHub UI if it fails |
| triggers | pull_request + local | Runs on PR open/sync and local warden invocations |
| reportOnSuccess | false (default) | No comment when everything is clean |

Going forward, we can consider blocking PRs once we've had a chance to vet behavior in the wild.
1 parent 93590c4 commit c3ee041

19 files changed

Lines changed: 1392 additions & 0 deletions

.claude/skills/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -8,3 +8,5 @@
 !test/**
 !btrace-perfetto/
 !btrace-perfetto/**
+!check-code-attribution/
+!check-code-attribution/**

.claude/skills/check-code-attribution/SKILL.md

Lines changed: 244 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
[
  {
    "id": "header-complete-and-notice-present",
    "file": "HeaderCompleteAndNoticePresent.java",
    "expectFinding": false,
    "notes": "Header matches catalog entry"
  },
  {
    "id": "header-complete-but-notice-missing",
    "file": "HeaderCompleteButNoticeMissing.java",
    "expectFinding": true,
    "isolated": true,
    "notes": "Full header; no catalog / root NOTICES entry. Isolated: prompt-cache priming in a concurrent batch suppresses the missing-NOTICES finding below medium."
  },
  {
    "id": "header-missing-but-notice-present",
    "file": "HeaderMissingButNoticePresent.java",
    "expectFinding": true,
    "isolated": true,
    "notes": "NOTICES entry claims file is vendored but file has no attribution header. Isolated: a complete NOTICES entry suppresses the missing-header finding in a concurrent batch."
  },
  {
    "id": "header-fully-stripped",
    "file": "HeaderFullyStripped.java",
    "expectFinding": true,
    "notes": "Header has no required attribution fields"
  },
  {
    "id": "header-partially-stripped",
    "file": "HeaderPartiallyStripped.java",
    "expectFinding": true,
    "notes": "Adapted from + URL only; no copyright or license name"
  },
  {
    "id": "header-missing-non-essential-info",
    "file": "HeaderMissingNonEssentialInfo.java",
    "expectFinding": false,
    "notes": "All four required fields present; no license boilerplate — boilerplate is not required in the header"
  },
  {
    "id": "header-vs-notice-mismatch",
    "file": "THIRD_PARTY_NOTICES.md",
    "expectFinding": true,
    "isolated": true,
    "notes": "Copyright in metadata field does not match embedded license text. Isolated: mismatch finding needs an independent assertion free of interference from other NOTICES changes."
  },
  {
    "id": "new-license-type",
    "file": "NewLicenseType.java",
    "expectFinding": true,
    "notes": "AGPL v3 license in file header — absolute ban, must be removed"
  }
]
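The `isolated` flag in the entries above splits the scenarios into two groups: those analyzed together in one batch and those run on their own. A minimal JavaScript sketch of that split follows. The helper name `partitionScenarios` and the abbreviated sample are illustrative assumptions, not the contents of `assert-scenarios.mjs`.

```javascript
// Hypothetical helper: split EXPECTED.json-style entries into a main batch
// and isolated runs. Entries without an "isolated" key count as non-isolated.
function partitionScenarios(scenarios) {
  const main = [];
  const isolated = [];
  for (const s of scenarios) {
    (s.isolated === true ? isolated : main).push(s);
  }
  return { main, isolated };
}

// Abbreviated sample using three ids from the fixture file.
const sampleScenarios = [
  { id: "header-complete-and-notice-present", file: "HeaderCompleteAndNoticePresent.java", expectFinding: false },
  { id: "header-complete-but-notice-missing", file: "HeaderCompleteButNoticeMissing.java", expectFinding: true, isolated: true },
  { id: "header-fully-stripped", file: "HeaderFullyStripped.java", expectFinding: true },
];

const partitioned = partitionScenarios(sampleScenarios);
console.log(partitioned.main.map((s) => s.id));
console.log(partitioned.isolated.map((s) => s.id));
```

Treating a missing `isolated` key as `false` keeps the common case short: only scenarios that need their own run carry the flag.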
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Attribution skill validation tests

Self-contained samples for validating `check-code-attribution` without touching production SDK sources.


## Run the tests

```bash
./check-code-attribution-tests.sh
```

Requires Node.js and a Warden provider (see **Warden CLI** below).

In practice, running the script directly from the command line tends to be flakier than asking Claude Code to run the tests for you.

## Local development

### Discovering changed files

When running `/check-code-attribution` outside Warden, list files changed on the current branch vs the base branch, then apply the same exclusions as `ignorePaths` in `warden.toml`:

```bash
MB=$(git merge-base HEAD origin/main 2>/dev/null || git merge-base HEAD main)
git diff --name-only "${MB}"..HEAD
```
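The exclusion step can be sketched in JavaScript as follows. This is a simplified illustration, not Warden's own matcher: `globToRegExp` and `filterChanged` are hypothetical helpers, only `*` and `**` are handled, and `.claude/**` is the single pattern this README names (under **Validation**). The real list lives in `ignorePaths` in `warden.toml`.

```javascript
// Hypothetical, simplified glob matcher: "**" may cross "/" boundaries,
// "*" stays within one path segment, everything else is matched literally.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        re += ".*";
        i++; // consume the second "*"
      } else {
        re += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match themselves.
      re += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + re + "$");
}

// Keep only the paths that match none of the ignore patterns.
function filterChanged(files, ignorePatterns) {
  const matchers = ignorePatterns.map(globToRegExp);
  return files.filter((f) => !matchers.some((m) => m.test(f)));
}

// Example input: what `git diff --name-only` might print (paths are illustrative).
const changedFiles = [
  ".claude/skills/check-code-attribution/SKILL.md",
  "sentry/src/main/java/io/sentry/Example.java",
  "THIRD_PARTY_NOTICES.md",
];
const remainingFiles = filterChanged(changedFiles, [".claude/**"]);
console.log(remainingFiles);
```

With the `.claude/**` pattern, the skill's own files drop out of the list and the two remaining paths are the ones to review.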

### Warden CLI

Warden does **not** use Cursor auth. Before running Warden locally, configure a provider (same model family as `warden.toml`, or override with `-m`):

```bash
# Option A: Anthropic API key (matches CI model in warden.toml)
export WARDEN_ANTHROPIC_API_KEY=sk-ant-... # or: export ANTHROPIC_API_KEY=sk-ant-...

# Option B: Pi OAuth / API key store (~/.pi/agent/auth.json)
npx pi # then run /login and pick Anthropic (or another provider)

# Option C: Different provider for a one-off run
export WARDEN_OPENAI_API_KEY=sk-...
npx @sentry/warden origin/main..HEAD --skill check-code-attribution -m openai/gpt-5.5 -vv
```

Then run the skill against the branch diff:

```bash
npx @sentry/warden origin/main..HEAD --skill check-code-attribution -vv
```

## Layout

- `EXPECTED.json` — scenario IDs and expected outcomes (single source of truth).
- `THIRD_PARTY_NOTICES.catalog.md` — NOTICES-style entries for validation class names.
- `scenarios/` — `.java` files and `THIRD_PARTY_NOTICES.mismatch-snippet.md` (copyright-mismatch fixture).
- `check-code-attribution-tests.sh` — runs Warden on a temp branch and asserts per-scenario pass/fail.
- `assert-scenarios.mjs` — validation driver (`validate`, `list-isolated`, `list-main-java`, `routing-set`, `assert` subcommands); parses Warden JSONL and checks outcomes from `EXPECTED.json`.

### assert-scenarios.mjs commands

```bash
node assert-scenarios.mjs validate EXPECTED.json scenarios/ # pre-flight (no API); run automatically by the shell script
node assert-scenarios.mjs list-isolated EXPECTED.json # id<TAB>file per isolated scenario
node assert-scenarios.mjs list-main-java EXPECTED.json scenarios/ # .java files for the main Warden batch
node assert-scenarios.mjs routing-set routing.json <id> <path> # update id → Warden JSONL path
node assert-scenarios.mjs assert EXPECTED.json <dest-pkg> routing.json
```
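The core of the `assert` step is a comparison between expected and observed outcomes. A rough JavaScript sketch of that comparison is given here. It makes simplifying assumptions: `checkScenarios` is a hypothetical name, and the observed results arrive as a plain set of file names with at least one reported finding, whereas the real driver reads per-scenario Warden JSONL through `routing.json`, whose record shape is not shown in this README.

```javascript
// Hypothetical comparison: a scenario passes when "did this file get a
// finding?" equals its expectFinding value.
function checkScenarios(expected, filesWithFindings) {
  const failures = [];
  for (const s of expected) {
    const found = filesWithFindings.has(s.file);
    if (found !== s.expectFinding) {
      const want = s.expectFinding ? "a finding" : "no finding";
      const got = found ? "a finding" : "none";
      failures.push(`${s.id}: expected ${want}, got ${got}`);
    }
  }
  return failures;
}

const expectedOutcomes = [
  { id: "header-fully-stripped", file: "HeaderFullyStripped.java", expectFinding: true },
  { id: "header-missing-non-essential-info", file: "HeaderMissingNonEssentialInfo.java", expectFinding: false },
];

// Every outcome matches: the failure list is empty.
const cleanRun = checkScenarios(expectedOutcomes, new Set(["HeaderFullyStripped.java"]));
console.log(cleanRun);

// No findings reported: the scenario that expects one is listed as a failure.
const silentRun = checkScenarios(expectedOutcomes, new Set());
console.log(silentRun);
```

Returning a list of failures rather than throwing on the first one lets a caller print every mismatch before choosing an exit code.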

Warden runs are limited to 300s. On macOS the script uses `gtimeout` (from `brew install coreutils`) when available, otherwise GNU `timeout`, otherwise `perl` with `alarm`.

## Add a scenario

1. Add `scenarios/<UniqueClassName>.java`.
2. Add or omit a catalog entry in `THIRD_PARTY_NOTICES.catalog.md`.
3. Add an entry to `EXPECTED.json`.
4. **Isolation (if needed):** If the scenario relies on a finding that could be suppressed by Anthropic prompt-cache priming when analyzed alongside many other files (e.g. a missing-NOTICES entry, or a missing header on a file that has a complete NOTICES entry), add `"isolated": true` to its `EXPECTED.json` entry. The test script creates a dedicated worktree for each isolated scenario automatically — no changes to the script itself are needed.
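A new `EXPECTED.json` entry from step 3 can be sanity-checked before any API call is made. The JavaScript sketch below shows the kind of checks a pre-flight pass might run. The exact rules of the `validate` subcommand are not listed in this README, so the checks here (required keys, boolean types, unique ids) and the name `validateEntries` are assumptions for illustration.

```javascript
// Hypothetical pre-flight check for EXPECTED.json-style entries.
// Returns a list of human-readable problems; an empty list means "looks fine".
function validateEntries(entries) {
  const errors = [];
  const seen = new Set();
  entries.forEach((e, i) => {
    const hasId = typeof e.id === "string" && e.id !== "";
    const label = hasId ? e.id : `entry #${i}`;
    if (!hasId) {
      errors.push(`${label}: missing "id"`);
    } else if (seen.has(e.id)) {
      errors.push(`${label}: duplicate "id"`);
    } else {
      seen.add(e.id);
    }
    if (typeof e.file !== "string" || e.file === "") {
      errors.push(`${label}: missing "file"`);
    }
    if (typeof e.expectFinding !== "boolean") {
      errors.push(`${label}: "expectFinding" must be true or false`);
    }
    if ("isolated" in e && typeof e.isolated !== "boolean") {
      errors.push(`${label}: "isolated" must be a boolean when present`);
    }
  });
  return errors;
}

const goodEntry = { id: "new-license-type", file: "NewLicenseType.java", expectFinding: true };
// Deliberately broken: reuses an id and uses a string where a boolean belongs.
const badEntry = { id: "new-license-type", file: "Another.java", expectFinding: "yes" };

const goodResult = validateEntries([goodEntry]);
const badResult = validateEntries([goodEntry, badEntry]);
console.log(goodResult);
console.log(badResult);
```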

## Validation (maintainers)

Test samples live under `validation-tests/` and are excluded from normal skill runs via `.claude/**` in `warden.toml`.

```bash
.claude/skills/check-code-attribution/validation-tests/check-code-attribution-tests.sh
```

Expected outcomes are in `EXPECTED.json`. The script creates isolated git worktrees, runs Warden with `--report-on medium --json`, and asserts per-scenario pass/fail. Scenarios marked `"isolated": true` in `EXPECTED.json` each get their own worktree to avoid Anthropic prompt-cache priming that can suppress findings below medium in concurrent batches. Exit 0 = all pass.

When manually reviewing a file under `scenarios/`, search `THIRD_PARTY_NOTICES.catalog.md` in addition to root `THIRD_PARTY_NOTICES.md` (Quick triage step 2 in `SKILL.md`).

Non-Java fixtures required by the test script are listed in `REQUIRED_SCENARIO_FIXTURES` in `assert-scenarios.mjs`; pre-flight `validate` fails if any are missing.
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
# Test THIRD_PARTY_NOTICES catalog (not shipped)

Used only when validating `check-code-attribution` against `validation-tests/scenarios/**`.
Grep this file in addition to the repository root `THIRD_PARTY_NOTICES.md`.

---

## Example — HeaderFullyStripped (MIT)

**Source:** https://github.com/example/attribution-fixtures<br>
**License:** MIT License<br>
**Copyright:** Copyright (c) 2016 Example Author

### Scope

Attribution validation sample. The code resides in `io.sentry.skills.verification.HeaderFullyStripped` (`validation-tests/scenarios/HeaderFullyStripped.java`).

```
MIT License

Copyright (c) 2016 Example Author

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```

---

## Example — HeaderMissingButNoticePresent (Apache 2.0)

**Source:** https://github.com/example/notices-without-header<br>
**License:** Apache License 2.0<br>
**Copyright:** Copyright 2023 Example Corp.

### Scope

Attribution validation sample. The code resides in `io.sentry.skills.verification.HeaderMissingButNoticePresent`.

```
Copyright 2023 Example Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

---

## Example — HeaderMissingNonEssentialInfo (MIT)

**Source:** https://github.com/example/examplelib<br>
**License:** MIT License<br>
**Copyright:** Copyright 2020 Example Corp.

### Scope

Attribution validation sample. The code resides in `io.sentry.skills.verification.HeaderMissingNonEssentialInfo`.

```
MIT License

Copyright (c) 2020 Example Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```

---

## Example — HeaderCompleteAndNoticePresent (Apache 2.0)

**Source:** https://github.com/example/something<br>
**License:** Apache License 2.0<br>
**Copyright:** Copyright 2020 Example Authors

### Scope

Attribution validation sample. The code resides in `io.sentry.skills.verification.HeaderCompleteAndNoticePresent`.

```
Copyright 2020 Example Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
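Each catalog entry states its copyright twice: once in the `**Copyright:**` metadata field and once inside the embedded license text. The `header-vs-notice-mismatch` scenario covers the case where the two disagree. A minimal JavaScript sketch of such a comparison is shown below. The helper names and the normalization rules (ignoring a `(c)` marker, a trailing period, and letter case) are assumptions for illustration, not the skill's actual logic.

```javascript
// Three backticks, built at runtime so this example can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Hypothetical normalization: treat "Copyright (c) 2020 X" and
// "Copyright 2020 X." as the same statement.
function normalizeCopyright(line) {
  return line
    .replace(/<br>\s*$/i, "")
    .replace(/\(c\)|©/gi, " ")
    .replace(/\s+/g, " ")
    .trim()
    .replace(/\.$/, "")
    .toLowerCase();
}

// Returns true/false when both statements are present, null otherwise.
function copyrightMatches(entryMarkdown) {
  const field = entryMarkdown.match(/^\*\*Copyright:\*\*\s*(.+)$/m);
  const fenced = entryMarkdown.match(new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE));
  if (!field || !fenced) return null;
  const embedded = fenced[1].split("\n").find((l) => /^copyright\b/i.test(l.trim()));
  if (!embedded) return null;
  return normalizeCopyright(field[1]) === normalizeCopyright(embedded);
}

// Shortened copy of the HeaderMissingNonEssentialInfo entry.
const catalogEntry = [
  "**Source:** https://github.com/example/examplelib<br>",
  "**License:** MIT License<br>",
  "**Copyright:** Copyright 2020 Example Corp.",
  "",
  FENCE,
  "MIT License",
  "",
  "Copyright (c) 2020 Example Corp.",
  FENCE,
].join("\n");

const consistent = copyrightMatches(catalogEntry);
console.log(consistent);

// Made-up variant with a different holder in the license text.
const mismatchedEntry = catalogEntry.replace(
  "Copyright (c) 2020 Example Corp.",
  "Copyright (c) 2019 Another Author"
);
const inconsistent = copyrightMatches(mismatchedEntry);
console.log(inconsistent);
```

The normalization matters for this catalog: the HeaderMissingNonEssentialInfo entry writes the field without `(c)` and the license text with it, and a byte-for-byte comparison would flag that pair even though both name the same holder and year.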
