Add dangerous Unicode character detection (Trojan Source / glassworm) #4344

Merged
andimarek merged 1 commit into master from protect/unicode-safety-checks on Mar 19, 2026

@andimarek
Member

Summary

  • Adds Unicode safety checks to both the pre-commit hook and the CI workflow, following the same pattern as the existing Windows filename and file size checks
  • Detects invisible and rendering-altering characters used in Trojan Source (BiDi override) and glassworm-style attacks
  • Blocked categories:
    • C0/C1 control characters (U+0000-001F, U+007F-009F, except TAB/LF/CR)
    • Zero-width characters (U+200B-200D, U+FEFF)
    • BiDi override/embedding/isolate (U+202A-202E, U+2066-2069)
  • Uses perl instead of grep -P for macOS compatibility
  • Binary files are skipped via file --mime-type
  • Variation selectors (U+FE00-FE0F) intentionally excluded — they appear in legitimate emoji sequences (e.g. ❤️ in test fixtures) and Java identifiers cannot contain them
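The blocked categories listed above can be expressed as a single character class. The following is a minimal Python sketch of that idea, not the perl check the PR actually adds; the names `DANGEROUS` and `find_dangerous` are hypothetical, and the ranges are copied from the summary (variation selectors U+FE00-FE0F are left out, as described).

```python
import re

# Ranges taken from the PR summary. TAB (U+0009), LF (U+000A) and
# CR (U+000D) are deliberately excluded from the C0 range.
DANGEROUS = re.compile(
    r"["
    r"\u0000-\u0008\u000b\u000c\u000e-\u001f"  # C0 controls minus TAB/LF/CR
    r"\u007f-\u009f"                           # DEL and C1 controls
    r"\u200b-\u200d\ufeff"                     # zero-width characters
    r"\u202a-\u202e"                           # BiDi embedding/override
    r"\u2066-\u2069"                           # BiDi isolates
    r"]"
)


def find_dangerous(text):
    """Return (line, column, 'U+XXXX') for every flagged character.

    Lines are split on LF only, so that control characters which
    str.splitlines() treats as line breaks (for example U+000B or
    U+0085) are still reported instead of being swallowed.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in DANGEROUS.finditer(line):
            code_point = f"U+{ord(match.group()):04X}"
            hits.append((line_no, match.start() + 1, code_point))
    return hits
```

A hook built on this sketch would decode each staged text file as UTF-8, call `find_dangerous` on the contents, and fail the commit if the returned list is non-empty.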

Test plan

  • Verified detection of BiDi override character (U+202E) in test file
  • Verified detection of zero-width space (U+200B) in test file
  • Verified clean files produce no output
  • Full repo scan: zero false positives across all tracked files
  • Binary files (.class, .jar, images) are correctly skipped
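The binary-file skip relies on `file --mime-type`. A rough Python sketch of how such a filter could work is shown below. The rule for what counts as text (a `text/` prefix plus a small allowlist) is an assumption, since the PR description does not state the exact rule, and the helper names are hypothetical.

```python
import subprocess

# Assumption: these non-"text/" MIME types are still worth scanning.
TEXT_LIKE = {"application/json", "application/xml", "application/javascript"}


def looks_like_text(mime_type):
    """Decide from a MIME type string whether a file should be scanned."""
    mime_type = mime_type.strip().lower()
    return mime_type.startswith("text/") or mime_type in TEXT_LIKE


def mime_type_of(path):
    """Ask the `file` utility for the MIME type of `path`."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def should_scan(path):
    """True when `file` reports a text-like type for `path`."""
    return looks_like_text(mime_type_of(path))
```

Under this assumed rule, types such as `application/java-archive` or `image/png` are treated as binary and never reach the Unicode check.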

🤖 Generated with Claude Code

Detect invisible and rendering-altering Unicode characters that can be
used for Trojan Source (BiDi override) and glassworm-style attacks.

Blocked categories: C0/C1 control characters (except TAB/LF/CR),
zero-width characters (U+200B-200D, U+FEFF), and BiDi override/isolate
characters (U+202A-202E, U+2066-2069). Uses perl for macOS portability
(grep -P is unavailable on macOS). Binary files are skipped automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Test Report

Test Results

| Java Version | Total | Passed | Failed | Errors | Skipped |
| --- | --- | --- | --- | --- | --- |
| Java 11 | 5708 (±0) | 5652 (±0) | 0 (±0) | 0 (±0) | 56 (±0) |
| Java 17 | 5708 (±0) | 5651 (±0) | 0 (±0) | 0 (±0) | 57 (±0) |
| Java 21 | 5708 (±0) | 5651 (±0) | 0 (±0) | 0 (±0) | 57 (±0) |
| Java 25 | 5708 (±0) | 5651 (±0) | 0 (±0) | 0 (±0) | 57 (±0) |
| jcstress | 32 (±0) | 32 (±0) | 0 (±0) | 0 (±0) | 0 (±0) |
| Total | 22864 (±0) | 22637 (±0) | 0 (±0) | 0 (±0) | 227 (±0) |

Code Coverage (Java 25)

| Metric | Covered | Missed | Coverage | vs Master |
| --- | --- | --- | --- | --- |
| Lines | 28775 | 3122 | 90.2% | ±0.0% |
| Branches | 8354 | 1508 | 84.7% | ±0.0% |
| Methods | 7699 | 1223 | 86.3% | ±0.0% |

No per-class coverage changes detected.

Full HTML report: build artifact jacoco-html-report

Updated: 2026-03-18 23:40:27 UTC

@andimarek andimarek merged commit 58aaa22 into master Mar 19, 2026
11 checks passed