Unicode security helpers for deceptive text and URL checks.
This module is intentionally lightweight so it can be imported in display and approval paths without affecting startup performance.
Argument key names whose values likely contain URLs and should be safety-checked.
Detect deceptive or hidden Unicode code points in text.
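A minimal sketch of what such detection could look like. The set of code points and the name find_suspect_code_points are assumptions made for illustration; the module's actual list and return type are not shown here.

```python
import unicodedata

# Assumed, non-exhaustive set of code points treated as deceptive or hidden.
SUSPECT_CODE_POINTS = {
    0x200B,  # ZERO WIDTH SPACE
    0x200D,  # ZERO WIDTH JOINER
    0x202E,  # RIGHT-TO-LEFT OVERRIDE
    0x2066,  # LEFT-TO-RIGHT ISOLATE
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def find_suspect_code_points(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', name) for each suspect character in text."""
    found = []
    for index, ch in enumerate(text):
        if ord(ch) in SUSPECT_CODE_POINTS:
            name = unicodedata.name(ch, "UNKNOWN")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found


print(find_suspect_code_points("abc\u202edef"))
```

Reporting the index alongside the code point lets a caller point at the exact position in the original string.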
Remove known dangerous/invisible Unicode characters from text.
Neutralize control characters and deceptive Unicode in untrusted text.
Untrusted strings (MCP server errors, config-file contents, tool output)
can carry ANSI escape sequences, other control characters, or invisible
Unicode that corrupts the terminal, breaks out of a layout, or injects fake
lines into logs and prompts. This function first removes the invisible/bidi
code points flagged by strip_dangerous_unicode, then replaces every remaining
Unicode "Other" (control/format) character with a space.
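The two-step behavior described above can be sketched as follows. Here strip_hidden is a hypothetical stand-in for strip_dangerous_unicode, its HIDDEN set is an assumed subset, and "Other" is read as every general category beginning with "C"; whether the real function preserves newlines or tabs is not stated, so this sketch replaces them too.

```python
import unicodedata

# Assumed subset of invisible/bidi code points; the real list may differ.
HIDDEN = {"\u200b", "\u200d", "\u202e", "\u2066", "\u2069", "\ufeff"}


def strip_hidden(text: str) -> str:
    """Hypothetical stand-in for strip_dangerous_unicode."""
    return "".join(ch for ch in text if ch not in HIDDEN)


def neutralize(text: str) -> str:
    """Strip hidden code points, then turn remaining 'Other' chars into spaces."""
    stripped = strip_hidden(text)
    return "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in stripped
    )


# The ESC byte that starts an ANSI color sequence becomes a plain space.
print(repr(neutralize("ok\x1b[31mred")))
```

Replacing with a space rather than deleting keeps surrounding words from fusing together, while still preventing the terminal from interpreting the escape sequence.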
Render hidden Unicode characters as explicit markers.
Example output: abc<U+202E RIGHT-TO-LEFT OVERRIDE>def.
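A short sketch that reproduces the example output above. It assumes that "hidden" means any character in the Unicode format category (Cf); the function name render_hidden is hypothetical, and the module may select characters differently.

```python
import unicodedata


def render_hidden(text: str) -> str:
    """Replace each format (Cf) character with a <U+XXXX NAME> marker."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNNAMED")
            out.append(f"<U+{ord(ch):04X} {name}>")
        else:
            out.append(ch)
    return "".join(out)


print(render_hidden("abc\u202edef"))
# abc<U+202E RIGHT-TO-LEFT OVERRIDE>def
```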
Summarize Unicode issues for warning messages.
Deduplicates by code point. When more than max_items unique entries exist,
the summary is truncated with a "+N more entries" suffix.
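The deduplication and truncation rule might be sketched like this. Only max_items and the "+N more entries" wording come from the description; the function name, the input type, and the comma-separated layout are assumptions.

```python
import unicodedata


def summarize_code_points(chars: list[str], max_items: int = 3) -> str:
    """Summarize unique characters, truncating after max_items entries."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(chars))
    labels = [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in unique[:max_items]
    ]
    summary = ", ".join(labels)
    extra = len(unique) - max_items
    if extra > 0:
        summary += f", +{extra} more entries"
    return summary


print(summarize_code_points(["\u202e", "\u202e", "\u200b"], max_items=1))
# U+202E RIGHT-TO-LEFT OVERRIDE, +1 more entries
```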
Join safety warnings into a display string with an overflow indicator.
Check a URL for suspicious Unicode and domain spoofing patterns.
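As a rough illustration, a URL check along these lines could combine an informational punycode notice with a mixed-script test on the host. The function name, the two rules, and the warning wording are all assumptions; the actual checks are not listed here and are likely broader.

```python
import unicodedata
from urllib.parse import urlsplit


def check_url(url: str) -> tuple[bool, list[str]]:
    """Return (safe, warnings) for url under two illustrative rules."""
    warnings: list[str] = []
    safe = True
    host = urlsplit(url).hostname or ""

    # Rule 1 (informational): report punycode labels and their decoded form.
    for label in host.split("."):
        if label.startswith("xn--"):
            try:
                decoded = label[4:].encode("ascii").decode("punycode")
            except (UnicodeError, ValueError):
                decoded = "?"
            warnings.append(f"punycode label {label} decodes to {decoded}")

    # Rule 2 (suspicious): letters from more than one script in the host.
    # The first word of a character's Unicode name approximates its script.
    scripts = set()
    for ch in host:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    if len(scripts) > 1:
        safe = False
        warnings.append("host mixes scripts: " + ", ".join(sorted(scripts)))

    return safe, warnings


# A Cyrillic "а" (U+0430) in place of the Latin "a" is flagged as unsafe.
print(check_url("https://\u0430pple.com/login"))
```

In this sketch a punycode label alone yields a warning without marking the URL unsafe. A fuller check would probably also run the script test on the decoded labels.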
Flatten nested dict/list structures into key-path/string pairs.
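One way to sketch this flattening is a recursive generator. The dotted and bracketed path format and the name flatten_strings are assumptions; the module's real path syntax is not shown.

```python
from typing import Any, Iterator


def flatten_strings(value: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (key_path, string) pairs for every string leaf in value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            yield from flatten_strings(child, child_path)
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            yield from flatten_strings(child, f"{path}[{index}]")
    # Non-string scalars (int, bool, None, ...) are skipped.


args = {"request": {"url": "https://example.com", "retries": 3}, "tags": ["a", "b"]}
print(list(flatten_strings(args)))
# [('request.url', 'https://example.com'), ('tags[0]', 'a'), ('tags[1]', 'b')]
```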
Return whether a key path suggests URL-like content.
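A possible shape for this check, assuming dotted key paths with bracketed list indices. The key names in URL_KEY_NAMES, the suffix rule, and the function name are illustrative guesses, not the module's actual list.

```python
import re

# Assumed key names; the module's real set is not shown.
URL_KEY_NAMES = frozenset({"url", "uri", "href", "link", "endpoint"})


def looks_like_url_key(key_path: str) -> bool:
    """Return True if the last named segment of key_path suggests a URL."""
    # Split on dots and brackets, then drop empty parts and list indices.
    segments = [
        part
        for part in re.split(r"[.\[\]]+", key_path)
        if part and not part.isdigit()
    ]
    if not segments:
        return False
    last = segments[-1].lower()
    return last in URL_KEY_NAMES or last.endswith(("_url", "_uri"))


print(looks_like_url_key("items[0].href"))  # True
print(looks_like_url_key("request.body"))   # False
```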
A dangerous Unicode character found in text.
Safety analysis output for a URL string.
A result may have safe=True with non-empty warnings when
informational warnings (e.g. punycode decoding) are present without
suspicious patterns.
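The relationship between safe and warnings can be illustrated with a small dataclass. Only the safe flag and the warnings collection come from the description; the class name, the url field, and the tuple type are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlCheckResult:
    """Hypothetical result type: safe and warnings vary independently."""

    url: str
    safe: bool
    warnings: tuple[str, ...] = ()


# Informational warning only: the URL is still considered safe.
informational = UrlCheckResult(
    url="https://xn--mnchen-3ya.de/",
    safe=True,
    warnings=("host contains a punycode label",),
)

# Suspicious pattern: the URL is marked unsafe.
suspicious = UrlCheckResult(
    url="https://\u0430pple.com/",
    safe=False,
    warnings=("host mixes scripts",),
)

print(informational.safe, len(informational.warnings))
# True 1
```

Callers should therefore branch on safe rather than on whether warnings is empty.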