I have read check documentation: https://checkstyle.org/checks/coding/illegaltokentext.html
I have downloaded the latest cli from: https://checkstyle.org/cmdline.html#Download_and_Run
I have executed the cli and showed it below, as cli describes the problem better than 1,000 words
How it works Now:
vivek@Viveks-MacBook-Air checkstyle % cat > /tmp/TestEscapes.java << 'EOF'
class TestEscapes {
  void test() {
    // These have NO escape sequences, should NOT be violations (FALSE POSITIVES)
    final String r1 = "\u000b";
    final String r2 = "\u001c";
    final String r3 = "\u001D";
    final String r4 = "\u1680";
    final String r5 = "\u2000";
    final String r6 = "\u3000";

    // These SHOULD be violations (have valid escape replacements)
    final String a1 = "\u0008"; // Should use \b
    final String a2 = "\u0009"; // Should use \t
    final String a3 = "\u0020"; // Should use \s
  }
}
EOF
vivek@Viveks-MacBook-Air checkstyle % RUN_LOCALE="-Duser.language=en -Duser.country=US"
java $RUN_LOCALE -jar target/checkstyle-*-all.jar -c src/main/resources/google_checks.xml /tmp/TestEscapes.java
Starting audit...
[WARN] /tmp/TestEscapes.java:4:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:5:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:6:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:7:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:8:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:9:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:13:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
Audit done.
Is your feature request related to a problem? Please describe.
The IllegalTokenText check in google_checks.xml reports false positives for Unicode characters that have no corresponding escape sequence in Java.
Per JLS §3.10.7, the only valid escape sequences are:
EscapeSequence:
\ b (backspace BS, Unicode \u0008)
\ s (space SP, Unicode \u0020)
\ t (horizontal tab HT, Unicode \u0009)
\ n (linefeed LF, Unicode \u000a)
\ f (form feed FF, Unicode \u000c)
\ r (carriage return CR, Unicode \u000d)
\ [LineTerminator](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-LineTerminator) (line continuation, no Unicode representation)
\ " (double quote ", Unicode \u0022)
\ ' (single quote ', Unicode \u0027)
\ \ (backslash \, Unicode \u005c)
[OctalEscape](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalEscape) (octal value, Unicode \u0000 to \u00ff)
OctalEscape:
\ [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
\ [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
\ [ZeroToThree](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-ZeroToThree) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
OctalDigit:
(one of)
0 1 2 3 4 5 6 7
ZeroToThree:
(one of)
0 1 2 3
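To illustrate why the check prefers the special escapes, here is a minimal sketch showing that the Unicode, octal, and special spellings of a character produce the same string at run time. The class name `EscapeSpellings` and the helper `same` are made up for this example. It sticks to `\t`, `\b`, and `\\` so that it compiles on older JDKs; `\s` needs Java 15 or later.

```java
class EscapeSpellings {
    /** True when all three spellings denote the same string. */
    static boolean same(String unicodeForm, String octalForm, String specialForm) {
        return unicodeForm.equals(octalForm) && octalForm.equals(specialForm);
    }

    public static void main(String[] args) {
        // Horizontal tab: Unicode escape, octal escape, special escape.
        System.out.println(same("\u0009", "\011", "\t"));
        // Backspace, spelled the same three ways.
        System.out.println(same("\u0008", "\010", "\b"));
        // Backslash: octal 134 versus the doubled-backslash escape.
        System.out.println("\134".equals("\\"));
    }
}
```

All three lines print `true`, so replacing the Unicode or octal form with the special escape does not change program behavior.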
Characters like \u000b (vertical tab), \u2000, and \u3000 have no special escape sequence in Java, so flagging them tells users to use an escape that doesn't exist.
Additionally, \u0008 (backspace -> \b) and \u0020 (space -> \s) are NOT currently flagged, but they SHOULD be.
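The distinction can be sketched as a lookup from code point to special escape. `NamedEscapes` and `namedEscape` are hypothetical helpers written for this issue, not part of Checkstyle. The table simply follows the JLS list quoted above, and `\s` is only available from Java 15 onward.

```java
import java.util.Optional;

class NamedEscapes {
    /** Returns the special escape for a code point, if the JLS list above defines one. */
    static Optional<String> namedEscape(int codePoint) {
        switch (codePoint) {
            case 0x0008: return Optional.of("\\b");  // backspace
            case 0x0020: return Optional.of("\\s");  // space, Java 15+
            case 0x0009: return Optional.of("\\t");  // horizontal tab
            case 0x000A: return Optional.of("\\n");  // linefeed
            case 0x000C: return Optional.of("\\f");  // form feed
            case 0x000D: return Optional.of("\\r");  // carriage return
            case 0x0022: return Optional.of("\\\""); // double quote
            case 0x0027: return Optional.of("\\'");  // single quote
            case 0x005C: return Optional.of("\\\\"); // backslash
            default: return Optional.empty();        // no special escape exists
        }
    }

    public static void main(String[] args) {
        // Code points taken from TestEscapes.java in this issue.
        int[] samples = {0x000B, 0x001C, 0x001D, 0x1680, 0x2000, 0x3000, 0x0008, 0x0009, 0x0020};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, namedEscape(cp).orElse("(no special escape)"));
        }
    }
}
```

Under this table, the first six sample code points have no replacement to suggest, while U+0008, U+0009, and U+0020 map to `\b`, `\t`, and `\s`.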
Describe the solution you'd like
Update the regex pattern in google_checks.xml from:
\\u00(09|0(a|A)|0(b|B)|(0|1)(c|C)|(0|1)(d|D)|1(d|D)|1(e|E)|1(f|F)|22|27|5(C|c))|\\u1680|\\u3000|\\u20(00|0(a|A)|28|29|(2|5)(f|F))|\\(0(10|11|12|14|15|40|42|47)|134)
To:
\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|20|22|27|5(C|c))|\\(0(10|11|12|14|15|40|42|47)|134)
Expected CLI output after fix:
Starting audit...
[WARN] /tmp/TestEscapes.java:12:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:13:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:14:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
Audit done.
Only \u0008, \u0009, and \u0020 are flagged (lines 12-14), because they have valid escape replacements (\b, \t, \s).
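As a rough way to sanity-check the proposed change outside Checkstyle, the two patterns quoted above can be run against the escape text from TestEscapes.java using `java.util.regex`. This is only a sketch: the class `IllegalTokenTextRegexDemo` and the method `flagged` are made up for illustration, and it assumes the check applies the pattern as a `find()`-style search over the literal text. It does not run Checkstyle itself.

```java
import java.util.regex.Pattern;

class IllegalTokenTextRegexDemo {
    // Pattern currently in google_checks.xml, as quoted in this issue.
    // Every backslash is doubled once more because this is a Java string literal.
    static final Pattern CURRENT = Pattern.compile(
        "\\\\u00(09|0(a|A)|0(b|B)|(0|1)(c|C)|(0|1)(d|D)|1(d|D)|1(e|E)|1(f|F)|22|27|5(C|c))"
        + "|\\\\u1680|\\\\u3000|\\\\u20(00|0(a|A)|28|29|(2|5)(f|F))"
        + "|\\\\(0(10|11|12|14|15|40|42|47)|134)");

    // Pattern proposed in this issue.
    static final Pattern PROPOSED = Pattern.compile(
        "\\\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|20|22|27|5(C|c))"
        + "|\\\\(0(10|11|12|14|15|40|42|47)|134)");

    /** Assumption: a token is reported when the pattern is found anywhere in its text. */
    static boolean flagged(Pattern pattern, String tokenText) {
        return pattern.matcher(tokenText).find();
    }

    public static void main(String[] args) {
        // Source text of the escapes used in TestEscapes.java (backslash plus letters, not the characters).
        String[] samples = {
            "\\u000b", "\\u001c", "\\u001D", "\\u1680", "\\u2000", "\\u3000",
            "\\u0008", "\\u0009", "\\u0020",
        };
        for (String s : samples) {
            System.out.printf("%-8s current=%-5b proposed=%b%n",
                s, flagged(CURRENT, s), flagged(PROPOSED, s));
        }
    }
}
```

With these inputs, the current pattern matches every sample except `\u0008` and `\u0020`, while the proposed pattern matches only `\u0008`, `\u0009`, and `\u0020`, which lines up with the actual and expected CLI output above.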