IllegalTokenText reports false positives for Unicode whitespace characters without escape sequences #18790

@vivek-0509

Description

I have read check documentation: https://checkstyle.org/checks/coding/illegaltokentext.html
I have downloaded the latest cli from: https://checkstyle.org/cmdline.html#Download_and_Run
I have executed the CLI and shown it below, as the CLI describes the problem better than 1,000 words

How it works now:

vivek@Viveks-MacBook-Air checkstyle % cat > /tmp/TestEscapes.java << 'EOF'
class TestEscapes {
  void test() {
    // These have NO escape sequences, should NOT be violations (FALSE POSITIVES)
    final String r1 = "\u000b"; 
    final String r2 = "\u001c"; 
    final String r3 = "\u001D"; 
    final String r4 = "\u1680"; 
    final String r5 = "\u2000"; 
    final String r6 = "\u3000"; 

    // These SHOULD be violations (have valid escape replacements)
    final String a1 = "\u0008"; // Should use \b
    final String a2 = "\u0009"; // Should use \t
    final String a3 = "\u0020"; // Should use \s
  }
}
EOF

vivek@Viveks-MacBook-Air checkstyle % RUN_LOCALE="-Duser.language=en -Duser.country=US"
java $RUN_LOCALE -jar target/checkstyle-*-all.jar -c src/main/resources/google_checks.xml /tmp/TestEscapes.java
Starting audit...
[WARN] /tmp/TestEscapes.java:4:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:5:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:6:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:7:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:8:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:9:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:13:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
Audit done.

Is your feature request related to a problem? Please describe.

The IllegalTokenText check in google_checks.xml reports false positives for Unicode characters that have no corresponding escape sequence in Java.

Per JLS §3.10.7, the only valid escape sequences are:

EscapeSequence:
\ b (backspace BS, Unicode \u0008)
\ s (space SP, Unicode \u0020)
\ t (horizontal tab HT, Unicode \u0009)
\ n (linefeed LF, Unicode \u000a)
\ f (form feed FF, Unicode \u000c)
\ r (carriage return CR, Unicode \u000d)
\ [LineTerminator](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-LineTerminator) (line continuation, no Unicode representation)
\ " (double quote ", Unicode \u0022)
\ ' (single quote ', Unicode \u0027)
\ \ (backslash \, Unicode \u005c)
[OctalEscape](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalEscape) (octal value, Unicode \u0000 to \u00ff)

OctalEscape:
\ [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
\ [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
\ [ZeroToThree](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-ZeroToThree) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)

OctalDigit: (one of)
0 1 2 3 4 5 6 7

ZeroToThree: (one of)
0 1 2 3

Characters like \u000b (vertical tab), \u2000, \u3000 etc. have NO escape sequences in Java. Flagging them tells users to use an escape that doesn't exist.
Additionally, \u0008 (backspace -> \b) and \u0020 (space -> \s) are NOT currently flagged, but they SHOULD be.
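
To make the distinction concrete, here is a minimal sketch of the mapping implied by the JLS list above. The class name `EscapeLookup` and the method `escapeFor` are hypothetical names made up for illustration; they are not part of Checkstyle. Line continuation and octal escapes are left out, and code points are written in `U+XXXX` form in the output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, for illustration only (not Checkstyle code).
class EscapeLookup {
    // Code points that have a dedicated escape sequence per JLS 3.10.7.
    // The \s escape requires Java 15 or later.
    private static final Map<Integer, String> ESCAPES = new LinkedHashMap<>();
    static {
        ESCAPES.put(0x0008, "\\b");   // backspace
        ESCAPES.put(0x0009, "\\t");   // horizontal tab
        ESCAPES.put(0x000a, "\\n");   // linefeed
        ESCAPES.put(0x000c, "\\f");   // form feed
        ESCAPES.put(0x000d, "\\r");   // carriage return
        ESCAPES.put(0x0020, "\\s");   // space
        ESCAPES.put(0x0022, "\\\"");  // double quote
        ESCAPES.put(0x0027, "\\'");   // single quote
        ESCAPES.put(0x005c, "\\\\");  // backslash
    }

    /** Returns the escape sequence for a code point, or empty if none exists. */
    static Optional<String> escapeFor(int codePoint) {
        return Optional.ofNullable(ESCAPES.get(codePoint));
    }

    public static void main(String[] args) {
        // Three code points from the false-positive group, three from the valid group.
        int[] samples = {0x000b, 0x001c, 0x2000, 0x0008, 0x0009, 0x0020};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp,
                escapeFor(cp).orElse("(no escape sequence)"));
        }
    }
}
```

Under this mapping, a violation only makes sense for a code point where `escapeFor` returns a value; for everything else there is nothing the user could switch to.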

Describe the solution you'd like

Update the regex pattern in google_checks.xml from:
\\u00(09|0(a|A)|0(b|B)|(0|1)(c|C)|(0|1)(d|D)|1(d|D)|1(e|E)|1(f|F)|22|27|5(C|c))|\\u1680|\\u3000|\\u20(00|0(a|A)|28|29|(2|5)(f|F))|\\(0(10|11|12|14|15|40|42|47)|134)

To:
\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|20|22|27|5(C|c))|\\(0(10|11|12|14|15|40|42|47)|134)
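
The two patterns can be compared outside Checkstyle with plain `java.util.regex`. The sketch below is not Checkstyle code: the class `RegexComparison` and its members are hypothetical, and it assumes the check applies the format with `Matcher.find()` to the raw token text. The sample strings are the literal contents from `TestEscapes.java`, without the surrounding quotes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical harness, for illustration only (not Checkstyle code).
class RegexComparison {
    // Patterns copied from the issue text; in Java source every backslash is doubled again.
    static final Pattern CURRENT = Pattern.compile(
        "\\\\u00(09|0(a|A)|0(b|B)|(0|1)(c|C)|(0|1)(d|D)|1(d|D)|1(e|E)|1(f|F)|22|27|5(C|c))"
        + "|\\\\u1680|\\\\u3000|\\\\u20(00|0(a|A)|28|29|(2|5)(f|F))"
        + "|\\\\(0(10|11|12|14|15|40|42|47)|134)");
    static final Pattern PROPOSED = Pattern.compile(
        "\\\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|20|22|27|5(C|c))"
        + "|\\\\(0(10|11|12|14|15|40|42|47)|134)");

    // Raw source text of the nine literals, in file order (r1..r6, then a1..a3).
    static final List<String> SAMPLES = Arrays.asList(
        "\\u000b", "\\u001c", "\\u001D", "\\u1680", "\\u2000", "\\u3000",
        "\\u0008", "\\u0009", "\\u0020");

    /** Returns the samples in which the pattern finds a match. */
    static List<String> flagged(Pattern pattern, List<String> samples) {
        return samples.stream()
            .filter(s -> pattern.matcher(s).find())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("current:  " + flagged(CURRENT, SAMPLES));
        System.out.println("proposed: " + flagged(PROPOSED, SAMPLES));
    }
}
```

With these inputs the current pattern matches the six literals that have no escape sequence plus the tab literal (seven in total, in line with the CLI output above), while the proposed pattern matches only the backspace, tab, and space literals.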

Expected CLI output after fix:

Starting audit...
[WARN] /tmp/TestEscapes.java:12:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:13:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:14:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
Audit done.

Only \u0008, \u0009, and \u0020 are flagged (lines 12-14), because they have valid escape replacements (\b, \t, \s).
