Skip to content

path validation: char is not signed by default.#4805

Merged
ethomson merged 1 commit intomasterfrom
signed_char
Sep 18, 2018
Merged

path validation: char is not signed by default.#4805
ethomson merged 1 commit intomasterfrom
signed_char

Conversation

@ethomson
Copy link
Copy Markdown
Member

ARM treats its char type as unsigned type by default; as a result, testing a char value as being < 0 is always false. This is a warning on ARM, which is promoted to an error given our use of -Werror.

Per ISO 9899:199, section "6.2.5 Types":

The three types char, signed char, and unsigned char are collectively
called the character types. The implementation shall define char to
have the same range, representation, and behavior as either signed
char or unsigned char.

...

Irrespective of the choice made, char is a separate type from the other
two and is not compatible with either.

ARM treats its `char` type as `unsigned type` by default; as a result,
testing a `char` value as being `< 0` is always false.  This is a
warning on ARM, which is promoted to an error given our use of
`-Werror`.

Per ISO 9899:199, section "6.2.5 Types":

> The three types char, signed char, and unsigned char are collectively
> called the character types. The implementation shall define char to
> have the same range, representation, and behavior as either signed
> char or unsigned char.
>
...

> Irrespective of the choice made, char is a separate type from the other
> two and is not compatible with either.
@ethomson
Copy link
Copy Markdown
Member Author

Note that this whole test seems rather pointless anyway, since the test immediately following it will return true for any instance where this test also returns true. Is the thinking that the tolower test is more expensive and we could skip it for bytes outside the ASCII range? If so, I would argue that most filenames are mostly ASCII so we've added a test to this loop that will rarely actually catch anything, making the loop more expensive overall.

So my inclination is to actually remove this test entirely.

@ethomson
Copy link
Copy Markdown
Member Author

/cc @carlosmn in case I'm missing something.

@carlosmn
Copy link
Copy Markdown
Member

One place where we'd go above 127 is for the special chars in ASCII, like accented vowels and other "European" characters. I think we probably only ever get UTF-8 here, but since we're looking at individual bytes, we might get one with a continuation here. The code in git says

			/*
			 * We know our needles contain only ASCII, so we clamp
			 * here to make the results of tolower() sane.
			 */

which I guess makes sense, maybe, though I'm not sure what tolower() would return that would cause the next case to also not be false. I guess on Windows your active old-timey codepage might affect it?

Other than that, I guess we'd avoid hitting tolower more often if our input contains combining characters and stuff that goes into the bit of UTF-8 made to fit past value 127. Non-roman filenames would hit this fairly often I would imagine.

@ethomson
Copy link
Copy Markdown
Member Author

OK, I think the bigger issue was that I wanted to make sure that I had understood the problem that you were trying to solve. I think that you're right that we shouldn't try to divine what tolower might do on all the platforms. You OK with this as-is?

@carlosmn
Copy link
Copy Markdown
Member

I think I would slightly favour (signed char)name[i] < -1 to keep the -1 but it's fine either way.

@ethomson
Copy link
Copy Markdown
Member Author

Personally, I think that the > 127 is more readable, but putting that preference aside: I've actually already cherry-picked this into a different branch, so without a strong reason not to, I'm going to keep this as-is to prevent having to redo that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants