fix(ngSanitize): follow HTML parser rules for start tags / allow < in text content #8212
caitp wants to merge 2 commits into
… text content

ngSanitize will now permit opening angle brackets in text content, provided they are not followed by either a forward slash or an ASCII letter (U+0041 - U+005A, U+0061 - U+007A), in compliance with the rules of the parsing spec, without taking insertion mode into account.

BREAKING CHANGE: Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters in a start-tag. Following this change, any opening angle bracket which is not followed by either a forward slash or an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html).
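To make the rule in the description concrete, here is a minimal sketch of the check it describes. This is not the ngSanitize implementation; `opensTag` is a hypothetical helper written only for illustration, and it covers just the two cases the description names (a forward slash or an ASCII letter after `<`).

```javascript
// Hypothetical helper, not part of ngSanitize: reports whether the "<" at
// index i would be treated as a tag delimiter under the rule described above.
function opensTag(text, i) {
  if (text.charAt(i) !== '<') return false;
  var next = text.charAt(i + 1); // '' when "<" is the last character
  // A forward slash (end tag) or an ASCII letter (start tag) opens a tag.
  return next === '/' || /^[A-Za-z]$/.test(next);
}

console.log(opensTag('a < b', 2)); // false: "<" is followed by a space
console.log(opensTag('1 <3', 2));  // false: "<" is followed by a digit
console.log(opensTag('a<b', 1));   // true: "<" is followed by a letter
console.log(opensTag('</p>', 0));  // true: "<" is followed by a slash
```

Under this rule `a < b` stays text, while `a<b` still looks like the start of a tag.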
/cc @IgorMinar PTAL --- This particular block is only here to make sure that we throw if we find an apparent start-tag without a trailing >
This might not be the right thing to do --- if we don't have a trailing >, we could potentially just treat it as a text node. I'm not sure what the best thing to do in this case is.
I think it is better to treat as a text node. IMO the sanitizer should be secure but tolerant
Is this really a bad text string? I would let it go as a text block. For instance:
In my math project I found that a<b when b=10
As far as HTML parsing is concerned, /<[a-zA-Z\/]/ is the start of a tag, so we shouldn't "fix" this, I think
Although arguably we are not trying to "parse" HTML here, only sanitize text that may be inadvertently parsed by a browser later
I think that this is right. We shouldn't try to fix broken HTML.
Other than that LGTM
Shouldn't this < be encoded just to be safe?
It is encoded in the real world. In the test, however, the chars handler just appends the value to a string.
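A rough sketch of that difference, for anyone following along. The handler shape and the names `makeRawHandler`, `makeEncodingHandler`, and `encodeEntities` are assumptions made up for this illustration; they are not copied from the ngSanitize source or its tests.

```javascript
// Hypothetical test-style handler: chars() appends text exactly as received.
function makeRawHandler() {
  var out = [];
  return {
    chars: function (text) { out.push(text); },
    result: function () { return out.join(''); }
  };
}

// Minimal entity encoding, assumed here for illustration only.
function encodeEntities(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical production-style handler: chars() encodes before appending.
function makeEncodingHandler() {
  var out = [];
  return {
    chars: function (text) { out.push(encodeEntities(text)); },
    result: function () { return out.join(''); }
  };
}

var raw = makeRawHandler();
raw.chars('a < b');
console.log(raw.result()); // a < b

var encoded = makeEncodingHandler();
encoded.chars('a < b');
console.log(encoded.result()); // a &lt; b
```

So a bare `<` showing up in a test expectation does not mean the sanitized output would contain one.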
LGTM except for the one test where
We're passing a handler to
I see. Thanks for the explanation. LGTM then.
I still don't think that text containing a
@petebacondarwin maybe we should see how people react. I agree that it kind of sucks
Closes #8212
Closes #8193