Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
url: ignore IDN errors when domainname have hyphens
There are valid domain names with hyphens at 3 and 4th position, new
node WHATWG URL parser was failing for it assume its an invalid IDN.
Also filters IDN errors when domain label start or end with hyphen.

Fixes: #12965
Refs: https://www.icann.org/news/announcement-2000-01-07-en
  • Loading branch information
zimbabao committed May 19, 2017
commit 9dee867d73ba23237232b376213d9c69d1351618
29 changes: 29 additions & 0 deletions src/node_i18n.cc
Original file line number Diff line number Diff line change
Expand Up @@ -468,6 +468,21 @@ int32_t ToUnicode(MaybeStackBuffer<char>* buf,
// #46. Therefore, explicitly filters out that error here.
info.errors &= ~UIDNA_ERROR_EMPTY_LABEL;

// These error conditions are mandated unconditionally by UTS #46 version
// 9.0.0 (rev. 17), but were found to be incompatible with many actual domain
// names in the wild. As such, in the current UTS #46 draft (rev. 18) these
// checks are made optional depending on the CheckHyphens flag, which will be
// disabled in WHATWG URL's "domain to ASCII" algorithm as soon as the UTS #46
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domain to Unicode

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. done all changes.

// draft becomes standard.
// Refs:
// - https://github.com/whatwg/url/issues/53
// - http://www.unicode.org/review/pri317/
// - http://www.unicode.org/reports/tr46/tr46-18.html
// - https://www.icann.org/news/announcement-2000-01-07-en
info.errors &= ~UIDNA_ERROR_HYPHEN_3_4;
info.errors &= ~UIDNA_ERROR_LEADING_HYPHEN;
info.errors &= ~UIDNA_ERROR_TRAILING_HYPHEN;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I was mistaken in my understanding of ToUnicode, which means that the entire block (ll. 464-485) shouldn't be necessary. While at it, can you delete this block, the erroneous || (!lenient && info.errors != 0) condition below, as well as the now-unnecessary lenient parameter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (U_FAILURE(status) || (!lenient && info.errors != 0)) {
len = -1;
buf->SetLength(0);
Expand Down Expand Up @@ -516,7 +531,21 @@ int32_t ToASCII(MaybeStackBuffer<char>* buf,
info.errors &= ~UIDNA_ERROR_EMPTY_LABEL;
info.errors &= ~UIDNA_ERROR_LABEL_TOO_LONG;
info.errors &= ~UIDNA_ERROR_DOMAIN_NAME_TOO_LONG;

// These error conditions are mandated unconditionally by UTS #46 version
// 9.0.0 (rev. 17), but were found to be incompatible with many actual domain
// names in the wild. As such, in the current UTS #46 draft (rev. 18) these
// checks are made optional depending on the CheckHyphens flag, which will be
// disabled in WHATWG URL's "domain to ASCII" algorithm as soon as the UTS #46
// draft becomes standard.
// Refs:
// - https://github.com/whatwg/url/issues/53
// - http://www.unicode.org/review/pri317/
// - http://www.unicode.org/reports/tr46/tr46-18.html
// - https://www.icann.org/news/announcement-2000-01-07-en
info.errors &= ~UIDNA_ERROR_HYPHEN_3_4;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the following code, which filters out several more conditions mandated by the UTS#46 draft, and adds an explicit comment explaining why this is:

// These error conditions are mandated unconditionally by UTS #46 version
// 9.0.0 (rev. 17), but were found to be incompatible with many actual domain
// names in the wild. As such, in the current UTS #46 draft (rev. 18) these
// checks are made optional depending on the CheckHyphens flag, which will be
// disabled in WHATWG URL's "domain to ASCII" algorithm as soon as the UTS #46
// draft becomes standard.
// Refs:
// - https://github.com/whatwg/url/issues/53
// - http://www.unicode.org/review/pri317/
// - http://www.unicode.org/reports/tr46/tr46-18.html
info.errors &= ~UIDNA_ERROR_HYPHEN_3_4;
info.errors &= ~UIDNA_ERROR_LEADING_HYPHEN;
info.errors &= ~UIDNA_ERROR_TRAILING_HYPHEN;

I also suspect that these lines of code need to be duplicated for the ToUnicode method in the file, since there is a set of identical changes to ToUnicode in the UTS#46 draft.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Will do.
I checked those too. But was not able to find any real world example of those.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TimothyGu : Made changes as suggested. Filtering errors for trailing and leading hyphens.

info.errors &= ~UIDNA_ERROR_LEADING_HYPHEN;
info.errors &= ~UIDNA_ERROR_TRAILING_HYPHEN;

if (U_FAILURE(status) || (!lenient && info.errors != 0)) {
len = -1;
Expand Down
13 changes: 13 additions & 0 deletions test/fixtures/url-idna.js
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,19 @@ module.exports = {
{
ascii: `${`${'a'.repeat(64)}.`.repeat(4)}com`,
unicode: `${`${'a'.repeat(64)}.`.repeat(4)}com`
},
// URLs with hyphen
{
ascii: 'r4---sn-a5mlrn7s.gevideo.com',
unicode: 'r4---sn-a5mlrn7s.gevideo.com'
},
{
ascii: '-sn-a5mlrn7s.gevideo.com',
unicode: '-sn-a5mlrn7s.gevideo.com'
},
{
ascii: 'sn-a5mlrn7s-.gevideo.com',
unicode: 'sn-a5mlrn7s-.gevideo.com'
}
],
invalid: [
Expand Down
10 changes: 0 additions & 10 deletions test/parallel/test-url-parse-format.js
Original file line number Diff line number Diff line change
Expand Up @@ -487,16 +487,6 @@ const parseTests = {
path: '/bar'
},

'https://r4---sn-a5mlrn7s.gevideo.com/bar': {
href: 'https://r4---sn-a5mlrn7s.gevideo.com/bar',
host: 'r4---sn-a5mlrn7s.gevideo.com',
hostname: 'r4---sn-a5mlrn7s.gevideo.com',
protocol: 'https:',
slashes: true,
pathname: '/bar',
path: '/bar'
},

// IDNA tests
'http://www.日本語.com/': {
href: 'http://www.xn--wgv71a119e.com/',
Expand Down