Conversation
| ): void { | ||
| if (this.prodParam.hasYield && word === "yield") { | ||
| this.raise(startLoc, Errors.YieldBindingIdentifier); | ||
| // All JavaScript reserved words are not longer than 10 characters. |
There was a problem hiding this comment.
The longest reserved word is instanceof and implements.
|
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/46602/ |
|
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. Latest deployment of this branch, based on commit 84be70f:
|
| ]); | ||
|
|
||
| export function canBeReservedWord(word: string): boolean { | ||
| return reservedWordLikeSet.has(word); |
There was a problem hiding this comment.
To avoid maintaining another reserved words list, could this be word === "await" || word === "enum" || isStrictBindReservedWord(word)?
There was a problem hiding this comment.
Instead of that long list, why not do something simple as x & ReservedWord ?
There was a problem hiding this comment.
The Set#has hashes the input string and checks if the hash value is in the table (source), since the the input string is scanned only during hash key is computed and the collision is rare by definition of hash, checking presence takes 1 scan of input and O(1) of the set size.
The StringEquals first checks the length, then the memory address and then fallbacks to byte-by-byte comparison. (source) If we check word === "await" and then isStrictBindReservedWord(word), the word is read for 1.1 times (average on the universe distribution of string length), 1 for the hash computation and 1/10 for comparison with "await". word === "await" also introduces extra branch conditions.
To avoid maintaining another reserved words list
I agree it is not ideal to duplicate keywords, WDYT if I add canBeReservedWord to helper-validator-identifier in Babel 7.15? Since that will be a feature, I can draft a subseqent PR that moves canBeReservedWord to the validator package based on this one so we don't have to wait until 7.15.
why not do something simple as x & ReservedWord ?
@KFlash Can you elaborate? How can & achieves searching here?
There was a problem hiding this comment.
👍 for deferring it to 7.15
There was a problem hiding this comment.
@JLHwung It's possible if you make things complex and add some bitwise masks behind each reserved word token.
There was a problem hiding this comment.
Can you verify whether using a prototye-less object is faster? My intuition is that Sets are algorithmically fast, but the actual implementation in V8 is slow.
There was a problem hiding this comment.
@jridgewell It's faster because I "pack" all reserved words into a set of bitwise masks. It also consumes less memory. You are only dealing with series of number. see Kataw for an example. Same you can do with tokens, keywords, and everything you need. No need for a lookup.
Witch one is fastest? Set and lookup or this one
(token & ReserwedWord) === Reservedword // return true if reserved word
or a switch statement with 60 cases and 100lines of code to get all expression nodes or this one?
(token & IsExpressionNode) === IsExpressionNode // return true if expression node
It's very obvious what's fastest, but I think it's out of scope for Babel
There was a problem hiding this comment.
@jridgewell I come up with a benchmark file comparing these three approaches: Set#has, Object brand checks and (switch length + string equals).
The benchmark result on Node.js 16.1.0:
object brand checks different string: 2059273 ops/sec ±0.42% (0ms)
object brand checks constant string: 31712533 ops/sec ±0.22% (0ms)
set#has different string: 17756450 ops/sec ±1.26% (0ms)
set#has constant string: 38050433 ops/sec ±1.54% (0ms)
string#equals different string: 13789367 ops/sec ±0.44% (0ms)
string#equals constant string: 709703804 ops/sec ±0.42% (0ms)
On different string cases, where each input is in different memory address, Set#has is 700% faster than prototype-less object brand check. On constant string cases, Set#has is 20% faster than prototype-less object brand check.
On different string cases, Set#has is 20% faster than (switch length + string equals). The latter significantly outperform Set#has when the input is a constant string, which is expected because V8 optimizes out the per-character equality branch of StringEquals. However in Babel parser, each identifier name have different address.
@KFlash Yes we can do a single bitwise check if we replace the data structure of token storage and create different token for different keywords. However Babel currently exposes its token storage via options.tokens so we have to maintain compatibility here.
| } | ||
| // Most identifiers are not reservedWord-like, they don't need special | ||
| // treatments afterward, which very likely ends up throwing errors | ||
| if (!canBeReservedWord(word)) { |
There was a problem hiding this comment.
Or maybe we can move this check after the if yield/await/arguments checks, and just use isStrictBindReservedWord? Intuitively it shouldn't be slower, but I didn't check.
This PR aims for improving the performance of
checkReservedWordin common scenarios. It is based on the assumption that even words likeenum,protectedcould be a valid identifier name, such usage is rare. So in order to improve performance, we offer fast path for common names, and fallback to the slow path when it will be likely to throw an error.This PR also improves the parse scope utility functions: They are computationally equivalent and reflect my early work on tackling this issue.
Benchmark results
The baseline is c6bebfe, current is this PR.
Length-11 identifiers
Length-2 identifiers
Length-1 identifiers
await as identifier
As is expected, current is slower than the baseline when parsing valid
awaitas identifier, because of the extra code path.