Simplify or optimize regexes with polynomial time worst cases by weswigham · Pull Request #44197 · microsoft/TypeScript

weswigham · 2021-05-21T08:23:26Z

A friend of mine, @robmcl4, is a PhD student at UCSB doing security research on regex performance. As part of that work, he's developed a fuzzer for V8's regex engine that is quite good at finding regexes with polynomial or worse worst-case running time in V8's implementation specifically. While doing that research, he happened to run it over the regexes in our codebase (along with many other codebases).

In this PR, I fix every regex the fuzzer flags as potentially problematic so that it performs better. (I also used the fuzzer to check that none of the replacements perform poorly.)
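As a rough illustration of what a polynomial worst case looks like in practice, one can time a regex on inputs of size n and 2n and compare. The `growthRatio` helper below is hypothetical and is not the fuzzer's method; wall-clock timings are noisy, so treat its output as a hint, not a verdict.

```typescript
// Hypothetical helper, NOT the fuzzer described above: times `re` against inputs
// of size n and 2n and returns the ratio of the two timings. Roughly, a ratio
// near 2 suggests linear behaviour and a ratio near 4 suggests quadratic.
function growthRatio(re: RegExp, makeInput: (n: number) => string, n: number): number {
    const time = (input: string): number => {
        const start = Date.now();
        // Repeat a few times so the result isn't dominated by timer resolution.
        for (let i = 0; i < 5; i++) {
            re.lastIndex = 0; // reset state in case `re` has the g or y flag
            re.test(input);
        }
        return Date.now() - start;
    };
    const small = time(makeInput(n));
    const large = time(makeInput(2 * n));
    return large / Math.max(small, 1); // guard against a 0 ms measurement
}

// Many leading spaces followed by a non-space, so `\s+$` can never match.
const leadingSpaces = (n: number) => " ".repeat(n) + "x";
console.log("/\\s+$/:", growthRatio(/\s+$/, leadingSpaces, 3000).toFixed(1));
console.log("/x$/:  ", growthRatio(/x$/, leadingSpaces, 3000).toFixed(1));
```

A real fuzzer searches for the worst inputs automatically; here the adversarial input shape (a long whitespace run) has to be supplied by hand.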

Most of the time, fixing the performance just means using fewer regexes, but one example stands out! A regex we use heavily in multiple places, `/\s+$/g`, actually has terrible performance when used to trim trailing whitespace from a string with a lot of leading whitespace (check the benchmark here). Leading whitespace on lines is pretty common, so I'm hoping these fixes yield some small but measurable real-world performance gains.
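Why `/\s+$/` degrades: a backtracking engine tries a match at every starting offset. At each offset inside a run of k whitespace characters that is followed by non-whitespace, `\s+` consumes the rest of the run, `$` fails, and the engine backtracks through the run before moving on, so the total work is roughly k²/2 steps. A minimal sketch of a linear-time alternative is a backward scan from the end of the string. The names `isRegexWhitespace` and `trimTrailingWhitespace` are hypothetical, and this is not the code from this PR.

```typescript
// Hypothetical helper: returns true for the code units that JavaScript's `\s`
// matches, i.e. the ECMAScript WhiteSpace and LineTerminator characters.
function isRegexWhitespace(ch: number): boolean {
    switch (ch) {
        case 0x09: // tab
        case 0x0a: // line feed
        case 0x0b: // vertical tab
        case 0x0c: // form feed
        case 0x0d: // carriage return
        case 0x20: // space
        case 0xa0: // no-break space
        case 0x1680: case 0x2028: case 0x2029: case 0x202f:
        case 0x205f: case 0x3000: case 0xfeff:
            return true;
        default:
            return ch >= 0x2000 && ch <= 0x200a; // en quad .. hair space
    }
}

// Linear-time equivalent of `s.replace(/\s+$/g, "")`: scan backwards from the
// end and stop at the first non-whitespace character, so leading whitespace is
// never visited unless the whole string is whitespace.
function trimTrailingWhitespace(s: string): string {
    let end = s.length;
    while (end > 0 && isRegexWhitespace(s.charCodeAt(end - 1))) {
        end--;
    }
    return s.slice(0, end);
}

const padded = " ".repeat(10000) + "code();  ";
console.log(JSON.stringify(trimTrailingWhitespace(padded).slice(-7))); // "code();"
```

In ES2019+ runtimes, `String.prototype.trimEnd` strips the same set of characters natively, so a hand-written loop like this mainly matters as a fallback for older targets. It is plausibly similar in spirit to the "fallback method" mentioned in this PR's later commits, but that is an assumption.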

weswigham · 2021-05-21T08:26:11Z

@typescript-bot perf test this

typescript-bot · 2021-05-21T08:26:14Z

Heya @weswigham, I've started to run the perf test suite on this PR at bbb2dee. You can monitor the build here.

Update: The results are in!

typescript-bot · 2021-05-21T09:01:34Z

@weswigham
The results of the perf run you requested are in!

Here they are:

Comparison Report - master..44197

| Metric | master | 44197 | Delta | Best | Worst |
|---|---|---|---|---|---|
| **Angular - node (v10.16.3, x64)** | | | | | |
| Memory used | 345,166k (± 0.03%) | 345,175k (± 0.02%) | +10k (+ 0.00%) | 344,996k | 345,257k |
| Parse Time | 1.92s (± 0.35%) | 1.91s (± 0.58%) | -0.01s (- 0.37%) | 1.88s | 1.93s |
| Bind Time | 0.84s (± 0.62%) | 0.83s (± 0.67%) | -0.01s (- 0.83%) | 0.82s | 0.84s |
| Check Time | 5.26s (± 0.38%) | 5.28s (± 0.55%) | +0.02s (+ 0.34%) | 5.22s | 5.36s |
| Emit Time | 5.52s (± 0.58%) | 5.50s (± 0.47%) | -0.02s (- 0.36%) | 5.45s | 5.57s |
| Total Time | 13.53s (± 0.32%) | 13.52s (± 0.28%) | -0.02s (- 0.13%) | 13.43s | 13.60s |
| **Compiler-Unions - node (v10.16.3, x64)** | | | | | |
| Memory used | 200,337k (± 0.05%) | 200,375k (± 0.05%) | +38k (+ 0.02%) | 200,171k | 200,655k |
| Parse Time | 0.78s (± 0.97%) | 0.78s (± 0.61%) | -0.00s (- 0.25%) | 0.77s | 0.79s |
| Bind Time | 0.53s (± 1.55%) | 0.53s (± 1.14%) | -0.00s (- 0.38%) | 0.51s | 0.53s |
| Check Time | 7.54s (± 0.69%) | 7.54s (± 0.69%) | -0.00s (- 0.04%) | 7.42s | 7.63s |
| Emit Time | 2.26s (± 2.03%) | 2.24s (± 0.57%) | -0.01s (- 0.62%) | 2.21s | 2.27s |
| Total Time | 11.11s (± 0.76%) | 11.09s (± 0.53%) | -0.02s (- 0.19%) | 10.96s | 11.20s |
| **Monaco - node (v10.16.3, x64)** | | | | | |
| Memory used | 341,680k (± 0.01%) | 341,708k (± 0.02%) | +28k (+ 0.01%) | 341,543k | 341,811k |
| Parse Time | 1.56s (± 0.63%) | 1.55s (± 0.44%) | -0.00s (- 0.26%) | 1.54s | 1.57s |
| Bind Time | 0.75s (± 0.92%) | 0.74s (± 0.70%) | -0.00s (- 0.54%) | 0.73s | 0.75s |
| Check Time | 5.39s (± 0.40%) | 5.39s (± 0.37%) | +0.00s (+ 0.02%) | 5.33s | 5.43s |
| Emit Time | 2.97s (± 0.68%) | 2.99s (± 0.80%) | +0.02s (+ 0.61%) | 2.93s | 3.04s |
| Total Time | 10.66s (± 0.31%) | 10.67s (± 0.37%) | +0.01s (+ 0.11%) | 10.56s | 10.75s |
| **TFS - node (v10.16.3, x64)** | | | | | |
| Memory used | 304,273k (± 0.04%) | 304,228k (± 0.02%) | -45k (- 0.01%) | 304,094k | 304,289k |
| Parse Time | 1.21s (± 0.62%) | 1.21s (± 0.46%) | -0.00s (- 0.16%) | 1.20s | 1.22s |
| Bind Time | 0.70s (± 0.85%) | 0.70s (± 0.83%) | -0.00s (- 0.28%) | 0.69s | 0.72s |
| Check Time | 4.87s (± 0.61%) | 4.85s (± 0.46%) | -0.02s (- 0.43%) | 4.81s | 4.91s |
| Emit Time | 3.12s (± 2.05%) | 3.12s (± 1.33%) | -0.00s (- 0.03%) | 3.05s | 3.21s |
| Total Time | 9.91s (± 0.71%) | 9.88s (± 0.61%) | -0.02s (- 0.23%) | 9.78s | 10.03s |
| **material-ui - node (v10.16.3, x64)** | | | | | |
| Memory used | 474,127k (± 0.01%) | 474,101k (± 0.01%) | -25k (- 0.01%) | 473,969k | 474,284k |
| Parse Time | 1.94s (± 0.82%) | 1.94s (± 1.02%) | -0.00s (- 0.15%) | 1.90s | 1.98s |
| Bind Time | 0.65s (± 0.86%) | 0.65s (± 0.61%) | +0.00s (+ 0.46%) | 0.64s | 0.66s |
| Check Time | 14.25s (± 0.67%) | 14.19s (± 0.33%) | -0.05s (- 0.39%) | 14.13s | 14.35s |
| Emit Time | 0.00s (± 0.00%) | 0.00s (± 0.00%) | 0.00s (NaN%) | 0.00s | 0.00s |
| Total Time | 16.85s (± 0.56%) | 16.78s (± 0.26%) | -0.06s (- 0.36%) | 16.68s | 16.90s |
| **Angular - node (v12.1.0, x64)** | | | | | |
| Memory used | 322,666k (± 0.03%) | 322,660k (± 0.02%) | -6k (- 0.00%) | 322,464k | 322,845k |
| Parse Time | 1.90s (± 0.70%) | 1.90s (± 0.94%) | -0.00s (- 0.05%) | 1.88s | 1.96s |
| Bind Time | 0.82s (± 0.89%) | 0.82s (± 0.83%) | -0.01s (- 0.73%) | 0.80s | 0.83s |
| Check Time | 5.15s (± 0.35%) | 5.13s (± 0.52%) | -0.01s (- 0.29%) | 5.08s | 5.19s |
| Emit Time | 5.76s (± 0.65%) | 5.75s (± 0.88%) | -0.01s (- 0.14%) | 5.67s | 5.87s |
| Total Time | 13.63s (± 0.38%) | 13.60s (± 0.51%) | -0.03s (- 0.24%) | 13.50s | 13.78s |
| **Compiler-Unions - node (v12.1.0, x64)** | | | | | |
| Memory used | 187,837k (± 0.06%) | 187,810k (± 0.07%) | -27k (- 0.01%) | 187,358k | 188,019k |
| Parse Time | 0.77s (± 0.88%) | 0.77s (± 0.62%) | -0.01s (- 0.90%) | 0.75s | 0.77s |
| Bind Time | 0.53s (± 0.89%) | 0.53s (± 0.75%) | +0.00s (+ 0.19%) | 0.52s | 0.54s |
| Check Time | 7.01s (± 0.76%) | 7.00s (± 0.45%) | -0.01s (- 0.21%) | 6.94s | 7.08s |
| Emit Time | 2.25s (± 1.03%) | 2.24s (± 0.62%) | -0.01s (- 0.58%) | 2.21s | 2.26s |
| Total Time | 10.56s (± 0.67%) | 10.53s (± 0.31%) | -0.03s (- 0.32%) | 10.45s | 10.59s |
| **Monaco - node (v12.1.0, x64)** | | | | | |
| Memory used | 324,080k (± 0.03%) | 324,030k (± 0.02%) | -50k (- 0.02%) | 323,931k | 324,213k |
| Parse Time | 1.54s (± 0.72%) | 1.53s (± 0.86%) | -0.01s (- 0.71%) | 1.51s | 1.57s |
| Bind Time | 0.72s (± 0.56%) | 0.71s (± 0.52%) | -0.00s (- 0.56%) | 0.71s | 0.72s |
| Check Time | 5.22s (± 0.49%) | 5.22s (± 0.60%) | +0.00s (+ 0.04%) | 5.16s | 5.32s |
| Emit Time | 3.03s (± 0.95%) | 3.02s (± 0.69%) | -0.01s (- 0.40%) | 2.99s | 3.08s |
| Total Time | 10.52s (± 0.50%) | 10.49s (± 0.49%) | -0.02s (- 0.22%) | 10.39s | 10.64s |
| **TFS - node (v12.1.0, x64)** | | | | | |
| Memory used | 288,735k (± 0.02%) | 288,728k (± 0.01%) | -8k (- 0.00%) | 288,649k | 288,809k |
| Parse Time | 1.21s (± 0.73%) | 1.22s (± 0.82%) | +0.01s (+ 0.83%) | 1.20s | 1.24s |
| Bind Time | 0.69s (± 0.64%) | 0.70s (± 0.86%) | +0.00s (+ 0.43%) | 0.68s | 0.71s |
| Check Time | 4.77s (± 0.36%) | 4.76s (± 0.38%) | -0.00s (- 0.08%) | 4.72s | 4.79s |
| Emit Time | 3.15s (± 0.59%) | 3.15s (± 1.30%) | +0.00s (+ 0.16%) | 3.07s | 3.26s |
| Total Time | 9.81s (± 0.25%) | 9.83s (± 0.60%) | +0.01s (+ 0.14%) | 9.67s | 9.95s |
| **material-ui - node (v12.1.0, x64)** | | | | | |
| Memory used | 452,001k (± 0.01%) | 452,012k (± 0.01%) | +12k (+ 0.00%) | 451,867k | 452,092k |
| Parse Time | 1.95s (± 0.68%) | 1.95s (± 0.44%) | -0.00s (- 0.20%) | 1.93s | 1.97s |
| Bind Time | 0.64s (± 1.47%) | 0.64s (± 0.62%) | -0.00s (- 0.31%) | 0.63s | 0.65s |
| Check Time | 12.82s (± 0.52%) | 12.82s (± 0.43%) | -0.00s (- 0.04%) | 12.72s | 12.97s |
| Emit Time | 0.00s (± 0.00%) | 0.00s (± 0.00%) | 0.00s (NaN%) | 0.00s | 0.00s |
| Total Time | 15.42s (± 0.48%) | 15.41s (± 0.37%) | -0.02s (- 0.10%) | 15.30s | 15.56s |
| **Angular - node (v14.15.1, x64)** | | | | | |
| Memory used | 321,421k (± 0.01%) | 321,406k (± 0.01%) | -15k (- 0.00%) | 321,368k | 321,433k |
| Parse Time | 1.90s (± 0.49%) | 1.90s (± 0.61%) | +0.00s (+ 0.11%) | 1.88s | 1.93s |
| Bind Time | 0.87s (± 0.92%) | 0.87s (± 0.83%) | 0.00s (0.00%) | 0.86s | 0.89s |
| Check Time | 5.17s (± 0.22%) | 5.17s (± 0.52%) | -0.00s (- 0.04%) | 5.12s | 5.23s |
| Emit Time | 5.79s (± 0.79%) | 5.80s (± 0.89%) | +0.01s (+ 0.14%) | 5.75s | 5.95s |
| Total Time | 13.73s (± 0.44%) | 13.73s (± 0.50%) | +0.01s (+ 0.04%) | 13.63s | 13.92s |
| **Compiler-Unions - node (v14.15.1, x64)** | | | | | |
| Memory used | 187,749k (± 0.66%) | 189,111k (± 0.51%) | +1,362k (+ 0.73%) | 186,516k | 189,816k |
| Parse Time | 0.80s (± 0.69%) | 0.80s (± 0.65%) | -0.00s (- 0.12%) | 0.79s | 0.81s |
| Bind Time | 0.55s (± 0.66%) | 0.55s (± 0.66%) | 0.00s (0.00%) | 0.55s | 0.56s |
| Check Time | 7.14s (± 0.60%) | 7.12s (± 0.61%) | -0.02s (- 0.24%) | 7.02s | 7.23s |
| Emit Time | 2.27s (± 0.91%) | 2.27s (± 0.96%) | +0.01s (+ 0.44%) | 2.20s | 2.31s |
| Total Time | 10.77s (± 0.42%) | 10.75s (± 0.54%) | -0.02s (- 0.15%) | 10.56s | 10.84s |
| **Monaco - node (v14.15.1, x64)** | | | | | |
| Memory used | 323,173k (± 0.00%) | 323,135k (± 0.00%) | -37k (- 0.01%) | 323,106k | 323,177k |
| Parse Time | 1.57s (± 0.63%) | 1.56s (± 0.72%) | -0.01s (- 0.57%) | 1.54s | 1.59s |
| Bind Time | 0.75s (± 0.67%) | 0.74s (± 0.49%) | -0.00s (- 0.13%) | 0.74s | 0.75s |
| Check Time | 5.21s (± 0.19%) | 5.21s (± 0.54%) | +0.00s (+ 0.04%) | 5.17s | 5.28s |
| Emit Time | 3.06s (± 0.47%) | 3.09s (± 1.14%) | +0.02s (+ 0.82%) | 3.03s | 3.19s |
| Total Time | 10.59s (± 0.20%) | 10.60s (± 0.57%) | +0.02s (+ 0.15%) | 10.54s | 10.77s |
| **TFS - node (v14.15.1, x64)** | | | | | |
| Memory used | 287,680k (± 0.01%) | 287,689k (± 0.01%) | +9k (+ 0.00%) | 287,655k | 287,714k |
| Parse Time | 1.28s (± 1.86%) | 1.26s (± 1.35%) | -0.02s (- 1.87%) | 1.23s | 1.30s |
| Bind Time | 0.72s (± 0.72%) | 0.72s (± 1.06%) | -0.00s (- 0.42%) | 0.70s | 0.73s |
| Check Time | 4.83s (± 0.60%) | 4.80s (± 0.48%) | -0.03s (- 0.68%) | 4.76s | 4.88s |
| Emit Time | 3.23s (± 1.17%) | 3.19s (± 0.26%) | -0.04s (- 1.12%) | 3.17s | 3.21s |
| Total Time | 10.06s (± 0.49%) | 9.96s (± 0.25%) | -0.09s (- 0.93%) | 9.91s | 10.01s |
| **material-ui - node (v14.15.1, x64)** | | | | | |
| Memory used | 450,261k (± 0.00%) | 450,234k (± 0.00%) | -27k (- 0.01%) | 450,188k | 450,298k |
| Parse Time | 2.00s (± 0.77%) | 1.98s (± 0.73%) | -0.01s (- 0.65%) | 1.94s | 2.02s |
| Bind Time | 0.70s (± 1.00%) | 0.70s (± 0.82%) | +0.00s (+ 0.14%) | 0.69s | 0.71s |
| Check Time | 12.98s (± 0.57%) | 13.00s (± 0.60%) | +0.02s (+ 0.12%) | 12.82s | 13.19s |
| Emit Time | 0.00s (± 0.00%) | 0.00s (± 0.00%) | 0.00s (NaN%) | 0.00s | 0.00s |
| Total Time | 15.68s (± 0.53%) | 15.68s (± 0.46%) | 0.00s (0.00%) | 15.51s | 15.82s |

System

Machine Name: ts-ci-ubuntu
Platform: linux 4.4.0-206-generic
Architecture: x64
Available Memory: 16 GB
Available Memory: 2 GB
CPUs: 4 × Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Hosts

node (v10.16.3, x64)
node (v12.1.0, x64)
node (v14.15.1, x64)

Scenarios

Angular - node (v10.16.3, x64)
Angular - node (v12.1.0, x64)
Angular - node (v14.15.1, x64)
Compiler-Unions - node (v10.16.3, x64)
Compiler-Unions - node (v12.1.0, x64)
Compiler-Unions - node (v14.15.1, x64)
Monaco - node (v10.16.3, x64)
Monaco - node (v12.1.0, x64)
Monaco - node (v14.15.1, x64)
TFS - node (v10.16.3, x64)
TFS - node (v12.1.0, x64)
TFS - node (v14.15.1, x64)
material-ui - node (v10.16.3, x64)
material-ui - node (v12.1.0, x64)
material-ui - node (v14.15.1, x64)

| Benchmark | Name | Iterations |
|---|---|---|
| Current | 44197 | 10 |
| Baseline | master | 10 |


weswigham · 2021-05-21T09:17:15Z

Looks like perf is either the same or ever so slightly positive, kinda as expected.

amcasey

LGTM. Are there instructions somewhere for checking whether a regex is bad?

weswigham · 2021-05-21T20:54:34Z

> LGTM. Are there instructions somewhere for checking whether a regex is bad?

Not yet! We'll probably have to wait until the research is published for a formal tool. (I have a prerelease build, but the output is pretty researchy.) There are some other existing tools tied to other research papers, but they involve checking against an online DB of known-bad regexes or identifying known-bad substructures, which doesn't help as much when writing novel regexes. @robmcl4 may have more concrete plans for the software; I can't really speak for him there.
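To make "identifying known-bad substructures" concrete, here is a crude sketch of one such check: flagging a quantified group whose body already contains a `*`, `+` or `{` quantifier, the classic `(a+)+` shape. `hasNestedQuantifier` is hypothetical and is not any published tool; it treats lookarounds as plain groups, ignores backreferences, and will give both false positives and false negatives.

```typescript
// Hypothetical heuristic: returns true if some group quantified with *, + or {
// contains a *, + or { quantifier in its body, e.g. (a+)+ or (?:\s*x)*.
// `{` is always treated as a quantifier, so bounded repeats like (a+){2} are
// flagged too.
function hasNestedQuantifier(source: string): boolean {
    const stack: boolean[] = []; // per open group: does its body contain a quantifier?
    let i = 0;
    while (i < source.length) {
        const ch = source[i];
        if (ch === "\\") { i += 2; continue; } // skip an escaped character
        if (ch === "[") { // skip a character class; quantifier chars inside are literals
            i++;
            while (i < source.length && source[i] !== "]") {
                i += source[i] === "\\" ? 2 : 1;
            }
            i++;
            continue;
        }
        if (ch === "(") { stack.push(false); i++; continue; }
        if (ch === ")") {
            const inner = stack.pop() ?? false;
            const next = source[i + 1];
            const quantified = next === "*" || next === "+" || next === "{";
            if (quantified && inner) return true;
            // The enclosing group now contains a quantifier if this group had one or is one.
            if (stack.length > 0 && (inner || quantified)) stack[stack.length - 1] = true;
            i++;
            continue;
        }
        if ((ch === "*" || ch === "+" || ch === "{") && stack.length > 0) {
            stack[stack.length - 1] = true;
        }
        i++;
    }
    return false;
}

console.log(hasNestedQuantifier(/(a+)+$/.source)); // true
console.log(hasNestedQuantifier(/\s+$/.source));   // false
```

Note that it does not flag `/\s+$/`, the quadratic pattern this PR singles out, which illustrates why substructure checks help less with novel regexes.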

sandersn · 2021-05-21T21:37:27Z

> Not yet! We'll probably have to wait until the research is published for a formal tool.

Orrrr, you could rewrite it in Rust today and be instantly famous.


Simplify or optimize regexes with polynomial time worst cases

bbb2dee

typescript-bot assigned weswigham May 21, 2021

typescript-bot added the Author: Team and For Uncommitted Bug labels May 21, 2021

weswigham requested review from rbuckton and sandersn May 21, 2021 09:17

weswigham marked this pull request as ready for review May 21, 2021 09:17

amcasey approved these changes May 21, 2021


dmichon-msft reviewed May 21, 2021


Comment thread src/compiler/core.ts Outdated

Comment thread src/compiler/utilities.ts Outdated

weswigham and others added 2 commits May 24, 2021 14:33

PR feedback & cleanup

fc71dae

Co-authored-by: David Michon <dmichon-msft@users.noreply.github.com>

Use builtin scanner function for checking whitespace in fallback method (its faster)

0523fc6

weswigham merged commit fcabb5c into microsoft:master May 24, 2021

This was referenced Sep 21, 2021

Cannot detect sourceMappingURL if there is a trailing newline #45982

Closed

fix(sourcemap): accept a sourceMappingURL that ends with a newline #45983

Merged

microsoft locked as resolved and limited conversation to collaborators Oct 21, 2025

weswigham merged 3 commits into microsoft:master from weswigham:regex-polynomial-fixing

weswigham commented May 21, 2021 •

edited

Loading

Uh oh!

weswigham commented May 21, 2021

Uh oh!

typescript-bot commented May 21, 2021 •

edited

Loading

Uh oh!

typescript-bot commented May 21, 2021

Uh oh!

weswigham commented May 21, 2021

Uh oh!

amcasey left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

weswigham commented May 21, 2021

Uh oh!

sandersn commented May 21, 2021 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants
