Skip to content

Simplify or optimize regexes with polynomial time worst cases#44197

Merged
weswigham merged 3 commits into
microsoft:masterfrom
weswigham:regex-polynomial-fixing
May 24, 2021
Merged

Simplify or optimize regexes with polynomial time worst cases#44197
weswigham merged 3 commits into
microsoft:masterfrom
weswigham:regex-polynomial-fixing

Conversation

@weswigham

@weswigham weswigham commented May 21, 2021

Copy link
Copy Markdown
Member

A friend of mine, @robmcl4, is a PhD student as UCSB and is doing some security research involving regex performance. In the course of that, he's developed a kind of fuzzer for v8's regex engine that's pretty good at identifying regexes that have polynomial or worse worst-case time in specifically v8's regex implementation. In doing said research, he happened to run it over the regexes in our codebase (along with many other codebases).

In this PR, I fix all the regexes said fuzzer indicates may be problematic so they are more performant. (I checked any replacements don't also do very poorly using said fuzzer).

Most of the time, fixing the perf just means using less regexes, however one particular example stands out! A regex we use a ton in multiple places, /\s+$/g, actually has terrible performance when used to trim whitespace from a string with a lot of leading whitespace (check the benchmark here). Lines having leading whitespace is pretty common, so I'm hoping these fixes might actually have some small but measurable real-world perf gains.

@typescript-bot typescript-bot added Author: Team For Uncommitted Bug PR for untriaged, rejected, closed or missing bug labels May 21, 2021
@weswigham

Copy link
Copy Markdown
Member Author

@typescript-bot perf test this

@typescript-bot

typescript-bot commented May 21, 2021

Copy link
Copy Markdown
Contributor

Heya @weswigham, I've started to run the perf test suite on this PR at bbb2dee. You can monitor the build here.

Update: The results are in!

@typescript-bot

Copy link
Copy Markdown
Contributor

@weswigham
The results of the perf run you requested are in!

Here they are:

Comparison Report - master..44197

Metric master 44197 Delta Best Worst
Angular - node (v10.16.3, x64)
Memory used 345,166k (± 0.03%) 345,175k (± 0.02%) +10k (+ 0.00%) 344,996k 345,257k
Parse Time 1.92s (± 0.35%) 1.91s (± 0.58%) -0.01s (- 0.37%) 1.88s 1.93s
Bind Time 0.84s (± 0.62%) 0.83s (± 0.67%) -0.01s (- 0.83%) 0.82s 0.84s
Check Time 5.26s (± 0.38%) 5.28s (± 0.55%) +0.02s (+ 0.34%) 5.22s 5.36s
Emit Time 5.52s (± 0.58%) 5.50s (± 0.47%) -0.02s (- 0.36%) 5.45s 5.57s
Total Time 13.53s (± 0.32%) 13.52s (± 0.28%) -0.02s (- 0.13%) 13.43s 13.60s
Compiler-Unions - node (v10.16.3, x64)
Memory used 200,337k (± 0.05%) 200,375k (± 0.05%) +38k (+ 0.02%) 200,171k 200,655k
Parse Time 0.78s (± 0.97%) 0.78s (± 0.61%) -0.00s (- 0.25%) 0.77s 0.79s
Bind Time 0.53s (± 1.55%) 0.53s (± 1.14%) -0.00s (- 0.38%) 0.51s 0.53s
Check Time 7.54s (± 0.69%) 7.54s (± 0.69%) -0.00s (- 0.04%) 7.42s 7.63s
Emit Time 2.26s (± 2.03%) 2.24s (± 0.57%) -0.01s (- 0.62%) 2.21s 2.27s
Total Time 11.11s (± 0.76%) 11.09s (± 0.53%) -0.02s (- 0.19%) 10.96s 11.20s
Monaco - node (v10.16.3, x64)
Memory used 341,680k (± 0.01%) 341,708k (± 0.02%) +28k (+ 0.01%) 341,543k 341,811k
Parse Time 1.56s (± 0.63%) 1.55s (± 0.44%) -0.00s (- 0.26%) 1.54s 1.57s
Bind Time 0.75s (± 0.92%) 0.74s (± 0.70%) -0.00s (- 0.54%) 0.73s 0.75s
Check Time 5.39s (± 0.40%) 5.39s (± 0.37%) +0.00s (+ 0.02%) 5.33s 5.43s
Emit Time 2.97s (± 0.68%) 2.99s (± 0.80%) +0.02s (+ 0.61%) 2.93s 3.04s
Total Time 10.66s (± 0.31%) 10.67s (± 0.37%) +0.01s (+ 0.11%) 10.56s 10.75s
TFS - node (v10.16.3, x64)
Memory used 304,273k (± 0.04%) 304,228k (± 0.02%) -45k (- 0.01%) 304,094k 304,289k
Parse Time 1.21s (± 0.62%) 1.21s (± 0.46%) -0.00s (- 0.16%) 1.20s 1.22s
Bind Time 0.70s (± 0.85%) 0.70s (± 0.83%) -0.00s (- 0.28%) 0.69s 0.72s
Check Time 4.87s (± 0.61%) 4.85s (± 0.46%) -0.02s (- 0.43%) 4.81s 4.91s
Emit Time 3.12s (± 2.05%) 3.12s (± 1.33%) -0.00s (- 0.03%) 3.05s 3.21s
Total Time 9.91s (± 0.71%) 9.88s (± 0.61%) -0.02s (- 0.23%) 9.78s 10.03s
material-ui - node (v10.16.3, x64)
Memory used 474,127k (± 0.01%) 474,101k (± 0.01%) -25k (- 0.01%) 473,969k 474,284k
Parse Time 1.94s (± 0.82%) 1.94s (± 1.02%) -0.00s (- 0.15%) 1.90s 1.98s
Bind Time 0.65s (± 0.86%) 0.65s (± 0.61%) +0.00s (+ 0.46%) 0.64s 0.66s
Check Time 14.25s (± 0.67%) 14.19s (± 0.33%) -0.05s (- 0.39%) 14.13s 14.35s
Emit Time 0.00s (± 0.00%) 0.00s (± 0.00%) 0.00s ( NaN%) 0.00s 0.00s
Total Time 16.85s (± 0.56%) 16.78s (± 0.26%) -0.06s (- 0.36%) 16.68s 16.90s
Angular - node (v12.1.0, x64)
Memory used 322,666k (± 0.03%) 322,660k (± 0.02%) -6k (- 0.00%) 322,464k 322,845k
Parse Time 1.90s (± 0.70%) 1.90s (± 0.94%) -0.00s (- 0.05%) 1.88s 1.96s
Bind Time 0.82s (± 0.89%) 0.82s (± 0.83%) -0.01s (- 0.73%) 0.80s 0.83s
Check Time 5.15s (± 0.35%) 5.13s (± 0.52%) -0.01s (- 0.29%) 5.08s 5.19s
Emit Time 5.76s (± 0.65%) 5.75s (± 0.88%) -0.01s (- 0.14%) 5.67s 5.87s
Total Time 13.63s (± 0.38%) 13.60s (± 0.51%) -0.03s (- 0.24%) 13.50s 13.78s
Compiler-Unions - node (v12.1.0, x64)
Memory used 187,837k (± 0.06%) 187,810k (± 0.07%) -27k (- 0.01%) 187,358k 188,019k
Parse Time 0.77s (± 0.88%) 0.77s (± 0.62%) -0.01s (- 0.90%) 0.75s 0.77s
Bind Time 0.53s (± 0.89%) 0.53s (± 0.75%) +0.00s (+ 0.19%) 0.52s 0.54s
Check Time 7.01s (± 0.76%) 7.00s (± 0.45%) -0.01s (- 0.21%) 6.94s 7.08s
Emit Time 2.25s (± 1.03%) 2.24s (± 0.62%) -0.01s (- 0.58%) 2.21s 2.26s
Total Time 10.56s (± 0.67%) 10.53s (± 0.31%) -0.03s (- 0.32%) 10.45s 10.59s
Monaco - node (v12.1.0, x64)
Memory used 324,080k (± 0.03%) 324,030k (± 0.02%) -50k (- 0.02%) 323,931k 324,213k
Parse Time 1.54s (± 0.72%) 1.53s (± 0.86%) -0.01s (- 0.71%) 1.51s 1.57s
Bind Time 0.72s (± 0.56%) 0.71s (± 0.52%) -0.00s (- 0.56%) 0.71s 0.72s
Check Time 5.22s (± 0.49%) 5.22s (± 0.60%) +0.00s (+ 0.04%) 5.16s 5.32s
Emit Time 3.03s (± 0.95%) 3.02s (± 0.69%) -0.01s (- 0.40%) 2.99s 3.08s
Total Time 10.52s (± 0.50%) 10.49s (± 0.49%) -0.02s (- 0.22%) 10.39s 10.64s
TFS - node (v12.1.0, x64)
Memory used 288,735k (± 0.02%) 288,728k (± 0.01%) -8k (- 0.00%) 288,649k 288,809k
Parse Time 1.21s (± 0.73%) 1.22s (± 0.82%) +0.01s (+ 0.83%) 1.20s 1.24s
Bind Time 0.69s (± 0.64%) 0.70s (± 0.86%) +0.00s (+ 0.43%) 0.68s 0.71s
Check Time 4.77s (± 0.36%) 4.76s (± 0.38%) -0.00s (- 0.08%) 4.72s 4.79s
Emit Time 3.15s (± 0.59%) 3.15s (± 1.30%) +0.00s (+ 0.16%) 3.07s 3.26s
Total Time 9.81s (± 0.25%) 9.83s (± 0.60%) +0.01s (+ 0.14%) 9.67s 9.95s
material-ui - node (v12.1.0, x64)
Memory used 452,001k (± 0.01%) 452,012k (± 0.01%) +12k (+ 0.00%) 451,867k 452,092k
Parse Time 1.95s (± 0.68%) 1.95s (± 0.44%) -0.00s (- 0.20%) 1.93s 1.97s
Bind Time 0.64s (± 1.47%) 0.64s (± 0.62%) -0.00s (- 0.31%) 0.63s 0.65s
Check Time 12.82s (± 0.52%) 12.82s (± 0.43%) -0.00s (- 0.04%) 12.72s 12.97s
Emit Time 0.00s (± 0.00%) 0.00s (± 0.00%) 0.00s ( NaN%) 0.00s 0.00s
Total Time 15.42s (± 0.48%) 15.41s (± 0.37%) -0.02s (- 0.10%) 15.30s 15.56s
Angular - node (v14.15.1, x64)
Memory used 321,421k (± 0.01%) 321,406k (± 0.01%) -15k (- 0.00%) 321,368k 321,433k
Parse Time 1.90s (± 0.49%) 1.90s (± 0.61%) +0.00s (+ 0.11%) 1.88s 1.93s
Bind Time 0.87s (± 0.92%) 0.87s (± 0.83%) 0.00s ( 0.00%) 0.86s 0.89s
Check Time 5.17s (± 0.22%) 5.17s (± 0.52%) -0.00s (- 0.04%) 5.12s 5.23s
Emit Time 5.79s (± 0.79%) 5.80s (± 0.89%) +0.01s (+ 0.14%) 5.75s 5.95s
Total Time 13.73s (± 0.44%) 13.73s (± 0.50%) +0.01s (+ 0.04%) 13.63s 13.92s
Compiler-Unions - node (v14.15.1, x64)
Memory used 187,749k (± 0.66%) 189,111k (± 0.51%) +1,362k (+ 0.73%) 186,516k 189,816k
Parse Time 0.80s (± 0.69%) 0.80s (± 0.65%) -0.00s (- 0.12%) 0.79s 0.81s
Bind Time 0.55s (± 0.66%) 0.55s (± 0.66%) 0.00s ( 0.00%) 0.55s 0.56s
Check Time 7.14s (± 0.60%) 7.12s (± 0.61%) -0.02s (- 0.24%) 7.02s 7.23s
Emit Time 2.27s (± 0.91%) 2.27s (± 0.96%) +0.01s (+ 0.44%) 2.20s 2.31s
Total Time 10.77s (± 0.42%) 10.75s (± 0.54%) -0.02s (- 0.15%) 10.56s 10.84s
Monaco - node (v14.15.1, x64)
Memory used 323,173k (± 0.00%) 323,135k (± 0.00%) -37k (- 0.01%) 323,106k 323,177k
Parse Time 1.57s (± 0.63%) 1.56s (± 0.72%) -0.01s (- 0.57%) 1.54s 1.59s
Bind Time 0.75s (± 0.67%) 0.74s (± 0.49%) -0.00s (- 0.13%) 0.74s 0.75s
Check Time 5.21s (± 0.19%) 5.21s (± 0.54%) +0.00s (+ 0.04%) 5.17s 5.28s
Emit Time 3.06s (± 0.47%) 3.09s (± 1.14%) +0.02s (+ 0.82%) 3.03s 3.19s
Total Time 10.59s (± 0.20%) 10.60s (± 0.57%) +0.02s (+ 0.15%) 10.54s 10.77s
TFS - node (v14.15.1, x64)
Memory used 287,680k (± 0.01%) 287,689k (± 0.01%) +9k (+ 0.00%) 287,655k 287,714k
Parse Time 1.28s (± 1.86%) 1.26s (± 1.35%) -0.02s (- 1.87%) 1.23s 1.30s
Bind Time 0.72s (± 0.72%) 0.72s (± 1.06%) -0.00s (- 0.42%) 0.70s 0.73s
Check Time 4.83s (± 0.60%) 4.80s (± 0.48%) -0.03s (- 0.68%) 4.76s 4.88s
Emit Time 3.23s (± 1.17%) 3.19s (± 0.26%) -0.04s (- 1.12%) 3.17s 3.21s
Total Time 10.06s (± 0.49%) 9.96s (± 0.25%) -0.09s (- 0.93%) 9.91s 10.01s
material-ui - node (v14.15.1, x64)
Memory used 450,261k (± 0.00%) 450,234k (± 0.00%) -27k (- 0.01%) 450,188k 450,298k
Parse Time 2.00s (± 0.77%) 1.98s (± 0.73%) -0.01s (- 0.65%) 1.94s 2.02s
Bind Time 0.70s (± 1.00%) 0.70s (± 0.82%) +0.00s (+ 0.14%) 0.69s 0.71s
Check Time 12.98s (± 0.57%) 13.00s (± 0.60%) +0.02s (+ 0.12%) 12.82s 13.19s
Emit Time 0.00s (± 0.00%) 0.00s (± 0.00%) 0.00s ( NaN%) 0.00s 0.00s
Total Time 15.68s (± 0.53%) 15.68s (± 0.46%) 0.00s ( 0.00%) 15.51s 15.82s
System
Machine Namets-ci-ubuntu
Platformlinux 4.4.0-206-generic
Architecturex64
Available Memory16 GB
Available Memory2 GB
CPUs4 × Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Hosts
  • node (v10.16.3, x64)
  • node (v12.1.0, x64)
  • node (v14.15.1, x64)
Scenarios
  • Angular - node (v10.16.3, x64)
  • Angular - node (v12.1.0, x64)
  • Angular - node (v14.15.1, x64)
  • Compiler-Unions - node (v10.16.3, x64)
  • Compiler-Unions - node (v12.1.0, x64)
  • Compiler-Unions - node (v14.15.1, x64)
  • Monaco - node (v10.16.3, x64)
  • Monaco - node (v12.1.0, x64)
  • Monaco - node (v14.15.1, x64)
  • TFS - node (v10.16.3, x64)
  • TFS - node (v12.1.0, x64)
  • TFS - node (v14.15.1, x64)
  • material-ui - node (v10.16.3, x64)
  • material-ui - node (v12.1.0, x64)
  • material-ui - node (v14.15.1, x64)
Benchmark Name Iterations
Current 44197 10
Baseline master 10

Developer Information:

Download Benchmark

@weswigham

Copy link
Copy Markdown
Member Author

Looks like perf is either the same or ever so slightly positive, kinda as expected.

@weswigham weswigham requested review from rbuckton and sandersn May 21, 2021 09:17
@weswigham weswigham marked this pull request as ready for review May 21, 2021 09:17

@amcasey amcasey left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Are there instructions somewhere for checking whether a regex is bad?

Comment thread src/compiler/commandLineParser.ts
Comment thread src/compiler/commandLineParser.ts Outdated
Comment thread src/compiler/commandLineParser.ts Outdated
Comment thread src/compiler/core.ts Outdated
Comment thread src/compiler/debug.ts Outdated
Comment thread src/compiler/parser.ts Outdated
Comment thread src/compiler/scanner.ts Outdated
Comment thread src/compiler/utilities.ts Outdated
Comment thread src/compiler/utilities.ts Outdated
@weswigham

Copy link
Copy Markdown
Member Author

LGTM. Are there instructions somewhere for checking whether a regex is bad?

Not yet! We'll probably have to wait until the research is published for a formal tool. (I have a prerelease build, but the output is pretty researchy). There are some other, existing tools tied to other research papers, but they involving checking against a DB of known-bad regexes online, or identifying known-bad substructures, which doesn't really help as much when making novel regexes. @robmcl4 may have more concrete plans for the software, I can't really speak for him there.

@sandersn

sandersn commented May 21, 2021

Copy link
Copy Markdown
Member

Not yet! We'll probably have to wait until the research is published for a formal tool.

Orrrr, you could rewrite it in Rust today and be instantly famous.

Comment thread src/compiler/core.ts Outdated
Comment thread src/compiler/utilities.ts Outdated
weswigham and others added 2 commits May 24, 2021 14:33
Co-authored-by: David Michon <dmichon-msft@users.noreply.github.com>
@weswigham weswigham merged commit fcabb5c into microsoft:master May 24, 2021
@microsoft microsoft locked as resolved and limited conversation to collaborators Oct 21, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Author: Team For Uncommitted Bug PR for untriaged, rejected, closed or missing bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants