`bumpalo`-based arenas by fitzgen · Pull Request #2 · bytecodealliance/arena-btree

fitzgen · 2022-09-27T22:37:39Z

Alternative to #1

Amanieu · 2022-09-29T01:33:13Z

It turns out my benchmarks from #1 were wrong: I had accidentally left debug assertions on in release mode (I was debugging another issue and forgot to turn them back off afterwards). The std BTreeMap wasn't affected. Here is a new set of benchmark results with debug assertions disabled.

std BTreeMap:

          2,139.54 msec task-clock                #    1.000 CPUs utilized          
                 4      context-switches          #    1.870 /sec                   
                 0      cpu-migrations            #    0.000 /sec                   
            35,008      page-faults               #   16.362 K/sec                  
     5,642,728,968      cycles                    #    2.637 GHz                    
    11,399,984,081      instructions              #    2.02  insn per cycle         
     1,716,455,724      branches                  #  802.254 M/sec                  
        38,997,435      branch-misses             #    2.27% of all branches

arena BTreeMap #1

          2,289.05 msec task-clock                #    1.000 CPUs utilized          
                 5      context-switches          #    2.184 /sec                   
                 0      cpu-migrations            #    0.000 /sec                   
            38,999      page-faults               #   17.037 K/sec                  
     5,986,327,420      cycles                    #    2.615 GHz                    
    11,958,129,918      instructions              #    2.00  insn per cycle         
     1,752,578,062      branches                  #  765.636 M/sec                  
        39,874,075      branch-misses             #    2.28% of all branches

arena BTreeMap #2

          2,166.25 msec task-clock                #    1.000 CPUs utilized          
                 4      context-switches          #    1.847 /sec                   
                 0      cpu-migrations            #    0.000 /sec                   
            34,786      page-faults               #   16.058 K/sec                  
     5,709,502,207      cycles                    #    2.636 GHz                    
    11,388,274,391      instructions              #    1.99  insn per cycle         
     1,717,719,537      branches                  #  792.946 M/sec                  
        39,164,113      branch-misses             #    2.28% of all branches

So it seems that (at least in my case) #2 has no performance difference compared to the std BTreeMap while #1 is about 5% slower (rather than 20% as previously reported).
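To make the comparison concrete, here is a small sketch that derives relative overheads from the counters in the perf output above. The helper name `overhead_percent` is made up for illustration. By this arithmetic, #1 comes out roughly 5% to 7% above std depending on which counter is used (instructions, cycles, or task-clock), while #2 stays within about 1%.

```rust
/// Relative overhead of `candidate` over `baseline`, in percent.
/// (Hypothetical helper, not part of the crate.)
fn overhead_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate / baseline - 1.0) * 100.0
}

fn main() {
    // Counters copied from the perf output above:
    // (name, std BTreeMap, arena BTreeMap #1, arena BTreeMap #2).
    let rows = [
        ("task-clock (msec)", 2_139.54, 2_289.05, 2_166.25),
        ("cycles", 5_642_728_968.0, 5_986_327_420.0, 5_709_502_207.0),
        ("instructions", 11_399_984_081.0, 11_958_129_918.0, 11_388_274_391.0),
    ];

    for (name, std_map, arena1, arena2) in rows.iter() {
        println!(
            "{}: #1 {:+.1}%, #2 {:+.1}%",
            name,
            overhead_percent(*std_map, *arena1),
            overhead_percent(*std_map, *arena2)
        );
    }
}
```

A single run per configuration says nothing about run-to-run noise, so differences around 1% should be read with caution.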

cfallin · 2022-09-29T01:59:44Z

Interesting to note from your numbers:

Option 1, with u32 indices and a Vec of nodes:

        38,999      page-faults               #   17.037 K/sec

Option 2, with borrows (i.e., real 64-bit pointers) and a bumpalo arena:

        34,786      page-faults               #   16.058 K/sec

I've found the page-faults metric from perf to be a reasonable proxy for total memory allocations in a heap-grows-monotonically-ish batch program (like a compiler), and the difference here suggests to me that the approach using a Vec is consuming more memory, leading to more faulted-in zeroed pages and more cache misses as well.
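For readers unfamiliar with the Option 1 layout, the following is a minimal sketch of an index-based arena, not the code from #1. `NodeId`, `Node`, `VecArena`, and `sum_chain` are hypothetical names, and the node is reduced to a single value plus one link. The point it illustrates is that every node lives in one growable `Vec`, so a push that exceeds the capacity moves the whole buffer.

```rust
/// Hypothetical node handle: a 32-bit index into the arena's backing `Vec`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(u32);

/// Hypothetical node type; real B-tree nodes hold keys, values, and edges.
struct Node {
    value: u64,
    next: Option<NodeId>,
}

/// Minimal index-based arena: all nodes live in one growable `Vec`.
struct VecArena {
    nodes: Vec<Node>,
}

impl VecArena {
    fn new() -> Self {
        VecArena { nodes: Vec::new() }
    }

    /// Appends a node and returns its index.
    /// The push may reallocate (and copy) the entire backing buffer.
    fn alloc(&mut self, value: u64, next: Option<NodeId>) -> NodeId {
        assert!(self.nodes.len() < u32::MAX as usize, "too many nodes for u32 ids");
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node { value, next });
        id
    }

    /// Resolves a handle to a node; this is a bounds-checked index.
    fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0 as usize]
    }
}

/// Follows `next` links by index and sums the node values.
fn sum_chain(arena: &VecArena, mut cur: Option<NodeId>) -> u64 {
    let mut total = 0;
    while let Some(id) = cur {
        let node = arena.get(id);
        total += node.value;
        cur = node.next;
    }
    total
}

fn main() {
    let mut arena = VecArena::new();
    let a = arena.alloc(1, None);
    let b = arena.alloc(2, Some(a));
    let c = arena.alloc(3, Some(b));
    println!("sum = {}", sum_chain(&arena, Some(c)));
}
```

A bump arena, by contrast, hands out stable references into chunks it never moves, which is one plausible reason the two options fault in different numbers of pages.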

I'm curious if the Vec growth is to blame, and what would happen if we did a Vec::with_capacity at the beginning with a guess for the eventual node count?
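A rough sketch of what that experiment measures, using only the standard library: it counts how often a `Vec`'s capacity changes while pushing, once starting empty and once pre-sized. `count_reallocations` and the element count are assumptions chosen for illustration, not values from the benchmark.

```rust
/// Pushes `n` items and counts how often the Vec's capacity changes,
/// i.e. how often the backing buffer had to be (re)allocated.
/// The first allocation of an empty Vec is counted as well.
fn count_reallocations(mut v: Vec<u64>, n: usize) -> usize {
    let mut reallocs = 0;
    let mut cap = v.capacity();
    for i in 0..n {
        v.push(i as u64);
        if v.capacity() != cap {
            reallocs += 1;
            cap = v.capacity();
        }
    }
    reallocs
}

fn main() {
    // Hypothetical guess for the eventual node count.
    let n = 100_000;

    let grown = count_reallocations(Vec::new(), n);
    let reserved = count_reallocations(Vec::with_capacity(n), n);

    println!("starting empty: {} capacity changes", grown);
    println!("pre-sized:      {} capacity changes", reserved);
}
```

Each growth step allocates a larger buffer and copies the old contents, so the old and new buffers briefly coexist. Pre-sizing avoids that, at the cost of needing a reasonable guess up front.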

Amanieu · 2022-09-29T02:05:13Z

What is strange is that in regalloc2 (bytecodealliance/regalloc2#88), #1 is much faster than both std and #2 (which seem to have similar performance; see bytecodealliance/regalloc2#92). Perhaps regalloc2 is much more cache-bound than my benchmark and the u32 indices are helping a lot?
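One way to see why u32 indices could matter for a cache-bound workload is to compare the size of an edge array stored as indices with the same array stored as references. The sketch below uses hypothetical types and an illustrative fan-out of 12 edges; it is not the crate's actual node layout.

```rust
use std::mem::size_of;

/// Illustrative fan-out only; an assumption, not taken from the crate.
const EDGES: usize = 12;

/// Child links stored as 32-bit indices (the Option 1 style).
#[allow(dead_code)]
struct IndexEdges {
    edges: [u32; EDGES],
}

/// Child links stored as borrows, i.e. pointer-sized (the Option 2 style).
#[allow(dead_code)]
struct RefEdges<'a> {
    edges: [&'a u8; EDGES],
}

fn main() {
    println!("index edges:     {} bytes", size_of::<IndexEdges>());
    println!("reference edges: {} bytes", size_of::<RefEdges<'static>>());
}
```

On a 64-bit target the reference-based array is twice the size of the index-based one, so fewer internal nodes fit in the same amount of cache. Whether that explains the regalloc2 result is only a hypothesis here.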

Amanieu · 2022-09-29T02:09:24Z

Using Vec::with_capacity does seem to reduce the page faults. No impact on performance though.

          2,142.89 msec task-clock                #    1.000 CPUs utilized          
                 3      context-switches          #    1.400 /sec                   
                 1      cpu-migrations            #    0.467 /sec                   
            33,161      page-faults               #   15.475 K/sec                  
     6,003,358,962      cycles                    #    2.802 GHz                    
    11,936,071,446      instructions              #    1.99  insn per cycle         
     1,748,444,800      branches                  #  815.928 M/sec                  
        39,567,381      branch-misses             #    2.26% of all branches

fitzgen added 30 commits September 1, 2022 10:01

No core::error::Error

b89ba71

cargo fmt

f40269d

No #[stable] attributes

8dbf43c

Remove unstable items

5a622d5

Turn use crate:: imports into use std:: imports

be1b661

Switch imports of Allocator trait to crate::alloc::ArenaAllocator…

08d8400

… type

Remove an unused and missing import of unstable item

0577e99

Use Box directly

abe0c63

Replace all usage of the Allocator trait with ArenaAllocator

59585dc

A few more allocator trait-y stuff to use ArenaAllocator

7faca62

Remove some more rustc-internal attributes

bf85c38

Do not use unstable range patterns

c1666f5

Remove unstable extend_one usage

c3054eb

Replace global allocator with ArenaAllocator in tests

4328b8b

Remove usage of specialization for is_set_val

0665451

Remove must_use from an impl block

87e951c

remove abort intrinsic

84b4cc1

Remove "unsafe" from "unsafe impl Drop"

c4c48d0

Remove alloc.clone() calls

0ad250d

Re-add drain_filter, fix some more alloc usage

ac0a112

Comment out some tests of unstable methods

d8b4e28

Implement some workarounds for uses of unstable methods

dcb91b2

Don't implement difference/union/etc for now

f19fd76

If we ever need them we can add them, but this is just going to make things go a little faster here.

Add a readme

58ff3a1

fix a bunch more errors

1b6a8df

Disable a couple tests that use unstable methods we removed

41d04c4

Make a bunch of types pub(crate)

36cd06e

comment out a bunch of tests for methods we aren't supporting

444e827

Push allocation types down into methods of ArenaAllocator

7d0e500

Push K, V types into ArenaAllocator, not just its methods

38d621f

fitzgen added 14 commits September 1, 2022 15:52

rename allocation methods

bda60cd

Use typed deallocation methods

9337748

Add dummy alloc/dealloc implementations

5e6f34f

all tests passing! not using arenas yet tho

c2fdf31

Add more info to README

2c27eb4

Add differential fuzzing between std and this crate

ac1548f

Rename Arena to InnerArena

bc53518

Rename ArenaAllocator to Arena

64b655f

Get bumpalo-based arenas working

a8882b0

Add github actions CI

ddc4365

Add miri and fuzzing to github actions

e6d107e

Do not use a local arbitrary crate

f48c219

Update readme

2b570e3

get doc tests using arenas

513e7dd

fitzgen requested a review from cfallin September 27, 2022 22:37

fitzgen force-pushed the bumpalo-based-arenas branch from 25350d9 to 1839826 · October 14, 2022 16:11

TEMP always bump allocate and skip free lists

0ae687f

fitzgen force-pushed the bumpalo-based-arenas branch from 1839826 to 0ae687f · October 14, 2022 16:14

fitzgen wants to merge 45 commits into main from bumpalo-based-arenas