Avoid boilerplate in ExpressionAnalyzer comparing & hashing by kripken · Pull Request #3332 · WebAssembly/binaryen

kripken · 2020-11-09T22:35:03Z

Expands on #3294:

Scope names must be distinguished as either defs or uses.
Error when a core #define is missing, which is less error-prone, as
suggested by @tlively
Add DELEGATE_GET_FIELD which lets one define "get the field"
once and then all the loops can use it. This helps avoid boilerplate for
loops at least in some cases (when there is a single object on which
to get the field).

With those, it is possible to replace boilerplate in comparisons and
hashing logic. This also fixes a bug where BrOnExn::sent was not
scanned there.

Add some unit tests for hashing. We didn't have any, and hashing can be
subtly wrong without observable external effects (just more collisions).

Co-authored-by: Heejin Ahn <aheejin@gmail.com>

aheejin · 2020-11-11T13:22:25Z

+#define DELEGATE_FIELD_TYPE(id, name) visitType(cast->name);
+#define DELEGATE_FIELD_ADDRESS(id, name) visitAddress(cast->name);
+
+#define DELEGATE_FIELD_SCOPE_NAME_DEF(id, name) noteScopeName(cast->name);


Suggested change

#define DELEGATE_FIELD_SCOPE_NAME_DEF(id, name) noteScopeName(cast->name);

#define DELEGATE_FIELD_SCOPE_NAME_DEF(id, name) \

noteScopeName(cast->name); \

visitScopeName(cast->name);

I might be mistaken, but shouldn't we also hash scope names for blocks and loops?

I think it's actually better not to. The current code will give the same hash for (block $x) (a block with an unneeded name) and (block) (with no name). We only give a different hash if there is a noticeable difference in break targets.

But this is a subtle point that I didn't fully think about until now, so it's definitely worth a comment, added.

By the way we can't hash expressions with dangling names, which means, something like this, right?

(block $label0 (br $label1) ;; $label1 is an open name, possibly an outer name )

Is this intentional?

It looks like you're right, that wasn't supported before (and doesn't change here). I'm slightly surprised that wasn't an issue. But I guess we don't hash many things atm. Worth fixing eventually.

Yeah what I was thinking about was

(block $a (br $a) )

(block $b (br $a) )

I was wondering if these two would get the same hash, but actually we weren't even supporting the second case because $b was a dangling name. If we want to support dangling names eventually (maybe treat them literally in the same way as we do for comparison), we need to hash block/loop labels too. But given that this hasn't been an issue so far maybe all these are not necessary...

Good point, yes, if we support dangling names we'd need to look at this carefully too.

MaxGraey · 2020-11-13T02:15:04Z

It seems this cause to regression on our test suite. Probably it relate to function deduplication pass which depend on ExpressionAnalyzer. Some of functions with same bodies but different names (produced during monomorphisation) doesn't fold into one anymore.

kripken · 2020-11-13T02:49:25Z

@MaxGraey Sorry to hear that. Can you get a binaryen testcase from that? That might be the fastest way to investigate.

MaxGraey · 2020-11-13T02:51:57Z

Sure, I'll make some example tomorrow

MaxGraey · 2020-11-13T10:29:25Z

Minimal example:

Wast code (folded)

yes, even hidden code blocks!

(module
  (type $t0 (func (param i32 i32) (result i32)))
  (type $t1 (func (param i32 i32 i32 i32)))
  (type $t2 (func (result i32)))

  (import "env" "abort" (func $~lib/builtins/abort (type $t1)))

  (table $T0 1 funcref)
  (memory $memory (export "memory") 1)
  (global $g0 (mut i32) (i32.const 64))
  (global $g1 (mut i32) (i32.const 144))
  (data $d0 (i32.const 12) "\08\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\08\00\00\00\01\00\00\00\02\00\00\00")
  (data $d1 (i32.const 44) "\10\00\00\00\01\00\00\00\00\00\00\00\03\00\00\00\10\00\00\00 \00\00\00 \00\00\00\08\00\00\00\02\00\00\00")
  (data $d2 (i32.const 92) "\08\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\08\00\00\00\01\00\00\00\02\00\00\00")
  (data $d3 (i32.const 124) "\10\00\00\00\01\00\00\00\00\00\00\00\04\00\00\00\10\00\00\00p\00\00\00p\00\00\00\08\00\00\00\02\00\00\00")
  (data $d4 (i32.const 172) "$\00\00\00\01\00\00\00\00\00\00\00\01\00\00\00$\00\00\00I\00n\00d\00e\00x\00 \00o\00u\00t\00 \00o\00f\00 \00r\00a\00n\00g\00e\00")
  (data $d5 (i32.const 236) "\1a\00\00\00\01\00\00\00\00\00\00\00\01\00\00\00\1a\00\00\00~\00l\00i\00b\00/\00a\00r\00r\00a\00y\00.\00t\00s\00")

  (func $~lib/array/Array<i32>#__uget (type $t0) (param $0 i32) (param $1 i32) (result i32)
    (i32.load
      (i32.add
        (i32.load offset=4
          (local.get $0))
        (i32.shl
          (local.get $1)
          (i32.const 2))))
  )

  (func $~lib/array/Array<i32>#__get (type $t0) (param $0 i32) (param $1 i32) (result i32)
    (local $2 i32)
    (if $I0
      (i32.ge_u
        (local.get $1)
        (i32.load offset=12
          (local.get $0)))
      (then
        (call $~lib/builtins/abort
          (i32.const 192)
          (i32.const 256)
          (i32.const 104)
          (i32.const 42))
        (unreachable)))
    (local.set $2
      (call $~lib/array/Array<i32>#__uget
        (local.get $0)
        (local.get $1)))
    (drop
      (i32.const 0))
    (local.get $2)
  )

  (func $~lib/array/Array<u32>#__uget (type $t0) (param $0 i32) (param $1 i32) (result i32)
    (i32.load
      (i32.add
        (i32.load offset=4
          (local.get $0))
        (i32.shl
          (local.get $1)
          (i32.const 2))))
  )

  (func $~lib/array/Array<u32>#__get (type $t0) (param $0 i32) (param $1 i32) (result i32)
    (local $2 i32)
    (if $I0
      (i32.ge_u
        (local.get $1)
        (i32.load offset=12
          (local.get $0)))
      (then
        (call $~lib/builtins/abort
          (i32.const 192)
          (i32.const 256)
          (i32.const 104)
          (i32.const 42))
        (unreachable)))
    (local.set $2
      (call $~lib/array/Array<u32>#__uget
        (local.get $0)
        (local.get $1)))
    (drop
      (i32.const 0))
    (local.get $2)
  )

  (func $test/sum (export "sum") (type $t2) (result i32)
    (i32.add
      (i32.add
        (i32.add
          (call $~lib/array/Array<i32>#__get
            (global.get $g0)
            (i32.const 0))
          (call $~lib/array/Array<i32>#__get
            (global.get $g0)
            (i32.const 1)))
        (call $~lib/array/Array<u32>#__get
          (global.get $g1)
          (i32.const 0)))
      (call $~lib/array/Array<u32>#__get
        (global.get $g1)
        (i32.const 1)))
  )
)

wasm sample: dedup-issue.wasm.zip

Ideally it should keep only one function func $~lib/array/Array<i32>#__get, but currently it also include func $~lib/array/Array<u32>#__get after optimization.

Note: Unfortunately it could be reproduced only with our pipeline which use "duplicate-function-elimination" once at the beginning and once at the end of pipeline

kripken · 2020-11-13T17:08:37Z

@MaxGraey What are the steps to reproduce? I need at least the optimization and shrink level, to figure out what is being run in your pipeline.

I tried wasm-opt dedup-issue.wasm --duplicate-function-elimination -Os --duplicate-function-elimination and also with -O3 and I see no difference in either.

MaxGraey · 2020-11-13T17:17:37Z

Ah, right. Sorry it's really hard to reproduce. That's example not represent issue. I could share git diff with full test codegen: AssemblyScript/assemblyscript@773d191#diff-0c397325520a84df73070867f14d1d4bb93b2f1643c2b49e4fcbbccec5a45671

I will try extract minimal example again. But it seems will not minimalistic due to affect only to huge test cases

kripken · 2020-11-13T17:18:20Z

Hmm, it looks like func $~lib/array/Array<i32>#__get and $~lib/array/Array<u32>#__get are not identical. They are almost identical, but for

-   (call $~lib/array/Array<i32>#__uget
+   (call $~lib/array/Array<u32>#__uget

Those two functions, however, do look like duplicate function elimination should merge them. I'll look some more.

MaxGraey · 2020-11-13T17:20:54Z

Hmm, it looks like func $~lib/array/Array#__get and $~lib/array/Array#__get are not identical

Yes! They should fold to one $~lib/array/Array<i32>#__get but this doesn't happen sometimes. And I can't figure outing full reproduction context.

kripken · 2020-11-13T17:24:43Z

It looks like they are optimized properly. It does take multiple iterations in the pass. How many iterations it does depends on the optimization level. In opt level 3 or shrink level 1 it will do as many iterations as necessary. When I run that, I see it optimize those functions properly.

So maybe the issue is you run the pass with a low optimize or shrink level?

I've also meanwhile created a fuzzer to compare the output before and after this landed, and 5,000 iterations have found no difference.

MaxGraey · 2020-11-13T17:28:58Z

Nothing changed, just updated binaryen. optimized builds in AS is pretty close to -O4 but as I told before we use slightly modified pipeline when "duplicate-function-elimination" use twice (in the beginning and at the end of pipeline)

How many iterations it does depends on the optimization level.

We apply that passes only once. Only with --converge mode it may applied several times

MaxGraey · 2020-11-13T17:30:56Z

I could try restore original binaryen's pipeline and compare some of regressed tests again with different binaryen versions

kripken · 2020-11-13T17:42:39Z

Meanwhile the fuzzer seems to have found something on iteration 8,000 🤣 So hopefully I can look at a reduced case there. But if you can find one that would be good too.

MaxGraey · 2020-11-13T17:53:55Z

Great. I tried restore original pipeline and this issue not reproduced on that original pipeline. It seems need special combination of passes or it's ordering) So it seems only fuzzer could helps with that

We used to check if a load's sign matters before hashing it. If the load does not extend, then the sign doesn't matter, and we ignored the value there. It turns out that value could be garbage, as we didn't assign it in the binary reader, if it wasn't relevant. In the rewrite this was missed, and actually it's not really possible to do, since we have just a macro for the field, but not the object it is on - which there may be more than one. To fix this, just always assign the field. This is simpler anyhow, and avoids confusion not just here but probably when debugging. The testcase here is reduced from the fuzzer, and is not a 100% guarantee to catch a read of uninitialized memory, but it can't hurt, and with ASan it may be pretty consistent.

kripken added 9 commits November 9, 2020 12:19

Add a test

5d5a091

wip

88939b1

fix

af5f459

format

3b40fc7

builds

8c8d0bc

format

dbcded4

simpler

b80728c

simplify more

0390791

format

4b6a188

kripken requested review from aheejin and tlively November 9, 2020 22:35

aheejin reviewed Nov 10, 2020

View reviewed changes

Comment thread src/wasm-delegations-fields.h

Comment thread src/wasm-delegations-fields.h Outdated

Comment thread src/wasm-delegations-fields.h Outdated

Comment thread src/wasm-delegations-fields.h Outdated

Apply suggestions from code review

2adf162

Co-authored-by: Heejin Ahn <aheejin@gmail.com>

tlively reviewed Nov 10, 2020

View reviewed changes

Comment thread src/ir/ExpressionAnalyzer.cpp Outdated

kripken added 2 commits November 10, 2020 12:45

feedback and cleanup

496854c

Merge remote-tracking branch 'origin/fields' into fields

fea8eb2

aheejin reviewed Nov 11, 2020

View reviewed changes

kripken added 5 commits November 11, 2020 13:22

move docs

86587e8

more

8bd5bd8

Merge remote-tracking branch 'origin/master' into fields

85a888a

fix

84a6847

comment

b20f0bd

aheejin approved these changes Nov 11, 2020

View reviewed changes

kripken merged commit d0d96a8 into master Nov 11, 2020

kripken deleted the fields branch November 11, 2020 23:35

aheejin mentioned this pull request Nov 16, 2020

[Experiment] Try to fix ExpressionAnalyser for AS #3369

Closed

-#define DELEGATE_FIELD_SCOPE_NAME_DEF(id, name) noteScopeName(cast->name);
+#define DELEGATE_FIELD_SCOPE_NAME_DEF(id, name)                                \
+  noteScopeName(cast->name);                                                   \
+  visitScopeName(cast->name);

Conversation

kripken commented Nov 9, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aheejin Nov 11, 2020

Choose a reason for hiding this comment

Uh oh!

kripken Nov 11, 2020

Choose a reason for hiding this comment

Uh oh!

aheejin Nov 11, 2020

Choose a reason for hiding this comment

Uh oh!

kripken Nov 11, 2020

Choose a reason for hiding this comment

Uh oh!

aheejin Nov 11, 2020

Choose a reason for hiding this comment

Uh oh!

kripken Nov 11, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MaxGraey commented Nov 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Nov 13, 2020

Uh oh!

MaxGraey commented Nov 13, 2020

Uh oh!

MaxGraey commented Nov 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

yes, even hidden code blocks!

Uh oh!

kripken commented Nov 13, 2020

Uh oh!

MaxGraey commented Nov 13, 2020

Uh oh!

kripken commented Nov 13, 2020

Uh oh!

MaxGraey commented Nov 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Nov 13, 2020

Uh oh!

MaxGraey commented Nov 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaxGraey commented Nov 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Nov 13, 2020

Uh oh!

MaxGraey commented Nov 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MaxGraey commented Nov 13, 2020 •

edited

Loading

MaxGraey commented Nov 13, 2020 •

edited

Loading

MaxGraey commented Nov 13, 2020 •

edited

Loading

MaxGraey commented Nov 13, 2020 •

edited

Loading

MaxGraey commented Nov 13, 2020 •

edited

Loading

MaxGraey commented Nov 13, 2020 •

edited

Loading