Skip to content

fix and implement more unary ops#1442

Merged
kripken merged 8 commits into
WebAssembly:masterfrom
froydnj:unary-op-fixes
Feb 26, 2018
Merged

fix and implement more unary ops#1442
kripken merged 8 commits into
WebAssembly:masterfrom
froydnj:unary-op-fixes

Conversation

@froydnj

@froydnj froydnj commented Feb 23, 2018

Copy link
Copy Markdown
Contributor

This series takes care of all the easy unary ops, fixing ones that were broken and adding ones that weren't implemented. Float conversions and bit reinterpretations are left for another day.

Passes wasm2asm tests locally (node tests only, no spidermonkey).

Comment thread src/passes/I64ToI32Lowering.cpp Outdated
builder->makeUnary(EqZInt32, builder->makeGetLocal(highBits, i32)),
builder->makeUnary(EqZInt32, curr->value)
TempVar lowBits = getTemp();
SetLocal* setLow = builder->makeSetLocal(lowBits, curr->value);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding this. Why is a temporary needed here? Also, setLow just copies the value in curr->value, which is 64-bit, don't we need to truncate it to get the low bits?

@froydnj froydnj Feb 23, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My (limited) understanding is that the corresponding get_local generated later will perform the truncation. The reasons for this are not clear to me, but it is at least consistent with what all the other i64 lowering does.

The asm.js looks like this before the patch:

 function $5($0, $0$hi) {
  $0 = $0 | 0;
  $0$hi = $0$hi | 0;
  var i64toi32_i32$0 = 0, $3 = 0, $4 = 0, $5 = 0, $6 = 0, $7 = 0, $8 = 0, $9 = 0, $10 = 0;
  $4 = (i64toi32_i32$0 | 0) == (0 | 0);
  i64toi32_i32$0 = $0$hi;
  return $4 & ($0 | 0) == (0 | 0) | 0 | 0;
 }

Notice that we've completely dropped anything having to do with the high bits. After the patch (without the eqz (or x y) optimization introduced later in the patch series), it looks like:

 function $5($0, $0$hi) {
  $0 = $0 | 0;
  $0$hi = $0$hi | 0;
  var i64toi32_i32$0 = 0, i64toi32_i32$1 = 0, $4 = 0, $5 = 0, $6 = 0, $7 = 0, $8 = 0, $9 = 0, $10 = 0, $11 = 0, $12 = 0, $13 = 0, $14 = 0, wasm2asm_i32$0 = 0;
  return ($0$hi | 0) == (0 | 0) & ($0 | 0) == (0 | 0) | 0 | 0;
  return wasm2asm_i32$0 | 0;
 }

which is much more sensible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see, thanks. What I believe happens is the lowering was already done, so this node is 32-bit, and the high 32 bits are received in fetchOutParam.

If that's right, then you don't need the set_local/get_local (or the block) here, since the low bits are used exactly once, you can just use them directly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I believe there is no need for the temporary variable. It could be auto lowBits = curr->value;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that any different from the original code, which used curr->value directly?

@kripken kripken Feb 23, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original code got the part of receiving the low and high bits right, but it just did an AND operation on them, which your PR fixes, it should indeed be an EQZ of an OR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code wanted to do (and (eqz $hi) (eqz $lo)) which is a valid translation of i64.eqz: if both 32-bit parts are equal to zero, the 64-bit integer is equal to zero. But the code as written was generating the above asm.js, which was wrong.

Even with the (eqz (or $hi $lo)) algorithm, if you change it to use a bare curr->value:

    TempVar highBits = fetchOutParam(curr->value);

    Block* result = builder->blockify(
      builder->makeUnary(
        EqZInt32,
        builder->makeBinary(
          OrInt32,
          builder->makeGetLocal(highBits, i32),
          curr->value
        )
      )
    );

    replaceCurrent(result);

You get incorrect results, because the asm.js looks like:

 function $5($0, $0$hi) {
  $0 = $0 | 0;
  $0$hi = $0$hi | 0;
  var i64toi32_i32$0 = 0, $3 = 0, $4 = 0, $5 = 0, $6 = 0, $7 = 0, $8 = 0, $9 = 0, $10 = 0, $11 = 0;
  $3 = i64toi32_i32$0;
  i64toi32_i32$0 = $0$hi;
  return ($3 | $0 | 0 | 0) == (0 | 0) | 0;
 }

which is pretty bogus.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's due to an ordering bug, see 3572e57 . The issue is that curr->value sets up both the low and high bits, so it needs to execute first before we use either. And in an or the order doesn't matter so we can just flip it.

Btw, I think it's easier to understand looking at the pass by itself as in that commit, and testing the pass directly, what do you think?

Comment thread src/passes/I64ToI32Lowering.cpp Outdated
builder->makeUnary(EqZInt32, builder->makeGetLocal(highBits, i32)),
builder->makeUnary(EqZInt32, curr->value)

Block* result = builder->blockify(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the block here, can just do auto* result = and then the makeUnary

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this kind of gets to my point, that when the low bits are just used once, the situation is much simpler - no need for a temp, no need for a block, which makes it shorter to write and at least in my opinion easier to understand.

but reading the other cases, i do see your point that in many or most you do need to read the low bits more than once, in which case for consistency maybe it's better for this to be that way too. i don't feel strongly either way, up to you.

@kripken

kripken commented Feb 26, 2018

Copy link
Copy Markdown
Member

Looks good to me. Should we merge this or did you have more thoughts on refactoring?

@froydnj

froydnj commented Feb 26, 2018

Copy link
Copy Markdown
Contributor Author

Should we merge this or did you have more thoughts on refactoring?

I have no additional thoughts on refactoring. I can see the point about shorter/simpler code though; I think there are fewer local variables generated with this version than my original blocked version.

@kripken

kripken commented Feb 26, 2018

Copy link
Copy Markdown
Member

Ok, cool. Let's merge this. We can always improve things later if we find a good way.

@kripken kripken merged commit 8bbf9b7 into WebAssembly:master Feb 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants