Skip to content

Commit 353e2c4

Browse files
authored
Stack IR (WebAssembly#1623)
This adds a new IR, "Stack IR". This represents wasm at a very low level, as a simple stream of instructions, basically the same as wasm's binary format. This is unlike Binaryen IR which is structured and in a tree format. This gives some small wins on binary sizes, less than 1% in most cases, usually 0.25-0.50% or so. That's not much by itself, but looking forward this prepares us for multi-value, which we really need an IR like this to be able to optimize well. Also, it's possible there is more we can do already - currently there are just a few stack IR optimizations implemented, DCE local2stack - check if a set_local/get_local pair can be removed, which keeps the set's value on the stack, which if the stars align it can be popped instead of the get. Block removal - remove any blocks with no branches, as they are valid in wasm binary format. Implementation-wise, the IR is defined in wasm-stack.h. A new StackInst is defined, representing a single instruction. Most are simple reflections of Binaryen IR (an add, a load, etc.), and just pointers to them. Control flow constructs are expanded into multiple instructions, like a block turns into a block begin and end, and we may also emit extra unreachables to handle the fact Binaryen IR has unreachable blocks/ifs/loops but wasm does not. Overall, all the Binaryen IR differences with wasm vanish on the way to stack IR. Where this IR lives: Each Function now has a unique_ptr to stack IR, that is, a function may have stack IR alongside the main IR. If the stack IR is present, we write it out during binary writing; if not, we do the same binaryen IR => wasm binary process as before (this PR should not affect speed there). This design lets us use normal Passes on stack IR, in particular this PR defines 3 passes: Generate stack IR Optimize stack IR (might be worth splitting out into separate passes eventually) Print stack IR for debugging purposes Having these as normal passes is convenient as then they can run in parallel across functions and all the other conveniences of our current Pass system. However, a downside of keeping the second IR as an option on Functions, and using normal Passes to operate on it, means that we may get out of sync: if you generate stack IR, then modify binaryen IR, then the stack IR may no longer be valid (for example, maybe you removed locals or modified instructions in place etc.). To avoid that, Passes now define if they modify Binaryen IR or not; if they do, we throw away the stack IR. Miscellaneous notes: Just writing Stack IR, then writing to binary - no optimizations - is 20% slower than going directly to binary, which is one reason why we still support direct writing. This does lead to some "fun" C++ template code to make that convenient: there is a single StackWriter class, templated over the "mode", which is either Binaryen2Binary (direct writing), Binaryen2Stack, or Stack2Binary. This avoids a lot of boilerplate as the 3 modes share a lot of code in overlapping ways. Stack IR does not support source maps / debug info. We just don't use that IR if debug info is present. A tiny text format comment (if emitting non-minified text) indicates stack IR is present, if it is ((; has Stack IR ;)). This may help with debugging, just in case people forget. There is also a pass to print out the stack IR for debug purposes, as mentioned above. The sieve binaryen.js test was actually not validating all along - these new opts broke it in a more noticeable manner. Fixed. Added extra checks in pass-debug mode, to verify that if stack IR should have been thrown out, it was. This should help avoid any confusion with the IR being invalid. Added a comment about the possible future of stack IR as the main IR, depending on optimization results, following some discussion earlier today.
1 parent 0483d81 commit 353e2c4

106 files changed

Lines changed: 5323 additions & 1740 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,8 @@ There are a few differences between Binaryen IR and the WebAssembly language:
4848

4949
As a result, you might notice that round-trip conversions (wasm => Binaryen IR => wasm) change code a little in some corner cases.
5050

51+
* When optimizing Binaryen uses an additional IR, Stack IR (see `src/wasm-stack.h`). Stack IR allows a bunch of optimizations that are tailored for the stack machine form of WebAssembly's binary format (but Stack IR is less efficient for general optimizations than the main Binaryen IR). If you have a wasm file that has been particularly well-optimized, a simple round-trip conversion (just read and write, without optimization) may cause more noticeable differences, as Binaryen fits it into Binaryen IR's more structured format. If you also optimize during the round-trip conversion then Stack IR opts will be run and the final wasm will be better optimized.
52+
5153
Notes when working with Binaryen IR:
5254

5355
* As mentioned above, Binaryen IR has a tree structure. As a result, each expression should have exactly one parent - you should not "reuse" a node by having it appear more than once in the tree. The motivation for this limitation is that when we optimize we modify nodes, so if they appear more than once in the tree, a change in one place can appear in another incorrectly.

build-js.sh

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -131,6 +131,7 @@ echo "building shared bitcode"
131131
$BINARYEN_SRC/passes/SimplifyLocals.cpp \
132132
$BINARYEN_SRC/passes/SpillPointers.cpp \
133133
$BINARYEN_SRC/passes/SSAify.cpp \
134+
$BINARYEN_SRC/passes/StackIR.cpp \
134135
$BINARYEN_SRC/passes/TrapMode.cpp \
135136
$BINARYEN_SRC/passes/Untee.cpp \
136137
$BINARYEN_SRC/passes/Vacuum.cpp \

src/ir/hashed.h

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -58,8 +58,6 @@ class HashedExpressionMap : public std::unordered_map<HashedExpression, T, Expre
5858
struct FunctionHasher : public WalkerPass<PostWalker<FunctionHasher>> {
5959
bool isFunctionParallel() override { return true; }
6060

61-
typedef uint32_t HashType;
62-
6361
struct Map : public std::map<Function*, HashType> {};
6462

6563
FunctionHasher(Map* output) : output(output) {}

src/ir/module-utils.h

Lines changed: 18 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,23 @@ struct BinaryIndexes {
5252
}
5353
};
5454

55+
inline Function* copyFunction(Function* func, Module& out) {
56+
auto* ret = new Function();
57+
ret->name = func->name;
58+
ret->result = func->result;
59+
ret->params = func->params;
60+
ret->vars = func->vars;
61+
ret->type = Name(); // start with no named type; the names in the other module may differ
62+
ret->localNames = func->localNames;
63+
ret->localIndices = func->localIndices;
64+
ret->debugLocations = func->debugLocations;
65+
ret->body = ExpressionManipulator::copy(func->body, out);
66+
// TODO: copy Stack IR
67+
assert(!func->stackIR);
68+
out.addFunction(ret);
69+
return ret;
70+
}
71+
5572
inline void copyModule(Module& in, Module& out) {
5673
// we use names throughout, not raw points, so simple copying is fine
5774
// for everything *but* expressions
@@ -65,9 +82,7 @@ inline void copyModule(Module& in, Module& out) {
6582
out.addExport(new Export(*curr));
6683
}
6784
for (auto& curr : in.functions) {
68-
auto* func = new Function(*curr);
69-
func->body = ExpressionManipulator::copy(func->body, out);
70-
out.addFunction(func);
85+
copyFunction(curr.get(), out);
7186
}
7287
for (auto& curr : in.globals) {
7388
out.addGlobal(new Global(*curr));
@@ -85,19 +100,6 @@ inline void copyModule(Module& in, Module& out) {
85100
out.debugInfoFileNames = in.debugInfoFileNames;
86101
}
87102

88-
inline Function* copyFunction(Module& in, Module& out, Name name) {
89-
Function *ret = out.getFunctionOrNull(name);
90-
if (ret != nullptr) {
91-
return ret;
92-
}
93-
auto* curr = in.getFunction(name);
94-
auto* func = new Function(*curr);
95-
func->body = ExpressionManipulator::copy(func->body, out);
96-
func->type = Name();
97-
out.addFunction(func);
98-
return func;
99-
}
100-
101103
} // namespace ModuleUtils
102104

103105
} // namespace wasm

src/ir/utils.h

Lines changed: 48 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -88,7 +88,7 @@ struct ReFinalize : public WalkerPass<PostWalker<ReFinalize, OverriddenVisitor<R
8888

8989
std::map<Name, Type> breakValues;
9090

91-
void visitBlock(Block *curr) {
91+
void visitBlock(Block* curr) {
9292
if (curr->list.size() == 0) {
9393
curr->type = none;
9494
return;
@@ -129,42 +129,42 @@ struct ReFinalize : public WalkerPass<PostWalker<ReFinalize, OverriddenVisitor<R
129129
}
130130
}
131131
}
132-
void visitIf(If *curr) { curr->finalize(); }
133-
void visitLoop(Loop *curr) { curr->finalize(); }
134-
void visitBreak(Break *curr) {
132+
void visitIf(If* curr) { curr->finalize(); }
133+
void visitLoop(Loop* curr) { curr->finalize(); }
134+
void visitBreak(Break* curr) {
135135
curr->finalize();
136136
updateBreakValueType(curr->name, getValueType(curr->value));
137137
}
138-
void visitSwitch(Switch *curr) {
138+
void visitSwitch(Switch* curr) {
139139
curr->finalize();
140140
auto valueType = getValueType(curr->value);
141141
for (auto target : curr->targets) {
142142
updateBreakValueType(target, valueType);
143143
}
144144
updateBreakValueType(curr->default_, valueType);
145145
}
146-
void visitCall(Call *curr) { curr->finalize(); }
147-
void visitCallImport(CallImport *curr) { curr->finalize(); }
148-
void visitCallIndirect(CallIndirect *curr) { curr->finalize(); }
149-
void visitGetLocal(GetLocal *curr) { curr->finalize(); }
150-
void visitSetLocal(SetLocal *curr) { curr->finalize(); }
151-
void visitGetGlobal(GetGlobal *curr) { curr->finalize(); }
152-
void visitSetGlobal(SetGlobal *curr) { curr->finalize(); }
153-
void visitLoad(Load *curr) { curr->finalize(); }
154-
void visitStore(Store *curr) { curr->finalize(); }
155-
void visitAtomicRMW(AtomicRMW *curr) { curr->finalize(); }
156-
void visitAtomicCmpxchg(AtomicCmpxchg *curr) { curr->finalize(); }
146+
void visitCall(Call* curr) { curr->finalize(); }
147+
void visitCallImport(CallImport* curr) { curr->finalize(); }
148+
void visitCallIndirect(CallIndirect* curr) { curr->finalize(); }
149+
void visitGetLocal(GetLocal* curr) { curr->finalize(); }
150+
void visitSetLocal(SetLocal* curr) { curr->finalize(); }
151+
void visitGetGlobal(GetGlobal* curr) { curr->finalize(); }
152+
void visitSetGlobal(SetGlobal* curr) { curr->finalize(); }
153+
void visitLoad(Load* curr) { curr->finalize(); }
154+
void visitStore(Store* curr) { curr->finalize(); }
155+
void visitAtomicRMW(AtomicRMW* curr) { curr->finalize(); }
156+
void visitAtomicCmpxchg(AtomicCmpxchg* curr) { curr->finalize(); }
157157
void visitAtomicWait(AtomicWait* curr) { curr->finalize(); }
158158
void visitAtomicWake(AtomicWake* curr) { curr->finalize(); }
159-
void visitConst(Const *curr) { curr->finalize(); }
160-
void visitUnary(Unary *curr) { curr->finalize(); }
161-
void visitBinary(Binary *curr) { curr->finalize(); }
162-
void visitSelect(Select *curr) { curr->finalize(); }
163-
void visitDrop(Drop *curr) { curr->finalize(); }
164-
void visitReturn(Return *curr) { curr->finalize(); }
165-
void visitHost(Host *curr) { curr->finalize(); }
166-
void visitNop(Nop *curr) { curr->finalize(); }
167-
void visitUnreachable(Unreachable *curr) { curr->finalize(); }
159+
void visitConst(Const* curr) { curr->finalize(); }
160+
void visitUnary(Unary* curr) { curr->finalize(); }
161+
void visitBinary(Binary* curr) { curr->finalize(); }
162+
void visitSelect(Select* curr) { curr->finalize(); }
163+
void visitDrop(Drop* curr) { curr->finalize(); }
164+
void visitReturn(Return* curr) { curr->finalize(); }
165+
void visitHost(Host* curr) { curr->finalize(); }
166+
void visitNop(Nop* curr) { curr->finalize(); }
167+
void visitUnreachable(Unreachable* curr) { curr->finalize(); }
168168

169169
void visitFunction(Function* curr) {
170170
// we may have changed the body from unreachable to none, which might be bad
@@ -197,33 +197,33 @@ struct ReFinalize : public WalkerPass<PostWalker<ReFinalize, OverriddenVisitor<R
197197
// Re-finalize a single node. This is slow, if you want to refinalize
198198
// an entire ast, use ReFinalize
199199
struct ReFinalizeNode : public OverriddenVisitor<ReFinalizeNode> {
200-
void visitBlock(Block *curr) { curr->finalize(); }
201-
void visitIf(If *curr) { curr->finalize(); }
202-
void visitLoop(Loop *curr) { curr->finalize(); }
203-
void visitBreak(Break *curr) { curr->finalize(); }
204-
void visitSwitch(Switch *curr) { curr->finalize(); }
205-
void visitCall(Call *curr) { curr->finalize(); }
206-
void visitCallImport(CallImport *curr) { curr->finalize(); }
207-
void visitCallIndirect(CallIndirect *curr) { curr->finalize(); }
208-
void visitGetLocal(GetLocal *curr) { curr->finalize(); }
209-
void visitSetLocal(SetLocal *curr) { curr->finalize(); }
210-
void visitGetGlobal(GetGlobal *curr) { curr->finalize(); }
211-
void visitSetGlobal(SetGlobal *curr) { curr->finalize(); }
212-
void visitLoad(Load *curr) { curr->finalize(); }
213-
void visitStore(Store *curr) { curr->finalize(); }
200+
void visitBlock(Block* curr) { curr->finalize(); }
201+
void visitIf(If* curr) { curr->finalize(); }
202+
void visitLoop(Loop* curr) { curr->finalize(); }
203+
void visitBreak(Break* curr) { curr->finalize(); }
204+
void visitSwitch(Switch* curr) { curr->finalize(); }
205+
void visitCall(Call* curr) { curr->finalize(); }
206+
void visitCallImport(CallImport* curr) { curr->finalize(); }
207+
void visitCallIndirect(CallIndirect* curr) { curr->finalize(); }
208+
void visitGetLocal(GetLocal* curr) { curr->finalize(); }
209+
void visitSetLocal(SetLocal* curr) { curr->finalize(); }
210+
void visitGetGlobal(GetGlobal* curr) { curr->finalize(); }
211+
void visitSetGlobal(SetGlobal* curr) { curr->finalize(); }
212+
void visitLoad(Load* curr) { curr->finalize(); }
213+
void visitStore(Store* curr) { curr->finalize(); }
214214
void visitAtomicRMW(AtomicRMW* curr) { curr->finalize(); }
215215
void visitAtomicCmpxchg(AtomicCmpxchg* curr) { curr->finalize(); }
216216
void visitAtomicWait(AtomicWait* curr) { curr->finalize(); }
217217
void visitAtomicWake(AtomicWake* curr) { curr->finalize(); }
218-
void visitConst(Const *curr) { curr->finalize(); }
219-
void visitUnary(Unary *curr) { curr->finalize(); }
220-
void visitBinary(Binary *curr) { curr->finalize(); }
221-
void visitSelect(Select *curr) { curr->finalize(); }
222-
void visitDrop(Drop *curr) { curr->finalize(); }
223-
void visitReturn(Return *curr) { curr->finalize(); }
224-
void visitHost(Host *curr) { curr->finalize(); }
225-
void visitNop(Nop *curr) { curr->finalize(); }
226-
void visitUnreachable(Unreachable *curr) { curr->finalize(); }
218+
void visitConst(Const* curr) { curr->finalize(); }
219+
void visitUnary(Unary* curr) { curr->finalize(); }
220+
void visitBinary(Binary* curr) { curr->finalize(); }
221+
void visitSelect(Select* curr) { curr->finalize(); }
222+
void visitDrop(Drop* curr) { curr->finalize(); }
223+
void visitReturn(Return* curr) { curr->finalize(); }
224+
void visitHost(Host* curr) { curr->finalize(); }
225+
void visitNop(Nop* curr) { curr->finalize(); }
226+
void visitUnreachable(Unreachable* curr) { curr->finalize(); }
227227

228228
void visitFunctionType(FunctionType* curr) { WASM_UNREACHABLE(); }
229229
void visitImport(Import* curr) { WASM_UNREACHABLE(); }

src/pass.h

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,10 @@ struct PassOptions {
7676
ret.setDefaultOptimizationOptions();
7777
return ret;
7878
}
79+
80+
static PassOptions getWithoutOptimization() {
81+
return PassOptions(); // defaults are to not optimize
82+
}
7983
};
8084

8185
//
@@ -137,6 +141,9 @@ struct PassRunner {
137141
// Adds the default optimization passes that work on
138142
// entire modules as a whole, and make sense to
139143
// run after function passes.
144+
// This is run at the very end of the optimization
145+
// process - you can assume no other opts will be run
146+
// afterwards.
140147
void addDefaultGlobalOptimizationPostPasses();
141148

142149
// Run the passes on the module
@@ -174,7 +181,16 @@ struct PassRunner {
174181
private:
175182
void doAdd(Pass* pass);
176183

184+
void runPass(Pass* pass);
177185
void runPassOnFunction(Pass* pass, Function* func);
186+
187+
// After running a pass, handle any changes due to
188+
// how the pass is defined, such as clearing away any
189+
// temporary data structures that the pass declares it
190+
// invalidates.
191+
// If a function is passed, we operate just on that function;
192+
// otherwise, the whole module.
193+
void handleAfterEffects(Pass* pass, Function* func=nullptr);
178194
};
179195

180196
//
@@ -223,6 +239,14 @@ class Pass {
223239
// this will create the parent class.
224240
virtual Pass* create() { WASM_UNREACHABLE(); }
225241

242+
// Whether this pass modifies the Binaryen IR in the module. This is true for
243+
// most passes, except for passes that have no side effects, or passes that
244+
// only modify other things than Binaryen IR (for example, the Stack IR
245+
// passes only modify that IR).
246+
// This property is important as if Binaryen IR is modified, we need to throw
247+
// out any Stack IR - it would need to be regenerated and optimized.
248+
virtual bool modifiesBinaryenIR() { return true; }
249+
226250
std::string name;
227251

228252
protected:

src/passes/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ SET(passes_SOURCES
3232
Precompute.cpp
3333
Print.cpp
3434
PrintCallGraph.cpp
35+
StackIR.cpp
3536
RedundantSetElimination.cpp
3637
RelooperJumpThreading.cpp
3738
ReReloop.cpp

src/passes/I64ToI32Lowering.cpp

Lines changed: 9 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@
2727
#include "emscripten-optimizer/istring.h"
2828
#include "support/name.h"
2929
#include "wasm-builder.h"
30+
#include "ir/module-utils.h"
3031
#include "ir/names.h"
3132
#include "asmjs/shared-constants.h"
3233

@@ -143,19 +144,20 @@ struct I64ToI32Lowering : public WalkerPass<PostWalker<I64ToI32Lowering>> {
143144
highBitVars.clear();
144145
labelHighBitVars.clear();
145146
freeTemps.clear();
146-
Function oldFunc(*func);
147+
Module temp;
148+
auto* oldFunc = ModuleUtils::copyFunction(func, temp);
147149
func->params.clear();
148150
func->vars.clear();
149151
func->localNames.clear();
150152
func->localIndices.clear();
151153
Index newIdx = 0;
152-
Names::ensureNames(&oldFunc);
153-
for (Index i = 0; i < oldFunc.getNumLocals(); ++i) {
154-
assert(oldFunc.hasLocalName(i));
155-
Name lowName = oldFunc.getLocalName(i);
154+
Names::ensureNames(oldFunc);
155+
for (Index i = 0; i < oldFunc->getNumLocals(); ++i) {
156+
assert(oldFunc->hasLocalName(i));
157+
Name lowName = oldFunc->getLocalName(i);
156158
Name highName = makeHighName(lowName);
157-
Type paramType = oldFunc.getLocalType(i);
158-
auto builderFunc = (i < oldFunc.getVarIndexBase()) ?
159+
Type paramType = oldFunc->getLocalType(i);
160+
auto builderFunc = (i < oldFunc->getVarIndexBase()) ?
159161
Builder::addParam :
160162
static_cast<Index (*)(Function*, Name, Type)>(Builder::addVar);
161163
if (paramType == i64) {

src/passes/MemoryPacking.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,8 @@ namespace wasm {
2424
const Index OVERHEAD = 8;
2525

2626
struct MemoryPacking : public Pass {
27+
bool modifiesBinaryenIR() override { return false; }
28+
2729
void run(PassRunner* runner, Module* module) override {
2830
if (!module->memory.exists) return;
2931
std::vector<Memory::Segment> packed;

src/passes/Metrics.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,8 @@ static Counts lastCounts;
3232

3333
// Prints metrics between optimization passes.
3434
struct Metrics : public WalkerPass<PostWalker<Metrics, UnifiedExpressionVisitor<Metrics>>> {
35+
bool modifiesBinaryenIR() override { return false; }
36+
3537
bool byFunction;
3638

3739
Counts counts;

0 commit comments

Comments
 (0)