Skip to content

TypeBuilder#3418

Merged
tlively merged 8 commits into
WebAssembly:masterfrom
tlively:type-builder
Dec 11, 2020
Merged

TypeBuilder#3418
tlively merged 8 commits into
WebAssembly:masterfrom
tlively:type-builder

Conversation

@tlively

@tlively tlively commented Dec 3, 2020

Copy link
Copy Markdown
Member

Introduce TypeBuilder, a utility for constructing heap types in terms of other
heap types that may have not yet been defined. Internally, it works by using new
TypeInfo variants that contain pointers to HeapTypes rather than HeapType
values. That lets the TypeBuilder create temporary types that can refer to
uninitialized heap types owned by the TypeBuilder. Those temporary types can in
turn be used to initialize the very heap types they refer to. Since temporary
types are only valid for the lifetime of their TypeBuilder, there is a
canonicalization step that converts the temporary types reachable from the
TypeBuilder's heap types into globally interned canonical types. Only after that
canonicalization step can the heap types be used to construct non-temporary
types.

This PR allows Types to be built in terms of as of yet undefined heap types, but
it currently errors out in the presence of recursive types. Supporting recursive
types will require further work to canonicalize them into finite, acyclic
representations. Currently any attempt to compare recursive types would
infinitely recurse.

cc @dcodeIO

Introduce TypeBuilder, a utility for constructing heap types in terms of other
heap types that may have not yet been defined. Internally, it works by using new
TypeInfo variants that contain pointers to HeapTypes rather than HeapType
values. That lets the TypeBuilder create temporary types that can refer to
uninitialized heap types owned by the TypeBuilder. Those temporary types can in
turn be used to initialize the very heap types they refer to. Since temporary
types are only valid for the lifetime of their TypeBuilder, there is a
canonicalization step that converts the temporary types reachable from the
TypeBuilder's heap types into globally interned canonical types. Only after that
canonicalization step can the heap types be used to construct non-temporary
types.

This PR allows Types to be built in terms of as of yet undefined heap types, but
it currently errors out in the presence of recursive types. Supporting recursive
types will require further work to canonicalize them into finite, acyclic
representations. Currently any attempt to compare recursive types would
infinitely recurse.
@tlively tlively requested review from aheejin and kripken December 3, 2020 03:53

@kripken kripken left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and just to make sure about the overall plan: This PR does not modify module-utils.h, but your next PR would?

Comment thread src/wasm-type.h
Comment thread src/wasm/wasm-type.cpp Outdated
struct TempRtt {
uint32_t depth;
HeapType* heapType;
};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder once again if maybe HeapTypes should always be by a pointer. That is, that on line 37/40 we'd have HeapType* instead of HeapType.

I may be missing something, but it seems like that would make it simpler to implement the TypeBuilder, as we wouldn't need to add TempRef and TempRtt. And, separately, it would also avoid the risk of copies being passed around.

If memory safety is a concern here, then our usage pattern is very simple - allocate types when creating a wasm, and then use them. We could just store them in a global arena that is freed on process exit, like strings are. Note that that is a separate question from whether we intern them or not (which I am not sure about either way).

One of my concerns about TempRef and TempRtt, more specifically, is that they end up intruding in various places throughout this file. If somehow the TypeBuilder could encapsulate that complexity into itself then that would be another option to think about.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll experiment with this.

Comment thread src/wasm/wasm-type.cpp
@tlively

tlively commented Dec 3, 2020

Copy link
Copy Markdown
Member Author

Oh, and just to make sure about the overall plan: This PR does not modify module-utils.h, but your next PR would?

The next PR would use TypeBuilder in the binary and text parsers. The PR after that would update the type finding code in module-utils.h to do a full type traversal and would remove the code that does the almost-topographical sort of the types.

tlively added a commit to tlively/binaryen that referenced this pull request Dec 5, 2020
Interns HeapTypes using the same patterns and utilities already used to intern
Types. This allows HeapTypes to efficiently be compared for equality and hashed,
which may be important for very large struct types in the future. This change
also has the benefit of increasing symmetry between the APIs of Type and
HeapType, which will make the developer experience more consistent. Finally,
this change will make TypeBuilder (WebAssembly#3418) much simpler because it will no longer
have to introduce TypeInfo variants to refer to HeapTypes indirectly.
tlively added a commit that referenced this pull request Dec 8, 2020
Interns HeapTypes using the same patterns and utilities already used to intern
Types. This allows HeapTypes to efficiently be compared for equality and hashed,
which may be important for very large struct types in the future. This change
also has the benefit of increasing symmetry between the APIs of Type and
HeapType, which will make the developer experience more consistent. Finally,
this change will make TypeBuilder (#3418) much simpler because it will no longer
have to introduce TypeInfo variants to refer to HeapTypes indirectly.
@tlively

tlively commented Dec 9, 2020

Copy link
Copy Markdown
Member Author

@kripken I rewrote this on top of #3428 and expanded many of the comments. I think it's much cleaner now 👍 PTAL!

Comment thread src/wasm-type.h
void setHeapType(size_t i, Signature signature);
void setHeapType(size_t i, const Struct& struct_);
void setHeapType(size_t i, Struct&& struct_);
void setHeapType(size_t i, Array array);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struct/array asymmetry feels a little wrong to me each time I see it. May be worth thinking more about this eventually.

Comment thread src/wasm/wasm-type.cpp
HeapTypeStore globalHeapTypeStore;

// Specialized to simplify programming generically over Types and HeapTypes.
template<typename T> struct MetaTypeInfo {};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this PR, but I am wondering more and more if we shouldn't merge Type and HeapType perhaps.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is useful to maintain the distinction between them, despite their similarities. They're certainly not interchangeable in the public API.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point that they matter in the public API. But OTOH, the public API also differentiates struct from array types, but we don't have a StructType or ArrayType. In other words, we could have a single Type, with a isHeap() like we have isTuple(),isRef(),isRtt() etc. Maybe I'm overthinking this though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think having them separately is clearer. They represent fundamentally different kinds of type: Type is what every value and expression has, and HeapType is what we list in the type section, and they cannot be used interchangeably and one is not a subtype of the other. On the other hand, all of tuple, ref, and rtt are subtypes of Type.

Comment thread src/wasm/wasm-type.cpp Outdated
};

// IDs of scanned Types and HeapTypes, used to prevent repeated scanning.
std::unordered_set<uint64_t> scanned;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe an ID type for uint64_t?

Comment thread src/wasm/wasm-type.cpp

struct Entry {
bool initialized = false;
HeapTypeInfo info = Signature();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there Signature here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HeapTypeInfo does not have a default initializer, but we want to be able to default initialize Entry, so we have to give info an explicit default value. I'll add a comment clarifying that it is arbitrary.

Comment thread src/wasm/wasm-type.cpp
// replacing and deduplicating the temporary type and heaptypes backed by
// storage owned by the TypeBuilder into normal types and heap types backed by
// the global stores.
struct Canonicalizer {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could maybe be in an anonymous namespace?

Comment thread src/wasm/wasm-type.cpp Outdated
}
}

void Canonicalizer::makeReachabilityFixpoint() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't "Fixed Point" the term? Or is "fixpoint" a common shorthand?

Suggested change
void Canonicalizer::makeReachabilityFixpoint() {
void Canonicalizer::makeReachabilityFixedPoint() {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wikipedia article starts "In mathematics, a fixed point (sometimes shortened to fixpoint" so I think we could go either way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, not sure how I missed that in the link...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem :) I'll go with your suggestion since it is indeed the more standard name.

Comment thread src/wasm/wasm-type.cpp
void scanType(Type* child);
void makeReachabilityFixpoint();
template<typename T>
void canonicalize(T* type, std::unordered_map<T, T>& canonicals);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a comment on this method? the out param is not immediately obvious.

Comment thread src/wasm/wasm-type.cpp
}

Type TypeBuilder::getTempTupleType(const Tuple& tuple) {
return impl->typeStore.canonicalize(tuple);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? For a Ref type there is an index i, which gives us the necessary indirection, but here the Tuple contains actual Types..? (There also doesn't seem to be a test for getTempTupleType.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for when you need to create a tuple of temporary types, for example when constructing a function signature. You can't create a normal tuple type for that because that would allow references to the temporary types to leak out into the global type store. In the worst case, a signature type created later could be canonicalized to a type that refers to the original temporary types which have long since become invalid.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see. Is this an internal API then (and therefore not tested as part of the public API)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, wait, the user needs to create those tuples, for each function..?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, users need to do this. I forgot to say that I will add a test for this 👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why are tuples different than everything else, though? The others all get size_t i.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuples don't have associated HeapTypes, so they don't need to take a heap type index. They still need to be backed by the TypeBuilder's TypeStore, though.

@aheejin aheejin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't finished reading, but some initial basic questions.. I might be missing some key understanding.

(ref $sig) => (ref (func (param (ref (struct (field (ref null (array (mut externref))) (mut (ref extern)))))) (result (ref (array (mut externref))) i32)))
(ref $struct) => (ref (struct (field (ref null (array (mut externref))) (mut (ref extern)))))
(ref $array) => (ref (array (mut externref)))
(ref null $array) => (ref null (array (mut externref)))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 'building types" do? These seems to be the same as "After setting heap types" already. Also I wonder how can refStruct be already referring to the array type at this point, given that it was constructed with a dummy empty refArray?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "building" step replaces the temporary types and heap types with permanent types and heap types. There is no way to differentiate between temporary and permanent types while printing, unfortunately.

refStruct was initialized by line builder.setHeapType(1, struct_);, which replaces the HeapTypeInfo backing the HeapType that refStruct refers to. This mutability of HeapTypeInfos is the key feature of the TypeBuilder that allows it to create types before the types they depend on have been created.

Comment thread test/example/type-builder.cpp Outdated
Type refStruct = builder.getTempRefType(1, false);
Type refArray = builder.getTempRefType(2, false);
Type refNullArray = builder.getTempRefType(2, true);
Type refExt(HeapType::ext, false);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but why is this not extern and ext? I think it would be less confusing to follow the spec's names.. (Not necessarily in this PR though)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can't use extern because it is a keyword. Between the alternatives extern_ and ext I thought ext looked better, but perhaps the ugliness of extern would be worth the clarity. @kripken, what do you think about this? I'd be happy to change it if we collectively think extern_ would be better.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference to extern_.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'll make this change in a followup 👍

Comment thread src/wasm/wasm-type.cpp
Type TypeBuilder::getTempRttType(size_t i, uint32_t depth) {
assert(i < impl->entries.size() && "Index out of bounds");
return impl->typeStore.canonicalize(Rtt(depth, impl->entries[i].get()));
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add tests that use this method too?

Comment on lines +52 to +55
assert(refSig != newRefSig);
assert(refStruct != newRefStruct);
assert(refArray != newRefArray);
assert(refNullArray != newRefNullArray);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should these be different? The printing results in type-builder.txt look the same..

@tlively tlively Dec 10, 2020

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types on the left are temporary types whose lifetime ends with the TypeBuilder and the types on the right are permanent globally canonical types.

Comment thread src/wasm-type.h
// temporary types.
struct TypeBuilder {
struct Impl;
std::unique_ptr<Impl> impl;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just Impl impl;?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to hide the definition of Impl in the .cpp file because it depends on implementation details like HeapTypeInfo. We can't do that and have Impl impl; because hiding the definition necessarily makes Impl and incomplete type at this point. This is the classic PImpl idiom.

Comment thread src/wasm/wasm-type.cpp
Type* type;
HeapType* heapType;
};
Item(Type* type) : kind(TypeKind), type(type) {}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this for Type too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, notice that scanHeapType records Type*s to scan in the future and scanType records both HeapType* and Type* (the latter in the tuple case).

Comment thread src/wasm/wasm-type.cpp Outdated
// The work list of Types and HeapTypes remaining to be scanned.
std::vector<Item> scanList;

// The list of Types and HeapTypes to visit (in forward preorder).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different comments say different things, such as postorder, reverse postorder, and forward preorder. Are they referring to the same thing or different things?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry I can clear that up. The list stored in forward preorder as this comment says, but then it is finally traversed from back to front, giving reverse postorder. I'll clarify that here and use "reverse postorder" everywhere I am currently just using "postorder."

Comment thread src/wasm/wasm-type.cpp
};

// IDs of scanned Types and HeapTypes, used to prevent repeated scanning.
std::unordered_set<TypeID> scanned;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just Type instead of TypeID? TypeID or the fact that Type is an interned ID seems to be an implementation detail of Type class. The same for reaches map below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it needs to refer to HeapTypes as well. This gives a simple way to store both in the same collection where we don't need to be able to differentiate them because we know that no compound Type and HeapType will ever have the same ID. I used the same trick in reaches.

Comment thread test/example/type-builder.cpp Outdated
Comment on lines +13 to +14
// (type $struct (struct (field (ref null $array) (mut ref extern))))
// (type $array (array (mut ref null extern)))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// (type $struct (struct (field (ref null $array) (mut ref extern))))
// (type $array (array (mut ref null extern)))
// (type $struct (struct (field (ref null $array) (mut externref))))
// (type $array (array (mut externref)))

Nit: For simplicity (and also to match the output below)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks! I'll replace the second one (ref null extern), but the first one ref extern is not the same as externref because it is non-nullable.

@tlively

tlively commented Dec 10, 2020

Copy link
Copy Markdown
Member Author

I believe I have addressed all of the feedback so far 👍

@kripken

kripken commented Dec 10, 2020

Copy link
Copy Markdown
Member

Nice work!

I have some thoughts for possible followups as mentioned earlier, but nothing urgent, and we can discuss those separately.

@tlively tlively merged commit e1978e0 into WebAssembly:master Dec 11, 2020
@tlively tlively deleted the type-builder branch December 11, 2020 00:42
@tlively tlively mentioned this pull request Feb 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants