TypeBuilder#3418
Conversation
Introduce TypeBuilder, a utility for constructing heap types in terms of other heap types that may have not yet been defined. Internally, it works by using new TypeInfo variants that contain pointers to HeapTypes rather than HeapType values. That lets the TypeBuilder create temporary types that can refer to uninitialized heap types owned by the TypeBuilder. Those temporary types can in turn be used to initialize the very heap types they refer to. Since temporary types are only valid for the lifetime of their TypeBuilder, there is a canonicalization step that converts the temporary types reachable from the TypeBuilder's heap types into globally interned canonical types. Only after that canonicalization step can the heap types be used to construct non-temporary types. This PR allows Types to be built in terms of as of yet undefined heap types, but it currently errors out in the presence of recursive types. Supporting recursive types will require further work to canonicalize them into finite, acyclic representations. Currently any attempt to compare recursive types would infinitely recurse.
kripken
left a comment
There was a problem hiding this comment.
Oh, and just to make sure about the overall plan: This PR does not modify module-utils.h, but your next PR would?
| struct TempRtt { | ||
| uint32_t depth; | ||
| HeapType* heapType; | ||
| }; |
There was a problem hiding this comment.
This makes me wonder once again if maybe HeapTypes should always be by a pointer. That is, that on line 37/40 we'd have HeapType* instead of HeapType.
I may be missing something, but it seems like that would make it simpler to implement the TypeBuilder, as we wouldn't need to add TempRef and TempRtt. And, separately, it would also avoid the risk of copies being passed around.
If memory safety is a concern here, then our usage pattern is very simple - allocate types when creating a wasm, and then use them. We could just store them in a global arena that is freed on process exit, like strings are. Note that that is a separate question from whether we intern them or not (which I am not sure about either way).
One of my concerns about TempRef and TempRtt, more specifically, is that they end up intruding in various places throughout this file. If somehow the TypeBuilder could encapsulate that complexity into itself then that would be another option to think about.
There was a problem hiding this comment.
I'll experiment with this.
The next PR would use TypeBuilder in the binary and text parsers. The PR after that would update the type finding code in module-utils.h to do a full type traversal and would remove the code that does the almost-topographical sort of the types. |
Interns HeapTypes using the same patterns and utilities already used to intern Types. This allows HeapTypes to efficiently be compared for equality and hashed, which may be important for very large struct types in the future. This change also has the benefit of increasing symmetry between the APIs of Type and HeapType, which will make the developer experience more consistent. Finally, this change will make TypeBuilder (WebAssembly#3418) much simpler because it will no longer have to introduce TypeInfo variants to refer to HeapTypes indirectly.
Interns HeapTypes using the same patterns and utilities already used to intern Types. This allows HeapTypes to efficiently be compared for equality and hashed, which may be important for very large struct types in the future. This change also has the benefit of increasing symmetry between the APIs of Type and HeapType, which will make the developer experience more consistent. Finally, this change will make TypeBuilder (#3418) much simpler because it will no longer have to introduce TypeInfo variants to refer to HeapTypes indirectly.
| void setHeapType(size_t i, Signature signature); | ||
| void setHeapType(size_t i, const Struct& struct_); | ||
| void setHeapType(size_t i, Struct&& struct_); | ||
| void setHeapType(size_t i, Array array); |
There was a problem hiding this comment.
The struct/array asymmetry feels a little wrong to me each time I see it. May be worth thinking more about this eventually.
| HeapTypeStore globalHeapTypeStore; | ||
|
|
||
| // Specialized to simplify programming generically over Types and HeapTypes. | ||
| template<typename T> struct MetaTypeInfo {}; |
There was a problem hiding this comment.
Not in this PR, but I am wondering more and more if we shouldn't merge Type and HeapType perhaps.
There was a problem hiding this comment.
I think it is useful to maintain the distinction between them, despite their similarities. They're certainly not interchangeable in the public API.
There was a problem hiding this comment.
That's a fair point that they matter in the public API. But OTOH, the public API also differentiates struct from array types, but we don't have a StructType or ArrayType. In other words, we could have a single Type, with a isHeap() like we have isTuple(),isRef(),isRtt() etc. Maybe I'm overthinking this though.
There was a problem hiding this comment.
I personally think having them separately is clearer. They represent fundamentally different kinds of type: Type is what every value and expression has, and HeapType is what we list in the type section, and they cannot be used interchangeably and one is not a subtype of the other. On the other hand, all of tuple, ref, and rtt are subtypes of Type.
| }; | ||
|
|
||
| // IDs of scanned Types and HeapTypes, used to prevent repeated scanning. | ||
| std::unordered_set<uint64_t> scanned; |
|
|
||
| struct Entry { | ||
| bool initialized = false; | ||
| HeapTypeInfo info = Signature(); |
There was a problem hiding this comment.
HeapTypeInfo does not have a default initializer, but we want to be able to default initialize Entry, so we have to give info an explicit default value. I'll add a comment clarifying that it is arbitrary.
| // replacing and deduplicating the temporary type and heaptypes backed by | ||
| // storage owned by the TypeBuilder into normal types and heap types backed by | ||
| // the global stores. | ||
| struct Canonicalizer { |
There was a problem hiding this comment.
This could maybe be in an anonymous namespace?
| } | ||
| } | ||
|
|
||
| void Canonicalizer::makeReachabilityFixpoint() { |
There was a problem hiding this comment.
Isn't "Fixed Point" the term? Or is "fixpoint" a common shorthand?
| void Canonicalizer::makeReachabilityFixpoint() { | |
| void Canonicalizer::makeReachabilityFixedPoint() { |
There was a problem hiding this comment.
That wikipedia article starts "In mathematics, a fixed point (sometimes shortened to fixpoint" so I think we could go either way.
There was a problem hiding this comment.
Oh, sorry, not sure how I missed that in the link...
There was a problem hiding this comment.
no problem :) I'll go with your suggestion since it is indeed the more standard name.
| void scanType(Type* child); | ||
| void makeReachabilityFixpoint(); | ||
| template<typename T> | ||
| void canonicalize(T* type, std::unordered_map<T, T>& canonicals); |
There was a problem hiding this comment.
maybe a comment on this method? the out param is not immediately obvious.
| } | ||
|
|
||
| Type TypeBuilder::getTempTupleType(const Tuple& tuple) { | ||
| return impl->typeStore.canonicalize(tuple); |
There was a problem hiding this comment.
How does this work? For a Ref type there is an index i, which gives us the necessary indirection, but here the Tuple contains actual Types..? (There also doesn't seem to be a test for getTempTupleType.)
There was a problem hiding this comment.
This is for when you need to create a tuple of temporary types, for example when constructing a function signature. You can't create a normal tuple type for that because that would allow references to the temporary types to leak out into the global type store. In the worst case, a signature type created later could be canonicalized to a type that refers to the original temporary types which have long since become invalid.
There was a problem hiding this comment.
I think I see. Is this an internal API then (and therefore not tested as part of the public API)?
There was a problem hiding this comment.
No, wait, the user needs to create those tuples, for each function..?
There was a problem hiding this comment.
Right, users need to do this. I forgot to say that I will add a test for this 👍
There was a problem hiding this comment.
Hmm, why are tuples different than everything else, though? The others all get size_t i.
There was a problem hiding this comment.
Tuples don't have associated HeapTypes, so they don't need to take a heap type index. They still need to be backed by the TypeBuilder's TypeStore, though.
| (ref $sig) => (ref (func (param (ref (struct (field (ref null (array (mut externref))) (mut (ref extern)))))) (result (ref (array (mut externref))) i32))) | ||
| (ref $struct) => (ref (struct (field (ref null (array (mut externref))) (mut (ref extern))))) | ||
| (ref $array) => (ref (array (mut externref))) | ||
| (ref null $array) => (ref null (array (mut externref))) |
There was a problem hiding this comment.
What does 'building types" do? These seems to be the same as "After setting heap types" already. Also I wonder how can refStruct be already referring to the array type at this point, given that it was constructed with a dummy empty refArray?
There was a problem hiding this comment.
The "building" step replaces the temporary types and heap types with permanent types and heap types. There is no way to differentiate between temporary and permanent types while printing, unfortunately.
refStruct was initialized by line builder.setHeapType(1, struct_);, which replaces the HeapTypeInfo backing the HeapType that refStruct refers to. This mutability of HeapTypeInfos is the key feature of the TypeBuilder that allows it to create types before the types they depend on have been created.
| Type refStruct = builder.getTempRefType(1, false); | ||
| Type refArray = builder.getTempRefType(2, false); | ||
| Type refNullArray = builder.getTempRefType(2, true); | ||
| Type refExt(HeapType::ext, false); |
There was a problem hiding this comment.
Unrelated to this PR, but why is this not extern and ext? I think it would be less confusing to follow the spec's names.. (Not necessarily in this PR though)
There was a problem hiding this comment.
Unfortunately we can't use extern because it is a keyword. Between the alternatives extern_ and ext I thought ext looked better, but perhaps the ugliness of extern would be worth the clarity. @kripken, what do you think about this? I'd be happy to change it if we collectively think extern_ would be better.
There was a problem hiding this comment.
I have a slight preference to extern_.
There was a problem hiding this comment.
Sounds good. I'll make this change in a followup 👍
| Type TypeBuilder::getTempRttType(size_t i, uint32_t depth) { | ||
| assert(i < impl->entries.size() && "Index out of bounds"); | ||
| return impl->typeStore.canonicalize(Rtt(depth, impl->entries[i].get())); | ||
| } |
There was a problem hiding this comment.
Maybe add tests that use this method too?
| assert(refSig != newRefSig); | ||
| assert(refStruct != newRefStruct); | ||
| assert(refArray != newRefArray); | ||
| assert(refNullArray != newRefNullArray); |
There was a problem hiding this comment.
Why should these be different? The printing results in type-builder.txt look the same..
There was a problem hiding this comment.
The types on the left are temporary types whose lifetime ends with the TypeBuilder and the types on the right are permanent globally canonical types.
| // temporary types. | ||
| struct TypeBuilder { | ||
| struct Impl; | ||
| std::unique_ptr<Impl> impl; |
There was a problem hiding this comment.
We need to hide the definition of Impl in the .cpp file because it depends on implementation details like HeapTypeInfo. We can't do that and have Impl impl; because hiding the definition necessarily makes Impl and incomplete type at this point. This is the classic PImpl idiom.
| Type* type; | ||
| HeapType* heapType; | ||
| }; | ||
| Item(Type* type) : kind(TypeKind), type(type) {} |
There was a problem hiding this comment.
Yes, notice that scanHeapType records Type*s to scan in the future and scanType records both HeapType* and Type* (the latter in the tuple case).
| // The work list of Types and HeapTypes remaining to be scanned. | ||
| std::vector<Item> scanList; | ||
|
|
||
| // The list of Types and HeapTypes to visit (in forward preorder). |
There was a problem hiding this comment.
Different comments say different things, such as postorder, reverse postorder, and forward preorder. Are they referring to the same thing or different things?
There was a problem hiding this comment.
Yeah sorry I can clear that up. The list stored in forward preorder as this comment says, but then it is finally traversed from back to front, giving reverse postorder. I'll clarify that here and use "reverse postorder" everywhere I am currently just using "postorder."
| }; | ||
|
|
||
| // IDs of scanned Types and HeapTypes, used to prevent repeated scanning. | ||
| std::unordered_set<TypeID> scanned; |
There was a problem hiding this comment.
Why not just Type instead of TypeID? TypeID or the fact that Type is an interned ID seems to be an implementation detail of Type class. The same for reaches map below.
There was a problem hiding this comment.
Because it needs to refer to HeapTypes as well. This gives a simple way to store both in the same collection where we don't need to be able to differentiate them because we know that no compound Type and HeapType will ever have the same ID. I used the same trick in reaches.
| // (type $struct (struct (field (ref null $array) (mut ref extern)))) | ||
| // (type $array (array (mut ref null extern))) |
There was a problem hiding this comment.
| // (type $struct (struct (field (ref null $array) (mut ref extern)))) | |
| // (type $array (array (mut ref null extern))) | |
| // (type $struct (struct (field (ref null $array) (mut externref)))) | |
| // (type $array (array (mut externref))) |
Nit: For simplicity (and also to match the output below)
There was a problem hiding this comment.
Good catch, thanks! I'll replace the second one (ref null extern), but the first one ref extern is not the same as externref because it is non-nullable.
|
I believe I have addressed all of the feedback so far 👍 |
|
Nice work! I have some thoughts for possible followups as mentioned earlier, but nothing urgent, and we can discuss those separately. |
Introduce TypeBuilder, a utility for constructing heap types in terms of other
heap types that may have not yet been defined. Internally, it works by using new
TypeInfo variants that contain pointers to HeapTypes rather than HeapType
values. That lets the TypeBuilder create temporary types that can refer to
uninitialized heap types owned by the TypeBuilder. Those temporary types can in
turn be used to initialize the very heap types they refer to. Since temporary
types are only valid for the lifetime of their TypeBuilder, there is a
canonicalization step that converts the temporary types reachable from the
TypeBuilder's heap types into globally interned canonical types. Only after that
canonicalization step can the heap types be used to construct non-temporary
types.
This PR allows Types to be built in terms of as of yet undefined heap types, but
it currently errors out in the presence of recursive types. Supporting recursive
types will require further work to canonicalize them into finite, acyclic
representations. Currently any attempt to compare recursive types would
infinitely recurse.
cc @dcodeIO