Support building recursive types#3602
Conversation
cf6f6e1 to
3273adb
Compare
|
This doesn't actually work because it doesn't canonicalize recusive Types before "non-recursive" HeapTypes that contain them, where "non-recursive" means the HeapType doesn't contain itself (but it's still an infinite tree because it contains a "recursive" type). So I don't forget, I need to:
|
| return typename Info::type_t(id); | ||
| } | ||
|
|
||
| void insertWithoutCanonicalizing(std::unique_ptr<Info>&& info) { |
There was a problem hiding this comment.
The other method above is "canonicalize" but it both canonicalizes and inserts. Perhaps it could be renamed to be more consistent with this name? canonicalizeAndInsert vs just insert perhaps?
There was a problem hiding this comment.
canonicalize does not take ownership of anything because it copies the type info object, so I think it's current name describes its intended interface. I got rid of this new method, though.
| // their canonicalized versions. | ||
| // Visit the use sites of Types and HeapTypes, replacing them with their | ||
| // canonicalized versions in an order that guarantees no temporary types will | ||
| // be leaked into the global stores.. |
There was a problem hiding this comment.
| // be leaked into the global stores.. | |
| // be leaked into the global stores. |
| // will outlive the TypeBuilder without their IDs changing. | ||
| for (auto& entry : builder.impl->entries) { | ||
| if (entry.info->recursive) { | ||
| globalHeapTypeStore.insertWithoutCanonicalizing(std::move(entry.info)); |
There was a problem hiding this comment.
Can we do the same for all types? I understand this is needed for recursive types, but if it's possible for non-recursive ones as well that could be simpler.
There was a problem hiding this comment.
If we end up partially or wholly switching to nominal types, it would make sense to move ownership of the nominal types from the canonicalizing global stores to the individual Modules. But for structural types, deduplication is a matter of correctness. If the proposal keeps fully structural typing, we will have to implement canonicalization of recursive types at some point as well.
There was a problem hiding this comment.
Oh, I see, I didn't get that before. So this does not canonicalize recursive types despite it being technically incorrect, but it may be good enough for now? (Why is it good enough for now?)
There was a problem hiding this comment.
This should be good enough for now because producers will generally emit one type declaration per source type and not deduplicate or mix the two. Because producers don’t treat the types as interchangeable, it’s ok that Binaryen doesn’t either. Another piece is that for now neither Binaryen nor the producers are dealing with systems of multiple modules. If they were, it would be important for Binaryen to recognize when the same type is declared in multiple modules, which would require properly canonicalizing recursive types.
| void Canonicalizer::findRecursiveTypes() { | ||
| for (auto& kv : ancestors) { | ||
| TypeID curr = kv.first; | ||
| if (kv.second.count(kv.first) != 0) { |
There was a problem hiding this comment.
| if (kv.second.count(kv.first) != 0) { | |
| if (kv.second.count(curr) != 0) { |
Also capturing kv.second to a local would give it a name which could be more readable I think.
Updates TypeBuilder to support recursive types. Recursive types are particularly problematic because under the current scheme it is necessary to canonicalize the uses of a type's immediate children before canonicalizing the type itself to avoid leaking non-canonical, temporary types out of the TypeBuilder and into the global type stores. In the case of recursive types, it is not possible to do this because of their cyclic nature. In principle this could be overcome by hashing recursive types based on their structure rather than their contents, but that would be complicated. Instead, this PR takes the shortcut of not canonicalizing self-referential HeapTypes at all, but rather moving them out of the TypeBuilder and into the global type store without changing their addresses or needing to update any of their use sites. This breaks all cycles and makes it possible to canonicalize the other types correctly. Note that this PR _only_ adds support for building recursive types. Doing almost anything with the types, such as printing, comparing, or emitting them will certainly lead to infinite recursions. A follow up PR will update all these operations to work correctly with recursive types.
3273adb to
d5b77b4
Compare
kripken
left a comment
There was a problem hiding this comment.
Nice!
lgtm after we figure out those comments.
| } while (changed); | ||
|
|
||
| // Find HeapTypes that reach themselves | ||
| for (auto& entry : builder.impl->entries) { |
There was a problem hiding this comment.
Can this loop be on fixedPoint? Basically the set we care about is { x \in children(x) | x \in AllThings } so I'm not sure why this starts with builder.impl->entries.
There was a problem hiding this comment.
If we looped over fixedPoint, we would have to first traverse it to collect a set of all TypeIDs it references and at some point we would also have to check whether each TypeID corresponded to a HeapType. In contrast, the HeapTypes correspond 1:1 to builder entries, so it takes less code find them that way.
| for (auto& kv : childrenDAG) { | ||
| kv.second.erase(id); | ||
| } | ||
| } |
There was a problem hiding this comment.
I don't know if this matters much, but for performance I suspect the reversed loops might be faster. I am assuming that most types are not self-referential - does that sound reasonable?
If so, then cache locality would benefit from having the inner loop be on the smaller thing, selfReferentialHeapTypes. But then you'd need two loops to avoid iterator invalidation I guess,
for (auto& kv : childrenDAG) {
kv.second.erase(selfReferentialHeapTypes.begin(), selfReferentialHeapTypes.end());
}
childrenDAG.erase(selfReferentialHeapTypes.begin(), selfReferentialHeapTypes.end());There was a problem hiding this comment.
IIUC, the variant of the erase method that takes iterators only accepts iterators into the collection it is being called on. So I don't think it's valid to pass iterators of selfReferentialHeapTypes to the erase method of the sets in childrenDAG. I could still invert the loops, but then there would be more loops overall :( Overall I prefer the current version.
for (TypeID id : selfReferentialHeapTypes) {
childrenDAG.erase(id);
}
for (auto& kv : childrenDAG) {
for (TypeID id : selfReferentialHeapTypes) {
kv.second.erase(id);
}
}
There was a problem hiding this comment.
Oh right, I mixed up that API...
Given it's more code it's not worth doing this possible optimization now, I agree. Maybe a TODO to investigate? Then if we profile in the future and see this function is slow, we'd have a hint.
Updates TypeBuilder to support recursive types. Recursive types are particularly
problematic because under the current scheme it is necessary to canonicalize the
uses of a type's immediate children before canonicalizing the type itself to
avoid leaking non-canonical, temporary types out of the TypeBuilder and into the
global type stores. In the case of recursive types, it is not possible to do
this because of their cyclic nature. In principle this could be overcome by
hashing recursive types based on their structure rather than their contents, but
that would be complicated. Instead, this PR takes the shortcut of not
canonicalizing self-referential HeapTypes at all, but rather moving them out of the
TypeBuilder and into the global type store without changing their addresses or
needing to update any of their use sites. This breaks all cycles and makes it
possible to canonicalize the other types correctly.
Note that this PR only adds support for building recursive types. Doing almost
anything with the types, such as printing, comparing, or emitting them will
certainly lead to infinite recursions. A follow up PR will update all these
operations to work correctly with recursive types.