Skip to content

Support building recursive types#3602

Merged
tlively merged 5 commits into
mainfrom
recursive-types
Feb 25, 2021
Merged

Support building recursive types#3602
tlively merged 5 commits into
mainfrom
recursive-types

Conversation

@tlively

@tlively tlively commented Feb 24, 2021

Copy link
Copy Markdown
Member

Updates TypeBuilder to support recursive types. Recursive types are particularly
problematic because under the current scheme it is necessary to canonicalize the
uses of a type's immediate children before canonicalizing the type itself to
avoid leaking non-canonical, temporary types out of the TypeBuilder and into the
global type stores. In the case of recursive types, it is not possible to do
this because of their cyclic nature. In principle this could be overcome by
hashing recursive types based on their structure rather than their contents, but
that would be complicated. Instead, this PR takes the shortcut of not
canonicalizing self-referential HeapTypes at all, but rather moving them out of the
TypeBuilder and into the global type store without changing their addresses or
needing to update any of their use sites. This breaks all cycles and makes it
possible to canonicalize the other types correctly.

Note that this PR only adds support for building recursive types. Doing almost
anything with the types, such as printing, comparing, or emitting them will
certainly lead to infinite recursions. A follow up PR will update all these
operations to work correctly with recursive types.

@tlively tlively requested a review from kripken February 24, 2021 07:29
@tlively tlively marked this pull request as draft February 24, 2021 08:43
@tlively

tlively commented Feb 24, 2021

Copy link
Copy Markdown
Member Author

This doesn't actually work because it doesn't canonicalize recusive Types before "non-recursive" HeapTypes that contain them, where "non-recursive" means the HeapType doesn't contain itself (but it's still an infinite tree because it contains a "recursive" type).

So I don't forget, I need to:

  • Do a topological sort on the child-parent graph (before calculating its fixed point) with the self-reaching HeapTypes removed.
  • Set recursive = true on all infinite types, not just the self-reaching ones.

Comment thread src/wasm/wasm-type.cpp Outdated
return typename Info::type_t(id);
}

void insertWithoutCanonicalizing(std::unique_ptr<Info>&& info) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other method above is "canonicalize" but it both canonicalizes and inserts. Perhaps it could be renamed to be more consistent with this name? canonicalizeAndInsert vs just insert perhaps?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canonicalize does not take ownership of anything because it copies the type info object, so I think it's current name describes its intended interface. I got rid of this new method, though.

Comment thread src/wasm/wasm-type.cpp Outdated
// their canonicalized versions.
// Visit the use sites of Types and HeapTypes, replacing them with their
// canonicalized versions in an order that guarantees no temporary types will
// be leaked into the global stores..

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// be leaked into the global stores..
// be leaked into the global stores.

Comment thread src/wasm/wasm-type.cpp Outdated
// will outlive the TypeBuilder without their IDs changing.
for (auto& entry : builder.impl->entries) {
if (entry.info->recursive) {
globalHeapTypeStore.insertWithoutCanonicalizing(std::move(entry.info));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do the same for all types? I understand this is needed for recursive types, but if it's possible for non-recursive ones as well that could be simpler.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up partially or wholly switching to nominal types, it would make sense to move ownership of the nominal types from the canonicalizing global stores to the individual Modules. But for structural types, deduplication is a matter of correctness. If the proposal keeps fully structural typing, we will have to implement canonicalization of recursive types at some point as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, I didn't get that before. So this does not canonicalize recursive types despite it being technically incorrect, but it may be good enough for now? (Why is it good enough for now?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be good enough for now because producers will generally emit one type declaration per source type and not deduplicate or mix the two. Because producers don’t treat the types as interchangeable, it’s ok that Binaryen doesn’t either. Another piece is that for now neither Binaryen nor the producers are dealing with systems of multiple modules. If they were, it would be important for Binaryen to recognize when the same type is declared in multiple modules, which would require properly canonicalizing recursive types.

Comment thread src/wasm/wasm-type.cpp Outdated
void Canonicalizer::findRecursiveTypes() {
for (auto& kv : ancestors) {
TypeID curr = kv.first;
if (kv.second.count(kv.first) != 0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (kv.second.count(kv.first) != 0) {
if (kv.second.count(curr) != 0) {

Also capturing kv.second to a local would give it a name which could be more readable I think.

Updates TypeBuilder to support recursive types. Recursive types are particularly
problematic because under the current scheme it is necessary to canonicalize the
uses of a type's immediate children before canonicalizing the type itself to
avoid leaking non-canonical, temporary types out of the TypeBuilder and into the
global type stores. In the case of recursive types, it is not possible to do
this because of their cyclic nature. In principle this could be overcome by
hashing recursive types based on their structure rather than their contents, but
that would be complicated. Instead, this PR takes the shortcut of not
canonicalizing self-referential HeapTypes at all, but rather moving them out of the
TypeBuilder and into the global type store without changing their addresses or
needing to update any of their use sites. This breaks all cycles and makes it
possible to canonicalize the other types correctly.

Note that this PR _only_ adds support for building recursive types. Doing almost
anything with the types, such as printing, comparing, or emitting them will
certainly lead to infinite recursions. A follow up PR will update all these
operations to work correctly with recursive types.
@tlively tlively marked this pull request as ready for review February 24, 2021 20:13

@kripken kripken left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

lgtm after we figure out those comments.

Comment thread src/wasm/wasm-type.cpp
} while (changed);

// Find HeapTypes that reach themselves
for (auto& entry : builder.impl->entries) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this loop be on fixedPoint? Basically the set we care about is { x \in children(x) | x \in AllThings } so I'm not sure why this starts with builder.impl->entries.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we looped over fixedPoint, we would have to first traverse it to collect a set of all TypeIDs it references and at some point we would also have to check whether each TypeID corresponded to a HeapType. In contrast, the HeapTypes correspond 1:1 to builder entries, so it takes less code find them that way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, sounds good.

Comment thread src/wasm/wasm-type.cpp
for (auto& kv : childrenDAG) {
kv.second.erase(id);
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this matters much, but for performance I suspect the reversed loops might be faster. I am assuming that most types are not self-referential - does that sound reasonable?

If so, then cache locality would benefit from having the inner loop be on the smaller thing, selfReferentialHeapTypes. But then you'd need two loops to avoid iterator invalidation I guess,

for (auto& kv : childrenDAG) {
  kv.second.erase(selfReferentialHeapTypes.begin(), selfReferentialHeapTypes.end());
}
childrenDAG.erase(selfReferentialHeapTypes.begin(), selfReferentialHeapTypes.end());

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the variant of the erase method that takes iterators only accepts iterators into the collection it is being called on. So I don't think it's valid to pass iterators of selfReferentialHeapTypes to the erase method of the sets in childrenDAG. I could still invert the loops, but then there would be more loops overall :( Overall I prefer the current version.

  for (TypeID id : selfReferentialHeapTypes) {
    childrenDAG.erase(id);
  }
  for (auto& kv : childrenDAG) {
    for (TypeID id : selfReferentialHeapTypes) {
      kv.second.erase(id);
    }
  }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I mixed up that API...

Given it's more code it's not worth doing this possible optimization now, I agree. Maybe a TODO to investigate? Then if we profile in the future and see this function is slow, we'd have a hint.

@tlively tlively merged commit d1d631d into main Feb 25, 2021
@tlively tlively deleted the recursive-types branch February 25, 2021 02:57
@tlively tlively mentioned this pull request Feb 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants