Intern HeapTypes and clean up types code#3428
Conversation
Interns HeapTypes using the same patterns and utilities already used to intern Types. This allows HeapTypes to efficiently be compared for equality and hashed, which may be important for very large struct types in the future. This change also has the benefit of increasing symmetry between the APIs of Type and HeapType, which will make the developer experience more consistent. Finally, this change will make TypeBuilder (WebAssembly#3418) much simpler because it will no longer have to introduce TypeInfo variants to refer to HeapTypes indirectly.
|
|
||
| Signature getSignature() const; | ||
| const Struct& getStruct() const; | ||
| Array getArray() const; |
There was a problem hiding this comment.
| Array getArray() const; | |
| const Array& getArray() const; |
There was a problem hiding this comment.
Is the reason Arrays and Structs differ in this way that Arrays are "small" and Structs are possibly of unbounded size?
There was a problem hiding this comment.
Yes, that's what I was thinking 👍 Arrays contain only one Field each, which in turn contains only three items taking up two machine words. That seems small enough to pass around by value.
There was a problem hiding this comment.
That does make sense, but it does make the API asymmetrical, which seems a little surprising. Or do you think for users of the API the difference wouldn't be noticeable?
A comment might be good either way.
There was a problem hiding this comment.
Yeah, the asymmetry is a little unfortunate, but I think it will be ok. What would you think of deleting operator= to prevent folks from accidentally doing auto s = getStruct() and making a copy of the struct? Would that be worth it or too inconvenient?
Will add comments.
There was a problem hiding this comment.
That seems good, assuming I understand correctly and it would just force people to add the & in auto& struct_ = type.getStruct(); - ? If so sgtm. But regardless, that could also be separate.
| std::string toString() const; | ||
| }; | ||
|
|
||
| class HeapType { |
There was a problem hiding this comment.
I think it would be good to add a comment explaining what a HeapType is, and why it's separate from Type.
There was a problem hiding this comment.
Also it would be good to comment that Type and HeapType are basically interned IDs so are supposed to be passed/used by values.
| std::string toString() const; | ||
| }; | ||
|
|
||
| class HeapType { |
There was a problem hiding this comment.
Also it would be good to comment that Type and HeapType are basically interned IDs so are supposed to be passed/used by values.
| break; | ||
| } else { | ||
| if (info.ref.heapType == HeapType::i31) { | ||
| return Type::i31ref; |
There was a problem hiding this comment.
Any reason to change this from map lookup to switch-case jump table? Wouldn't map lookup be cheaper?
There was a problem hiding this comment.
I'm not sure about the performance - the map lookup at least requires taking a lock and calculating a hash. I wouldn't expect this to have a measurable impact (but I haven't measured it). The reason I changed it was that what we had before was kind of a hack. It used tuple TypeInfos to store individual BasicTypes even though empty and singleton tuples were otherwise not allowed anywhere. In contrast, there is no need for HeapTypeInfo to be able to represent a BasicHeapType and it seemed cleaner to make that separation of concerns explicit for TypeInfo as well.
There was a problem hiding this comment.
Hmm, I thought it would be simpler if every Type has its matching TypeInfo and we can get one using the other, but as long as it is implementation detail and well encapsulated from outside I think either way should be fine.
There was a problem hiding this comment.
Thanks :) This change should not be observable outside of this function.
| case HeapType::ArrayKind: | ||
| assert(heapType.array.element.type.isSingle()); | ||
| break; | ||
| } else { |
There was a problem hiding this comment.
How about adding an assertion failure in case heapType is not nullable and is not i31, because we don't support it yet?
There was a problem hiding this comment.
I agree that erroring out when we use non-nullable types would make sense (including for i31ref, which we don't support any better than other non-nullable types). But I think the validator would be a better place for that error since we support creating and querying non-nullable types, just not using them in the IR.
There was a problem hiding this comment.
What I was wondering is, even if the type is basic, if it is not nullable, it will fall out of all these ifs and call Store<TypeInfo>::canonicalize(info), which will create TypeInfo for it, in which case it can cause an unintended consequence which is not easy to catch later. Maybe I'm missing something?
There was a problem hiding this comment.
I don't think the act of making a TypeInfo for a non-nullable type can make anything bad happen. We also depend on being able to do that in test/example/typinfo.cpp.
The bad things only start happening once you use a non-nullable type for locals in the IR.
There was a problem hiding this comment.
But didn't you say you don't create TypeInfos anymore with (nullable) basic types?
There was a problem hiding this comment.
Oh, I see what you're saying. We only provide BasicTypes for types explicitly given their own names in the spec proposals, so (ref null extern), which is equivalent to externref, is canonicalized into Type::externref, but on the other hand (ref extern) would be backed by a TypeInfo. This is working as intended. i31ref is weird because it is non-nullable by definition.
|
|
||
| } // namespace wasm | ||
|
|
||
| namespace std { |
There was a problem hiding this comment.
Any reason we put hash functions in two separate chunks? Can we merge them into one chunk?
There was a problem hiding this comment.
There are a couple unfortunate things going on here.
-
The declaration of specializations of
std::hashforTypeInfoandHeapTypeInfoneeds to be done before the declaration ofStore, otherwise the declaration ofStore::typeIDstries to use the default instantiation and fails. -
There's no way to declare specializations from a different namespace, so there's no way to avoid leaving the anonymous and
wasmnamespaces and enteringstdjust for these two declarations.
There's nothing stopping us from moving the hash definitions from down below up here, but I thought it would still be better to keep them separate despite the extra verbosity.
Thanks, I'm glad you think so :D |
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
| // This type annotation is unused. Beware it needing to be used in the future! | ||
| getHeapType(); |
There was a problem hiding this comment.
cc @kripken here's where I fixed that unused variable error I was getting.
Interns HeapTypes using the same patterns and utilities already used to intern
Types. This allows HeapTypes to efficiently be compared for equality and hashed,
which may be important for very large struct types in the future. This change
also has the benefit of increasing symmetry between the APIs of Type and
HeapType, which will make the developer experience more consistent. Finally,
this change will make TypeBuilder (#3418) much simpler because it will no longer
have to introduce TypeInfo variants to refer to HeapTypes indirectly.