asm2wasm debuginfo#895
Conversation
|
Some error on the bots and locally |
d46b672 to
23c5752
Compare
ignore --debuginfo if not emitting text, as wasm binaries don't support that yet
23c5752 to
55b7289
Compare
…binary, all we can do is the Names section
ff420e1 to
3e75a78
Compare
|
This should be good now, if anyone wants to review. |
| if (out + 100 >= end) { | ||
| Fatal() << "error in handling debug info"; | ||
| } | ||
| if (input[0] == '/' && input[1] == '/' && input[2] == '@' && input[3] == 'l' && input[4] == 'i' && input[5] == 'n' && input[6] == 'e') { |
There was a problem hiding this comment.
What about startsWith(input, "//@line") ? Providing startsWith in a utility header.
There was a problem hiding this comment.
Fixing. I worry it will be less efficient, but I guess a modern compiler should be able to do this well...
| *out++ = *input++; | ||
| *out++ = *input++; | ||
| *out++ = *input++; | ||
| *out++ = *input++; |
| // before it - this is guaranteed to be correct without opts, | ||
| // and is usually decently accurate with them. | ||
| auto size = strlen(input); | ||
| auto upperBound = Index(size * 1.25) + 100; |
There was a problem hiding this comment.
Magic numbers are magic :)
Does this 100 match the 100 a few lines down?
Would be good to replace these with named constants for clarity.
There was a problem hiding this comment.
Yeah, good point, fixing.
| Function* processFunction(Ref ast); | ||
|
|
||
| public: | ||
| CallImport* isDebugInfo(Expression* curr) { |
There was a problem hiding this comment.
is___ makes me thing "boolean". Better name would be just debugInfo, which can still be used in a boolean context.
There was a problem hiding this comment.
Good point, yeah. Changing to checkDebugInfo which is consistent with other stuff in wasm.h.
|
|
||
| #include <atomic> | ||
| #include <cassert> | ||
| #include <cstdlib> |
There was a problem hiding this comment.
It's needed for abort, this just wasn't noticed before because other locations happened to have the include anyhow, I think.
|
Thanks for the feedback, added a commit with fixes. |
|
|
||
| #include <atomic> | ||
| #include <cassert> | ||
| #include <cstdlib> |
| std::string DEBUGINFO_INTRINSIC = EMSCRIPTEN_DEBUGINFO.str; | ||
| auto DEBUGINFO_INTRINSIC_SIZE = DEBUGINFO_INTRINSIC.size(); | ||
| bool seenUseAsm = false; | ||
| auto startsWith = [&](const char *s) { |
There was a problem hiding this comment.
I would make this a top-level function as opposed to a lambda
- when reading this code it can be assumed that
startsWithexists elsewhere, so it doesn't interrupt the flow of reading this function - the way its currently implemented a better name would be
inputAtTheCurrentPositionStartsWith("foo"). I feelstartsWith(input, "foo")is more explicit at the call site and requires less context for the reader to keep in their head
Both should be inlineable by the compiler so it should be equivalent to the unrolled version, either way.
There was a problem hiding this comment.
Fair points, but which toplevel? Of the object? Or the entire file?
There was a problem hiding this comment.
In my head it's a top-level function that requires no state, so of the entire file. In theory it could be put in a utils header and shared.
There was a problem hiding this comment.
ok, refactored to top level in the file
| out += line.size(); | ||
| *out++ = ')'; | ||
| *out++ = ';'; | ||
| } else if (!seenUseAsm && (startsWith("asm'") || startsWith("asm\""))) { |
There was a problem hiding this comment.
This could theoretically be optimized by the compiler to be equivalent to what came before, but if (!seenUseAsm && startsWith(input, "asm") && (input[3] == '\'' || input[3] == '"')) is a possible rewrite that should almost-certainly be the same, while being I think equally legible. (I feel ambivalent about this either way though)
| // before it - this is guaranteed to be correct without opts, | ||
| // and is usually decently accurate with them. | ||
| const auto SCALE_FACTOR = 1.25; // an upper bound on how much more space we need as a multiple of the original | ||
| const auto ADD_FACTOR = 100; // an upper bound on how much we write for each debug info element itself |
There was a problem hiding this comment.
These comments are pretty helpful, thanks!
| static bool startsWith(const char* string, const char *prefix) { | ||
| while (1) { | ||
| if (*prefix == 0) return true; | ||
| if (*string++ != *prefix++) return false; |
There was a problem hiding this comment.
Should also probably return false if *string == 0 for completeness, e.g. startsWith("foo", "foobar")
jgravelle-google
left a comment
There was a problem hiding this comment.
Modulo nits, lgtm!
| } else if (!seenUseAsm && (startsWith(input, "asm'") || startsWith(input, "asm\""))) { | ||
| // end of "use asm" or "almost asm" | ||
| seenUseAsm = true; | ||
| memcpy(out, input, 5); |
There was a problem hiding this comment.
Probably worth pulling the 5s into a constant?
Actually yeah it's not obvious why 5 and not 4, presumably consuming asm\'\n?
There was a problem hiding this comment.
Heh, it's actually the semicolon at the end, "use asm";. I'll clarify.
| strcpy(out, line.c_str()); | ||
| out += line.size(); | ||
| *out++ = ')'; | ||
| *out++ = ';'; |
There was a problem hiding this comment.
Nothing to do here just a thought/comment/observation:
This interleaving section of strcpy and *out++ = char is pretty verbose and I wish there was a cleaner/more compact way to do it.
Something like outStream << DEBUGINFO_INTRINSIC << '(' << fileIndex << ',' << line << ");", though that would require a very different model of how this is done and probably isn't worth it?
There was a problem hiding this comment.
Yeah, might be a better way. The code is not expected to change much if at all, though, so I think it's maintainable as it is. If it were evolving code definitely it would need to be refactored.
Users have complained that debugging wasm is hard, and it'll be a while til we have proper source maps or such. Meanwhile, this pr might help a little, it makes asm2wasm use debug info when present, and emit it as comments in the wast. For example, hello world might look like this:
Technically, the debug info is translated into intrinsic calls. This is for efficiency, so that they incur no overhead when not using it (so we don't make each node have a debug info field or anything like that). Then we compile (and possibly optimize), and then we translate those intrinsics into annotations on the nodes, which are a map on the side per function of
node => annotation. Then the printer just prints the annotation with the node, if there is one.