This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Demangle symbol names for use by the browser debugger
ClosedPublic

Authored by ncw on Mar 9 2018, 10:19 AM.

Details

Summary

I hope this isn't too controversial?

The "name" section is what the browser debugger will show, and it of course knows nothing about the Itanium mangling conventions - it doesn't even know that the Wasm file came from C++. So the demangled names should go in there?

I seem to remember a GitHub comment, perhaps from @sunfish or someone else saying that this had been the intention - but maybe I'm misremembering.

Alternatively, DWARF-like debugging data attached to the Wasm file could play a similar role, but then, what's the point of the "name" section if the demangled name for the user is actually going to be coming from somewhere else?

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

ncw created this revision.Mar 9 2018, 10:19 AM
sbc100 accepted this revision.Mar 9 2018, 4:20 PM

LGTM, with comments

test/wasm/cxx-symbols.ll
1 ↗(On Diff #137792)

Maybe call this demangle.ll?

Maybe don't make foo hidden so it get exported, that way we will see if the export is mangled or not.

Perhaps check less about the output? Just the name section? Just the name and export section? I'm a little torn on this since a good test should really only test one thing ,but having the full output can be nice in that you see exactly what a given change does. But including the full yaml output also makes the tests more brittle than they need to be.

wasm/Writer.cpp
531

For some reason I feel like this should just be static helper function rather than a lambda. I can imagine it being useful elsewhere too.

This revision is now accepted and ready to land.Mar 9 2018, 4:20 PM
sbc100 added inline comments.Mar 9 2018, 4:23 PM
wasm/Writer.cpp
532

I'm not sure the demanding config option should be used here. Lets just pick one format the for the name section and stick with it (i.e. lets always demanding here).

I had indented the demanding config option (which I notice is actually uninitialzed!, oops) to be for the linker output and error mesages to make them more readable. That seems orthogonal to the format the names sections which should be fixed one way to the other IMHO.

ncw updated this revision to Diff 138048.Mar 12 2018, 10:20 AM

Updated: addressed feedback, undid toString mangling changes

ncw updated this revision to Diff 138049.Mar 12 2018, 10:22 AM

Updated: un-hid function used in test

LGTM if you reduce it to just the name section changes.

test/wasm/cxx-symbols.ll
1 ↗(On Diff #137792)

What do you think about this? cxx-mangling.ll maybe? name-mangling.ll?

wasm/Driver.cpp
246

Why this change?

wasm/Symbols.cpp
186 ↗(On Diff #138049)

Make this a separate change?

wasm/Writer.cpp
536

It seems like all the users of demangleItanium use it in this way, so perhaps its worth changing the function to have this behavior (i.e. no need to return an Optional at all and just return the original if demangling fail). We can do that as a followup perhaps.

sbc100 accepted this revision.Mar 12 2018, 2:36 PM
ncw marked an inline comment as done.Mar 13 2018, 6:32 AM

Landing with just the name section changes (in Writer.cpp and Driver.cpp).

test/wasm/cxx-symbols.ll
1 ↗(On Diff #137792)

OK, I'll rename to cxx-mangling.ll

wasm/Driver.cpp
246

To make it consistent - here we're demangling a symbol name and putting it in the "name" section, and in Writer.cpp we're also demangling a symbol name and putting it in a name section.

In Writer, you agreed we wanted to demangle, regardless of whether Config->Demangle was set... so here we should also be demangling regardless of whether the config option is set.

I've added this condition to the test, to make it clearer that it's part of the same change, and is tested in the same place.

wasm/Symbols.cpp
186 ↗(On Diff #138049)

Yup, OK.

wasm/Writer.cpp
536

I hummed and ha-ed about that. It has to return a std::string since the demangled name isn't cached anywhere - but you don't really want to return a std::string copy for the non-mangled symbol names (since that would be doing a string copy for every symbol name, which is potentially wasteful).

So inlining it here is the only way to avoid a silly copy in the non-mangled case.

In our codebase, we have a class StringCopyOpt that looks like this:

class StringCopyOpt {
  StringCopyOpt(char* s, bool own=false) : s(s), own(own) {}
  ~StringCopyOpt() { if (own) delete [] s; }
  // move ctor and operator, etc...
  char* s; bool own;
};

So it allows automatic cleanup, but without requiring you make a copy for every return value, just because some return values need a copy. I can't find anything similar in LLVM, but it would be ideal as the return value from demangleItanium.

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.
ncw marked an inline comment as done.
ncw marked an inline comment as done.
sbc100 added inline comments.Mar 13 2018, 12:05 PM
wasm/Driver.cpp
246

It looks like what you are doing here is creating a new function, with a given name. Function name should be mangled shouldn't they? They only get demangled when you make the name section. Don't you want the stub function to have the same name as the symbol? I'm still confused why you demangle here.

ncw added inline comments.Mar 14 2018, 9:30 AM
wasm/Driver.cpp
246

I'm demangling here because in my mind the function name is the demangled thing? The InputChunk's name is set from reading in the "name" section - or if that doesn't exist it falls back to using the symbol name.

If we want the symbol to appear as "prefix + demangled name" (eg "undefined function foo()") in the browser stack traces, then we kind of have to demangle at the point where we create the SyntheticFunction. We can't create something that will demangle to that, given the prefix...

So what's messy is that WasmFunction::Name is already a sometimes-mangled-sometimes-demangled field (since it can be set from the "name" section or the symbol name).

Do you have a suggestion for how you'd do it - so that its name in the "name" section is "undefined function foo()", but so that the SyntheticFunction itself has a different name?

Hmm.. its kind of shame maybe that we decided to demangle the names in the name section as they no longer match the symbol table names.

If we want to stick with the decision than maybe we should move to having libObject completely ignore the name section. For example, all of the native tooling expects symbol names to be mangled with the option to de-mangle them (think nm vs nm -C).

Anyway, I think all internal names and symbols should be mangled and valid C identifiers. Demangling things should be reserved for the moment of displaying. For this reason I think we should either reverse our decision to store demangled names in the name section, or ignore the name section when it comes to naming things internally and rely on the symbol table instead.

Another way of thinking about this: Names for sections, segments, chunks, and symtab entries should be all be valid C identifiers. If we allow for pretty names with spaces and unicode and whatnot it should be reserved for the names section only. If we choose go this route we can't then apply the names from the name section to segments, chunk, symtab entries, etc...

aprantl removed a subscriber: aprantl.Mar 14 2018, 10:48 AM

I think the symbol table (and any names that actually have meaning to the compiler and/or linker) should be mangled names, and only the name section should have demangled names. I also think it makes sense for libObject to only use the symbol table and ignore the name section. In other words, the name section should be thought of as metadata or debug info and in that sense shouldn't be "trusted" to always be correct (although of course we should preserve it wherever we can).

ncw added a comment.Mar 14 2018, 11:47 AM

I opened a can of worms! I think it can be cleaned up though...

Hmm.. its kind of shame maybe that we decided to demangle the names in the name section as they no longer match the symbol table names.

If we want to stick with the decision than maybe we should move to having libObject completely ignore the name section. For example, all of the native tooling expects symbol names to be mangled with the option to de-mangle them (think nm vs nm -C).

That's currently pretty-much how it is. libObject does use the "name" section to assign names to Functions (and will assign names to Segments when the "name" section is soon extended to allow for that...) - but that's fairly reasonable. We never currently have unmangled symbol names, which is what the tooling expects. Symbol names are sacrosanct, I haven't gone near them.

Anyway, I think all internal names and symbols should be mangled and valid C identifiers. Demangling things should be reserved for the moment of displaying. For this reason I think we should either reverse our decision to store demangled names in the name section, or ignore the name section when it comes to naming things internally and rely on the symbol table instead.

I'm not so sure - is there really a big problem with having free-form text for Function/Segment names? That's what it comes down to... the browser needs them unmangled, and the "name" section is a straightforward assignment of names to the Wasm objects... so why not have our Wasm objects named with some demangled names?

If that's not allowed... then the fix should be fairly straightforward. We'll just create a "getDebugName()" field on InputChunk, which will be similar to the existing getName() field, it will be set from the "name" section by libObject, and written out to the "name" section (defaulting to just getName() if there's no DebugName on the Chunk).

I think that would satisfy all the concerns above?

I think the symbol table (and any names that actually have meaning to the compiler and/or linker) should be mangled names, and only the name section should have demangled names. I also think it makes sense for libObject to only use the symbol table and ignore the name section. In other words, the name section should be thought of as metadata or debug info and in that sense shouldn't be "trusted" to always be correct (although of course we should preserve it wherever we can).

Yup, that's roughly much how it is. All meaningful names are mangled currently. Except that currently, libObject passes through the "name" section names, by setting the WasmFunction's Name field - however that's not a "meaningful" name to LLD, it's only used as the string for some logging messages, and for writing back into the "name" section.