Details

Reviewers

sbc100
sunfish

Commits

rG531769b9f9ff: [WebAssembly] Demangle symbol names for use by the browser debugger
rLLD327392: [WebAssembly] Demangle symbol names for use by the browser debugger
rL327392: [WebAssembly] Demangle symbol names for use by the browser debugger

Summary

I hope this isn't too controversial?

The "name" section is what the browser debugger will show, and it of course knows nothing about the Itanium mangling conventions - it doesn't even know that the Wasm file came from C++. So the demangled names should go in there?

I seem to remember a GitHub comment, perhaps from @sunfish or someone else saying that this had been the intention - but maybe I'm misremembering.

Alternatively, DWARF-like debugging data attached to the Wasm file could play a similar role, but then, what's the point of the "name" section if the demangled name for the user is actually going to be coming from somewhere else?

Diff Detail

Repository: rLLD LLVM Linker

Event Timeline

ncw created this revision. Mar 9 2018, 10:19 AM

Herald added subscribers: llvm-commits, JDevlieghere, aheejin and 4 others. Mar 9 2018, 10:19 AM

LGTM, with comments

test/wasm/cxx-symbols.ll
1 ↗	(On Diff #137792)	Maybe call this demangle.ll? Maybe don't make foo hidden so it gets exported; that way we will see whether the export is mangled or not. Perhaps check less of the output? Just the name section? Just the name and export sections? I'm a little torn on this, since a good test should really only test one thing, but having the full output can be nice in that you see exactly what a given change does. Including the full YAML output also makes the tests more brittle than they need to be.
wasm/Writer.cpp
531	For some reason I feel like this should just be a static helper function rather than a lambda. I can imagine it being useful elsewhere too.

This revision is now accepted and ready to land. Mar 9 2018, 4:20 PM

sbc100 added inline comments. Mar 9 2018, 4:23 PM

wasm/Writer.cpp
532	I'm not sure the demangling config option should be used here. Let's just pick one format for the name section and stick with it (i.e. let's always demangle here). I had intended the demangling config option (which I notice is actually uninitialized, oops!) to be for the linker output and error messages, to make them more readable. That seems orthogonal to the format of the name section, which should be fixed one way or the other IMHO.

ncw mentioned this in D44323: [WebAssembly] Remove a second parameter from toString(). Mar 10 2018, 7:44 AM

Updated: addressed feedback, undid toString mangling changes

Updated: un-hid function used in test

Harbormaster completed remote builds in B15978: Diff 138049. Mar 12 2018, 10:22 AM

LGTM if you reduce it to just the name section changes.

test/wasm/cxx-symbols.ll
1 ↗	(On Diff #137792)	What do you think about this? cxx-mangling.ll maybe? name-mangling.ll?
wasm/Driver.cpp
246	Why this change?
wasm/Symbols.cpp
186 ↗	(On Diff #138049)	Make this a separate change?
wasm/Writer.cpp
536	It seems like all the users of demangleItanium use it in this way, so perhaps it's worth changing the function to have this behavior (i.e. no need to return an Optional at all; just return the original name if demangling fails). We can do that as a follow-up, perhaps.
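The follow-up suggested above (return the original name when demangling fails, instead of an Optional) could look roughly like the sketch below. This is not LLD's `demangleItanium`; `demangleOrOriginal` is a hypothetical name, and the sketch assumes a GCC or Clang toolchain where `abi::__cxa_demangle` from `<cxxabi.h>` is available.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang C++ runtime)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper, not part of LLD: returns the demangled form of an
// Itanium-mangled name, or the input unchanged when the name is not mangled
// or cannot be demangled.
std::string demangleOrOriginal(const std::string &name) {
  // Only treat names starting with "_Z" as C++ symbols, so that a plain C
  // symbol is never misread as a mangled type name.
  if (name.compare(0, 2, "_Z") != 0)
    return name;

  int status = 0;
  char *buf = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
  if (status != 0 || buf == nullptr) {
    std::free(buf);
    return name;
  }
  std::string result(buf);
  std::free(buf);  // the buffer returned by __cxa_demangle is malloc'ed
  return result;
}
```

With a helper of this shape, call sites would no longer need the `Name ? StringRef(*Name) : F->getName()` fallback, at the cost of a string copy for every unmangled name.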

sbc100 accepted this revision. Mar 12 2018, 2:36 PM

Landing with just the name section changes (in Writer.cpp and Driver.cpp).

test/wasm/cxx-symbols.ll
1 ↗	(On Diff #137792)	OK, I'll rename to cxx-mangling.ll
wasm/Driver.cpp
246	To make it consistent - here we're demangling a symbol name and putting it in the "name" section, and in Writer.cpp we're also demangling a symbol name and putting it in a name section. In Writer, you agreed we wanted to demangle, regardless of whether `Config->Demangle` was set... so here we should also be demangling regardless of whether the config option is set. I've added this condition to the test, to make it clearer that it's part of the same change, and is tested in the same place.
wasm/Symbols.cpp
186 ↗	(On Diff #138049)	Yup, OK.
wasm/Writer.cpp
536	I hummed and hawed about that. It has to return a std::string since the demangled name isn't cached anywhere, but you don't really want to return a std::string copy for the non-mangled symbol names (since that would be doing a string copy for every symbol name, which is potentially wasteful). So inlining it here is the only way to avoid a silly copy in the non-mangled case. In our codebase, we have a class StringCopyOpt that looks like this:

	class StringCopyOpt {
	  StringCopyOpt(char* s, bool own=false) : s(s), own(own) {}
	  ~StringCopyOpt() { if (own) delete [] s; }
	  // move ctor and operator, etc...
	  char* s;
	  bool own;
	};

	So it allows automatic cleanup, but without requiring you to make a copy for every return value just because some return values need a copy. I can't find anything similar in LLVM, but it would be ideal as the return value from demangleItanium.
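For readers who want to see the "copy only when needed" idea in a form that compiles, here is a minimal C++17 sketch. `MaybeOwnedString` is a hypothetical type written for this illustration; it is neither the StringCopyOpt class from ncw's codebase nor anything in LLVM.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical return type: either borrows existing storage (no copy) or
// owns a freshly built string such as a demangled name.
class MaybeOwnedString {
public:
  // Borrow: no copy is made; the caller must keep the storage alive.
  static MaybeOwnedString borrowed(std::string_view s) {
    MaybeOwnedString r;
    r.view_ = s;
    return r;
  }

  // Own: takes over a newly created string and frees it automatically.
  static MaybeOwnedString owned(std::string s) {
    MaybeOwnedString r;
    r.storage_ = std::move(s);
    r.owns_ = true;
    return r;
  }

  bool owns() const { return owns_; }

  // The view is computed on demand so it stays valid after a move or copy.
  std::string_view str() const {
    return owns_ ? std::string_view(storage_) : view_;
  }

private:
  MaybeOwnedString() = default;
  std::string storage_;    // used only when owns_ is true
  std::string_view view_;  // used only when owns_ is false
  bool owns_ = false;
};
```

A demangling helper could then return `MaybeOwnedString::borrowed(name)` for unmangled symbols and `MaybeOwnedString::owned(...)` for demangled ones, so only the mangled case pays for an allocation.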

Closed by commit rL327392: [WebAssembly] Demangle symbol names for use by the browser debugger (authored by ncw). Mar 13 2018, 6:33 AM

Closed by commit rLLD327392: [WebAssembly] Demangle symbol names for use by the browser debugger (authored by ncw).

This revision was automatically updated to reflect the committed changes.

ncw marked an inline comment as done.

sbc100 added inline comments. Mar 13 2018, 12:05 PM

wasm/Driver.cpp
246	It looks like what you are doing here is creating a new function with a given name. Function names should be mangled, shouldn't they? They only get demangled when you make the name section. Don't you want the stub function to have the same name as the symbol? I'm still confused about why you demangle here.

ncw added inline comments. Mar 14 2018, 9:30 AM

wasm/Driver.cpp
246	I'm demangling here because in my mind the function name is the demangled thing? The InputChunk's name is set from reading in the "name" section - or if that doesn't exist it falls back to using the symbol name. If we want the symbol to appear as "prefix + demangled name" (eg "undefined function foo()") in the browser stack traces, then we kind of have to demangle at the point where we create the SyntheticFunction. We can't create something that will demangle to that, given the prefix... So what's messy is that WasmFunction::Name is already a sometimes-mangled-sometimes-demangled field (since it can be set from the "name" section or the symbol name). Do you have a suggestion for how you'd do it - so that its name in the "name" section is "undefined function foo()", but so that the SyntheticFunction itself has a different name?
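The ordering constraint described above (demangle first, then add the prefix) can be sketched as follows. `stubDisplayName` and the `Demangler` callback type are hypothetical names for this illustration; the callback stands in for whatever Optional-returning demangler is in use, in the same shape as the Driver.cpp change.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical callback type standing in for an Optional-returning demangler.
using Demangler =
    std::function<std::optional<std::string>(const std::string &)>;

// Builds the display name for a weak-undefined stub. The symbol is demangled
// before the prefix is attached, because "undefined function _Z3bari" is not
// itself a mangled name and could never be demangled afterwards.
std::string stubDisplayName(const std::string &symbolName,
                            const Demangler &demangle) {
  std::optional<std::string> demangled = demangle(symbolName);
  return "undefined function " + (demangled ? *demangled : symbolName);
}
```

Given a demangler that maps `_Z3bari` to `bar(int)`, this yields `undefined function bar(int)`, which is the string the test in this revision expects in the "name" section.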

Hmm... it's kind of a shame, maybe, that we decided to demangle the names in the name section, as they no longer match the symbol table names.

If we want to stick with the decision, then maybe we should move to having libObject completely ignore the name section. For example, all of the native tooling expects symbol names to be mangled, with the option to demangle them (think nm vs nm -C).

Anyway, I think all internal names and symbols should be mangled and valid C identifiers. Demangling things should be reserved for the moment of displaying. For this reason I think we should either reverse our decision to store demangled names in the name section, or ignore the name section when it comes to naming things internally and rely on the symbol table instead.

Another way of thinking about this: names for sections, segments, chunks, and symtab entries should all be valid C identifiers. If we allow for pretty names with spaces and unicode and whatnot, that should be reserved for the name section only. If we choose to go this route, we can't then apply the names from the name section to segments, chunks, symtab entries, etc...
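To make the "valid C identifier" distinction concrete, here is a small sketch. `isCIdentifier` is a hypothetical helper written for this illustration (ASCII only); it simply shows that the mangled names in this revision's test pass the check while their demangled forms do not.

```cpp
#include <string>

// Hypothetical check: true if `s` matches [A-Za-z_][A-Za-z0-9_]*.
bool isCIdentifier(const std::string &s) {
  auto isAlpha = [](char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
  };
  auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

  // An identifier is non-empty and must not start with a digit.
  if (s.empty() || !isAlpha(s[0]))
    return false;
  for (char c : s)
    if (!isAlpha(c) && !isDigit(c))
      return false;
  return true;
}
```

Under this check `_Z3fooi` and `__heap_base` are identifiers, whereas `foo(int)` and `undefined function bar(int)` are not, which is why the latter two would be confined to the name section under this proposal.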

aprantl removed a subscriber: aprantl. Mar 14 2018, 10:48 AM

I think the symbol table (and any names that actually have meaning to the compiler and/or linker) should be mangled names, and only the name section should have demangled names. I also think it makes sense for libObject to only use the symbol table and ignore the name section. In other words, the name section should be thought of as metadata or debug info and in that sense shouldn't be "trusted" to always be correct (although of course we should preserve it wherever we can).

I opened a can of worms! I think it can be cleaned up though...

In D44316#1037421, @sbc100 wrote:

Hmm... it's kind of a shame, maybe, that we decided to demangle the names in the name section, as they no longer match the symbol table names.

If we want to stick with the decision, then maybe we should move to having libObject completely ignore the name section. For example, all of the native tooling expects symbol names to be mangled, with the option to demangle them (think nm vs nm -C).

That's currently pretty much how it is. libObject does use the "name" section to assign names to Functions (and will assign names to Segments when the "name" section is soon extended to allow for that...), but that's fairly reasonable. We never currently have unmangled symbol names, which is what the tooling expects. Symbol names are sacrosanct; I haven't gone near them.

Anyway, I think all internal names and symbols should be mangled and valid C identifiers. Demangling things should be reserved for the moment of displaying. For this reason I think we should either reverse our decision to store demangled names in the name section, or ignore the name section when it comes to naming things internally and rely on the symbol table instead.

I'm not so sure - is there really a big problem with having free-form text for Function/Segment names? That's what it comes down to... the browser needs them unmangled, and the "name" section is a straightforward assignment of names to the Wasm objects... so why not have our Wasm objects named with some demangled names?

If that's not allowed... then the fix should be fairly straightforward. We'll just create a "getDebugName()" field on InputChunk, which will be similar to the existing getName() field, it will be set from the "name" section by libObject, and written out to the "name" section (defaulting to just getName() if there's no DebugName on the Chunk).

I think that would satisfy all the concerns above?
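A minimal sketch of the getDebugName() idea, assuming a simplified stand-in for InputChunk (the `Chunk` class and `setDebugName` below are hypothetical and are not LLD code):

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a chunk that keeps its internal (mangled) name
// separate from the free-form name destined for the "name" section.
class Chunk {
public:
  explicit Chunk(std::string name) : name_(std::move(name)) {}

  // Internal name: always the mangled symbol-style name.
  const std::string &getName() const { return name_; }

  // Would be set when an input "name" section supplies a display name.
  void setDebugName(std::string n) { debugName_ = std::move(n); }

  // Name to write to the output "name" section; falls back to getName()
  // when no debug name was provided.
  const std::string &getDebugName() const {
    return debugName_ ? *debugName_ : name_;
  }

private:
  std::string name_;
  std::optional<std::string> debugName_;
};
```

Keeping the two names apart means internal lookups and log messages can keep using getName(), while only the writer of the "name" section consults getDebugName().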

In D44316#1037454, @dschuff wrote:

I think the symbol table (and any names that actually have meaning to the compiler and/or linker) should be mangled names, and only the name section should have demangled names. I also think it makes sense for libObject to only use the symbol table and ignore the name section. In other words, the name section should be thought of as metadata or debug info and in that sense shouldn't be "trusted" to always be correct (although of course we should preserve it wherever we can).

Yup, that's roughly how it is. All meaningful names are mangled currently. The exception is that libObject currently passes through the "name" section names by setting the WasmFunction's Name field. However, that's not a "meaningful" name to LLD; it's only used as the string for some logging messages, and for writing back into the "name" section.

Diff 138172

test/wasm/cxx-mangling.ll

				; RUN: llc -filetype=obj %s -o %t.o
				; RUN: wasm-ld --demangle --check-signatures -o %t_demangle.wasm %t.o
				; RUN: obj2yaml %t_demangle.wasm | FileCheck %s
				; RUN: wasm-ld --no-demangle --check-signatures -o %t_nodemangle.wasm %t.o
				; RUN: obj2yaml %t_nodemangle.wasm | FileCheck %s

				target triple = "wasm32-unknown-unknown-wasm"

				; Check that the EXPORT name is still mangled, but that the "name" custom
				; section contains the unmangled name.

				define void @_Z3fooi(i32 %arg) {
				ret void
				}

				declare extern_weak void @_Z3bari(i32 %arg)

				define void @_start() {
				call void @_Z3fooi(i32 1)
				call void @_Z3bari(i32 1)
				ret void
				}

				; CHECK: - Type: EXPORT
				; CHECK-NEXT: Exports:
				; CHECK-NEXT: - Name: memory
				; CHECK-NEXT: Kind: MEMORY
				; CHECK-NEXT: Index: 0
				; CHECK-NEXT: - Name: __heap_base
				; CHECK-NEXT: Kind: GLOBAL
				; CHECK-NEXT: Index: 1
				; CHECK-NEXT: - Name: __data_end
				; CHECK-NEXT: Kind: GLOBAL
				; CHECK-NEXT: Index: 2
				; CHECK-NEXT: - Name: _start
				; CHECK-NEXT: Kind: FUNCTION
				; CHECK-NEXT: Index: 3
				; CHECK-NEXT: - Name: _Z3fooi
				; CHECK-NEXT: Kind: FUNCTION
				; CHECK-NEXT: Index: 2
				; CHECK-NEXT: - Type: CODE
				; CHECK-NEXT: Functions:
				; CHECK-NEXT: - Index: 0
				; CHECK-NEXT: Locals:
				; CHECK-NEXT: Body: 0B
				; CHECK-NEXT: - Index: 1
				; CHECK-NEXT: Locals:
				; CHECK-NEXT: Body: 000B
				; CHECK-NEXT: - Index: 2
				; CHECK-NEXT: Locals:
				; CHECK-NEXT: Body: 0B
				; CHECK-NEXT: - Index: 3
				; CHECK-NEXT: Locals:
				; CHECK-NEXT: Body: 410110828080800041011081808080000B
				; CHECK-NEXT: - Type: CUSTOM
				; CHECK-NEXT: Name: name
				; CHECK-NEXT: FunctionNames:
				; CHECK-NEXT: - Index: 0
				; CHECK-NEXT: Name: __wasm_call_ctors
				; CHECK-NEXT: - Index: 1
				; CHECK-NEXT: Name: 'undefined function bar(int)'
				; CHECK-NEXT: - Index: 2
				; CHECK-NEXT: Name: 'foo(int)'
				; CHECK-NEXT: - Index: 3
				; CHECK-NEXT: Name: _start
				; CHECK-NEXT: ...

wasm/Driver.cpp

[11 lines not shown]
 #include "InputChunks.h"
 #include "InputGlobal.h"
 #include "MarkLive.h"
 #include "SymbolTable.h"
 #include "Writer.h"
 #include "lld/Common/Args.h"
 #include "lld/Common/ErrorHandler.h"
 #include "lld/Common/Memory.h"
+#include "lld/Common/Strings.h"
 #include "lld/Common/Threads.h"
 #include "lld/Common/Version.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Object/Wasm.h"
 #include "llvm/Option/ArgList.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/Process.h"
[206 lines not shown; context: for (Symbol *Sym : Symtab->getSymbols()) {]

     // It is possible for undefined functions not to have a signature (eg. if
     // added via "--undefined"), but weak undefined ones do have a signature.
     assert(FuncSym->getFunctionType());
     const WasmSignature &Sig = *FuncSym->getFunctionType();

     // Add a synthetic dummy for weak undefined functions. These dummies will
     // be GC'd if not used as the target of any "call" instructions.
-    StringRef StubName = Saver.save("undefined function " + toString(*Sym));
+    Optional<std::string> SymName = demangleItanium(Sym->getName());
+    StringRef StubName =
+        Saver.save("undefined function " +
+                   (SymName ? StringRef(*SymName) : Sym->getName()));
     SyntheticFunction *Func = make<SyntheticFunction>(Sig, StubName);
     Func->setBody(UnreachableFn);
     // Ensure it compares equal to the null pointer, and so that table relocs
     // don't pull in the stub body (only call-operand relocs should do that).
     Func->setTableIndex(0);
     Symtab->SyntheticFunctions.emplace_back(Func);
     // Hide our dummy to prevent export.
     uint32_t Flags = WASM_SYMBOL_VISIBILITY_HIDDEN;
[159 lines not shown]

wasm/Writer.cpp

[522 lines not shown; context: void Writer::createNameSection() {]
   if (NumNames == 0)
     return;

   SyntheticSection *Section = createSyntheticSection(WASM_SEC_CUSTOM, "name");

   SubSection Sub(WASM_NAMES_FUNCTION);
   writeUleb128(Sub.OS, NumNames, "name count");

   // Names must appear in function index order. As it happens ImportedSymbols
   // and InputFunctions are numbered in order with imported functions coming
   // first.
   for (const Symbol *S : ImportedSymbols) {
     if (auto *F = dyn_cast<FunctionSymbol>(S)) {
       writeUleb128(Sub.OS, F->getFunctionIndex(), "func index");
-      writeStr(Sub.OS, F->getName(), "symbol name");
+      Optional<std::string> Name = demangleItanium(F->getName());
+      writeStr(Sub.OS, Name ? StringRef(*Name) : F->getName(), "symbol name");
     }
   }
   for (const InputFunction *F : InputFunctions) {
     if (!F->getName().empty()) {
       writeUleb128(Sub.OS, F->getFunctionIndex(), "func index");
-      writeStr(Sub.OS, F->getName(), "symbol name");
+      Optional<std::string> Name = demangleItanium(F->getName());
+      writeStr(Sub.OS, Name ? StringRef(*Name) : F->getName(), "symbol name");
     }
   }

   Sub.writeTo(Section->getStream());
 }

 void Writer::writeHeader() {
   memcpy(Buffer->getBufferStart(), Header.data(), Header.size());
[429 lines not shown]
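As a rough illustration of what the function-names subsection ends up containing, the sketch below encodes one (function index, name) entry as a ULEB128 index followed by a ULEB128-length-prefixed string. `encodeUleb128`, `encodeStr` and `encodeNameEntry` are hypothetical stand-ins written for this example; they are not LLD's `writeUleb128`/`writeStr`, which take an output stream and a debug message.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Appends `value` in unsigned LEB128: 7 bits per byte, least significant
// group first, high bit set on every byte except the last.
void encodeUleb128(std::vector<uint8_t> &out, uint64_t value) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Appends a length-prefixed string (ULEB128 byte count, then the raw bytes).
void encodeStr(std::vector<uint8_t> &out, const std::string &s) {
  encodeUleb128(out, s.size());
  out.insert(out.end(), s.begin(), s.end());
}

// Appends one name-map entry: function index, then the display name.
void encodeNameEntry(std::vector<uint8_t> &out, uint32_t funcIndex,
                     const std::string &name) {
  encodeUleb128(out, funcIndex);
  encodeStr(out, name);
}
```

Because the name is stored as an arbitrary length-prefixed byte string, a demangled name containing parentheses or spaces needs no escaping in this encoding.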

This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Demangle symbol names for use by the browser debugger
Closed, Public
