This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Symbol changes #4: export relocatable, LLD only
ClosedPublic

Authored by ncw on Jan 16 2018, 7:08 AM.

Details

Summary

Fourth chunk split out of D41955.

There is no corresponding LLVM patch to go along with this.

Headline: Fix some bugs in the existing handling of exports when LLD generates "relocatable" output


Changes:

  • LLD's relocatable output was writing out an export for all globals (including file-local syms), but not for functions. Oops. To be consistent with non-relocatable output, all symbols (file-local and public) should be exported. Any symbol targetted by further relocations needs to be exported. The lack of local function exports was just an omission, I think.
  • Second bug: Local symbol names can collide, causing an illegal Wasm file to be generated! Oops again. This only previously affected producing relocatable output from two files, where each had a global with the same name. We need to "budge" the symbol names for locals that are exported on relocatable output.
  • Third bug: LLD's relocatable output wasn't writing out any symbol flags! Thus the local globals weren't being marked as local, and the hidden flag was also stripped...
  • Added tests to exercise colliding local names with/without relocatable flag

Diff Detail

Repository
rL LLVM

Event Timeline

ncw created this revision.Jan 16 2018, 7:08 AM
ncw updated this revision to Diff 129956.Jan 16 2018, 7:18 AM
ncw edited the summary of this revision. (Show Details)

Updated: Fix mistake not exporting weak flag

test/wasm/call-indirect.ll
90 ↗(On Diff #129952)

Test expectations updated: some benign reordering of the imports, due to iterating over Symbols in order of definition, rather than in order of first use

test/wasm/init-fini.ll
146 ↗(On Diff #129952)

Test expectation updated: HIDDEN/LOCAL flag should be written out for LLD's relocatable output

ncw updated this revision to Diff 129974.Jan 16 2018, 9:21 AM

Updated: tiny tweak to calculateImports to fix up case of unresolved weak globals

ncw added inline comments.Jan 17 2018, 7:25 AM
wasm/Writer.cpp
602 ↗(On Diff #129974)

I realise that std::set<std::string> isn't very idiomatic in LLVM, but you can why I chose it. I couldn't think of another way of doing it though that didn't require more code doing explicit buffer management.

Suggestions welcome - otherwise it's hopefully "good enough" for now, especially as it's short-lived and only used for exported functions (which I'd expect to be a small number compared to the overall output size).

ncw updated this revision to Diff 130184.Jan 17 2018, 7:39 AM

Updated:

  • Fixed test after rebase on top of __heap_base change
sbc100 accepted this revision.Jan 17 2018, 12:40 PM

Just a couple of minor nits

test/wasm/Inputs/locals-duplicate1.ll
1 ↗(On Diff #130184)

Are there llvm terms you can use here rather than local and public? I think llvm this internal and external linkage, right?

test/wasm/locals-duplicate.test
1 ↗(On Diff #130184)

Can we give these new files better names? Perhaps duplicate-symbols-internal.test. Locals makes me think of wasm locals, and llvm doesn't use the term local for these symbols either (I think?).

wasm/Writer.cpp
602 ↗(On Diff #129974)

I think llvm::DenseSet<StringRef> is used elsewhere in lld and probably makes sense here. Or llvm::StringSet? I'm not sure what the tradeoffs are between those two.

This revision is now accepted and ready to land.Jan 17 2018, 12:40 PM
ncw added inline comments.Jan 17 2018, 2:37 PM
test/wasm/Inputs/locals-duplicate1.ll
1 ↗(On Diff #130184)

Yes, you're right, maybe those are better terms. "XXX_LOCAL" is the name of the flag in the MC source code corresponding to the IR keyword "internal". "public" is because I couldn't think of the correct opposite!

test/wasm/locals-duplicate.test
1 ↗(On Diff #130184)

Well in LLD we have Symbol::isLocal. The name you suggested is better though.

wasm/Writer.cpp
602 ↗(On Diff #129974)

DenseSet<StringRef> and StringSet both assume that the string itself is owned externally. But the "budged" names can't be stored just as a StringRef, something needs to actually own the buffer to the string itself, hence using std::string. And you can't use a StringRef to refer to things in a vector<std::string> either (can you?) because you have to careful mutating the vector not to reallocate the strings and invalidate any StringRefs (I suppose we can rely on move ctors these days).

Nuisance. I don't suppose it's good style to use make<std::string> or similar magic to allocate strings?

I could maybe see about doing some more explicit buffer management to allow use of an llvm::StringSet.

sbc100 added inline comments.Jan 17 2018, 2:51 PM
test/wasm/locals-duplicate.test
1 ↗(On Diff #130184)

Hmm, you are right. Confusing different nomenclature. I think its because in ELF and Wasm binary format we use local and global. Given that I'm OK with the local and global here.. but maybe just replace public with global?

wasm/Writer.cpp
602 ↗(On Diff #129974)

In that case std::set seems reasonable to me.

ncw updated this revision to Diff 130391.Jan 18 2018, 3:49 AM
ncw marked 2 inline comments as done.

Updated:

  • Fixed terminology in comments
  • Use llvm::StringSaver to get rid of std::set<std::string> in favour of llvm::StringSet
wasm/Writer.cpp
602 ↗(On Diff #129974)

Aha! With some more searching, I've found the idiomatic LLVM solution.

It's called llvm::Saver, a global instance of llvm::StringSaver that lets you get a safe/stable StringRef for a fresh string.

Updated now to use that.

sbc100 added inline comments.Jan 18 2018, 3:11 PM
wasm/Writer.cpp
602 ↗(On Diff #129974)

Nice!

I'm not familiar with StringSaver, but would it make sense to use a saver that has the same lifetime as UsedNames? i.e. a function scoped saver?

sbc100 added inline comments.Jan 18 2018, 3:22 PM
wasm/Writer.cpp
600 ↗(On Diff #130391)

I think this is unrelated? At least all the tests still pass without this line.

sbc100 added inline comments.Jan 18 2018, 3:25 PM
wasm/Writer.cpp
390 ↗(On Diff #130391)

Don't we want to iterate through all symbols here, not just the exported ones? This can be a followup change I think.

This revision was automatically updated to reflect the committed changes.
inglorion added inline comments.
lld/trunk/wasm/Writer.cpp
67

gcc seems unhappy with this declaration. It complains:

llvm/tools/lld/wasm/Writer.cpp:67:17: error: declaration of ‘const lld::wasm::Symbol* {anonymous}::WasmExportEntry::Symbol’ [-fpermissive]
   const Symbol *Symbol;
             ^
In file included from llvm/tools/lld/wasm/Config.h:17:0,
                 from llvm/tools/lld/wasm/Writer.cpp:13:
llvm/tools/lld/wasm/Symbols.h:27:7: error: changes meaning of ‘Symbol’ from ‘class lld::wasm::Symbol’ [-fpermissive]
 class Symbol {
       ^

I have put up D42278 to rename the field and stop the naming collision.

sbc100 added inline comments.Jan 18 2018, 7:31 PM
lld/trunk/wasm/Writer.cpp
67

Thanks! I already had a local version ready to go so I'll just push that if thats OK?

ncw added inline comments.Jan 19 2018, 3:06 AM
wasm/Writer.cpp
390 ↗(On Diff #130391)

Oh! For "defined" symbols, we can't iterate through the non-exported ones (any functions not exported don't correspond to a Symbol).

But yes, the imports need to be included too. Oops.

600 ↗(On Diff #130391)

It is needed eventually; Patch #5 will fail without it. This is similar to patch #4b (D42237). Maybe I split this badly and it should have gone in with that...

I'll pull this out into another patch.

602 ↗(On Diff #129974)

A function-scoped one wouldn't work. The StringSet<> UsedNames is function-local, but the StringRefs that are returned get put into the WasmExportEntrys in ExportedSymbols, so are used outside this function.