This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] LLD: Don't write out discarded function symbols
AbandonedPublic

Authored by ncw on Dec 19 2017, 5:50 AM.

Details

Reviewers
ruiu
sbc100
Summary

This is small change, on the principle, "let's keep the commits small and take little baby steps in each commit".

APPLIES AFTER:

This beefs up Sam's change with a few lines of code taken from ELF-LLD, which has the notion of "discarded" input sections. We now have "discarded" InputSegments/InputFunctions.

As proof of concept, I've implemented discarding duplicate weak symbols. If two files both define the same function weakly, then the second copy shouldn't be written out, since all calls to it will go via relocations, and no relocations can touch the function if it's not reachable via any Symbols. Hence if every translation unit provides a function like "std::string::whatever" it will only be written out once.

The behaviour is covered by existing tests, which exercise the weak symbols code, and assert on the output.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

ncw created this revision.Dec 19 2017, 5:50 AM
ncw updated this revision to Diff 127539.Dec 19 2017, 8:39 AM

Updated to make the diff a bit shorter/tidier.

(I've rolled in a few "cosmetic" changes like removing an unused variable SymbolTable::SyntheticSymbols - I hope that's OK, it didn't seem worth a review on its own.)

Cool. Didn't expect that to be so simple :)

I'll submit the cosmetic changes separately.

wasm/InputFiles.cpp
233

How about just:

if (S->getFile() != this)  {
    // This symbol was already defined in another file.
    Functions[WasmSym.ElementIndex - NumFunctionImports] = &InputFunction::Discarded;
}

And can we do the same for globals?

wasm/InputFunction.cpp
18 ↗(On Diff #127539)

Seems a shame to add a new source file just for this.

I've not looked at how the other linker do this, but could we not just have a "Discarded" flag on these objects instead of the these magic statics?

ncw added inline comments.Dec 19 2017, 2:07 PM
wasm/InputFiles.cpp
233

Sadly that won't work, the extra layer of redirection with ReachedFunctions is definitely necessary. The way you wrote it was my first attempt!

Here's the problematic C code:

void defaultImpl(void) { ... }
void aliasedCall(void) __attribute__((weak,alias("defaultImpl")));

static void consumingFunction() {
  aliasedCall();
  defaultImpl();
}

Here, you can see the symbol "aliasedCall" is a function export - so the function body for "defaultImpl" is exported with two different names in the exports section, "defaultImpl" and "aliasedCall".

Now imagine that another translation unit defines the symbol "aliasedCall" (either strongly or just gets in first). In this case, we'll get S->getFile() != this for the Function when we process the "aliasedCall" export. Yet, the Function is reachable through two different Symbols, and "defaultCall()" should still reach the function in this translation unit.

Short answer: if one Symbol points to another file, that's not enough reason to prune that Symbol's InputFunction. We need every Symbol that points to the Function to be redirected in order to safely prune it.

Luckily we have a test for it already. One of the aliased symbol tests caught this bug.


Secondly, globals. Hmm, it's trickier. In general, no, since an InputSegment can contain multiple globals? We could put in some code to count how many globals refer to each InputSegment, and if it's a segment with a solitary global we could prune it... Would be interesting to see how much the filesize goes down.

My expectation is that almost all the "weak" symbols that are used in the wild are C++ inlines. Even low-level systems code typically only uses weak symbols in a small number of files.

Thus, optimising the case of weak symbols is not a priority. Comdat handling will prune the duplicate InputSegments that actually occur in the wild, and we don't really need to handle the non-Comdat case.

wasm/InputFunction.cpp
18 ↗(On Diff #127539)

The file is doing no harm. We might need the file anyway eventually. Let's not let the length of the file affect any other decisions.

Good question about Discarded, I was initially surprised too.

The "Discarded" implementation I'm using comes directly from the ELF linker, to keep Rui happy. It's actually quite sensible: if you have a flag, there are two problems. a) you have to check the flag everywhere, and b) you have to keep on checking it forever whenever someone adds new code, ie it's a pain for ever.

Redirecting to a dummy Function like "Discarded" is much safer. Since all the fields in Discarded are null, there's no chance at all that you'll use it by mistake. You only need to check against Discarded in a couple of places, everywhere else can reference the InputFuction/InputSegment without worrying - you'll pass on its nullness if you reference the members.

Keeping it similar to ELF-LLD is probably a good enough reason to use the idiom for now, in any case.

Cool. Didn't expect that to be so simple :)

I'll submit the cosmetic changes separately.

wasm/InputFunction.cpp
18 ↗(On Diff #127539)

I don't think we should aim for complete parity with ELF. For example there is no ::Discarded thing in the COFF version. Not to say that we shouldn't go this route but I think we need more of a reason.

ELF seems to use both Discarded.. the the Live flag. I don't see why we could just mark this section as Live = false, it would need to be checked in exactly the same number of places. I also think Sec->isLive() is much more readable than Sec != &InputSection::Discarded). But obviously ELF chose to have both these method so maybe we will run into the same reason they did.

wasm/InputSegment.h
66 ↗(On Diff #127539)

We don't need this if we are not going to do the same for globals, right?

ncw added a comment.Dec 20 2017, 1:04 AM

Sorry also for the lengthy responses!

It would be nice to merge as much as possible before the holidays, or there'll be lots of rebasing pain when we come back from holidays and find patches that are weeks out of date.

Having said it's not important, I might have time today to add some detection for discarding the segment containing a weak global.

wasm/InputFunction.cpp
18 ↗(On Diff #127539)

I think COFF and ELF use isLive for GC - so we shouldn't use that here for something else. In LLD I think "discarded" means "we've thrown it away, we'll never use it", and sections are initially not discarded, but can be discarded, and that's a definitive decision for the section. "Live" means the section is reachable, and it's initially false and set to true as reachability is calculated during the GC phase.

I could change it to setDiscarded/isDiscarded, if you'd prefer that, I don't particularly mind. Rui did request we keep variable naming and idioms consistent between the different backends.

wasm/InputSegment.h
66 ↗(On Diff #127539)

We will be doing the same for globals in the Comdat review D40845 (which stacks on top of this one). I haven't implemented segment discarding based on weak globals (as I have for functions), but I have implemented segment discarding based on Comdats.

This will be used in future.

ncw updated this revision to Diff 127720.Dec 20 2017, 7:21 AM
ncw edited the summary of this revision. (Show Details)

Updated:

  • Rebased to be on top of Sam's D41419
  • Added handling of global symbols, allowing InputSegments to be discarded as well if the InputSegment contains nothing else apart from the discarded Global

All these refactoring patches are going to be a big pain if they can't be merged quickly...

ncw retitled this revision from Wasm LLD: Don't write out discarded function symbols to [WebAssembly] LLD: Don't write out discarded function symbols.Dec 20 2017, 7:31 AM
ncw updated this revision to Diff 129266.Jan 10 2018, 6:13 AM

Rebased as requested. Tests pass.

sbc100 added inline comments.Jan 10 2018, 11:48 AM
wasm/InputChunks.cpp
26 ↗(On Diff #129266)

This size of a global symbol isn't represented this way. In fact we don't represent the size at all currently.

ncw updated this revision to Diff 129431.Jan 11 2018, 4:25 AM
ncw edited the summary of this revision. (Show Details)

Updated to be on top of D40845, and removed non-quite-correct handling of globals/segments.

ncw abandoned this revision.Jan 11 2018, 4:27 AM

I'm abandoning this!

The changes are too small and not worth doing really, if we plan to merge Comdat support without having general weak-symbol discarding.

Presumably this case will be eventually covered by a GC implementation.

wasm/InputChunks.cpp
26 ↗(On Diff #129266)

It took me a moment to see what you mean - the type doesn't tell you the size! For a struct, it seems like the type is 'i32', although the size will be typically much larger than 4.

OK, this doesn't work. We can't exclude Segments based on whether all the data in them is part of a global - maybe we can assume though that each global "owns" the segment that contains it?

Doesn't really matter, I'll abandon this revision.

Actually I did an experiment with ELF and it looks like unused weak symbols are not stripped by default. i.e. you need to use --gc-sections to achieve this. So I think it makes sense for us to do the same (and --gc-sections will be the default for us).