This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Don't duplicate strings in SYM_INFO subsection
AbandonedPublic

Authored by sbc100 on Jan 18 2018, 7:11 PM.

Details

Reviewers
ncw
sunfish
Summary

Instead reference symbols by type+index.

This change is based on D42117, but doesn't depend on any other patches.

Fixes: https://github.com/WebAssembly/tool-conventions/issues/30

Diff Detail

Event Timeline

sbc100 created this revision.Jan 18 2018, 7:11 PM
sbc100 edited the summary of this revision. (Show Details)Jan 18 2018, 7:13 PM
sbc100 added reviewers: ncw, sunfish.
sbc100 updated this revision to Diff 130545.Jan 18 2018, 7:15 PM
  • revert
sbc100 updated this revision to Diff 130547.Jan 18 2018, 7:16 PM
  • revert
ncw accepted this revision.Jan 19 2018, 1:34 AM

This is really doubling-down on the current import-and-export strategy!

My concern now is #5 rebased on top of this will be larger, and have more test changes. Now that syminfo depends on the duplicate-imports, when they're taken away the diff is going to be that much bigger. I just don't see the incentive to merge #6 first, when my goal is to keep the diff to an absolute minimum in commits like #5 which are "contentful" changes. The order of changes I'd come up was based on trying to minimise the number of times the tests would have to be updated (to reduce the chance of masking a bug by updating the same things repeatedly).

This change is using/abusing the Wasm indexes in the same way as my #2, but #2 really had to land before #5, whereas this is just a cosmetic change to reduce filesize, and doesn't really need to.

Hmm. It looks like it should work, so I guess your order is OK. Thanks for helping me land the changes!

This revision is now accepted and ready to land.Jan 19 2018, 1:34 AM
ncw added inline comments.Jan 19 2018, 1:36 AM
include/llvm/Object/Wasm.h
268

It's a shame we need two maps now! I think it my patch order, we were able to change the map's type in one commit, then remove it in the next.

lib/MC/WasmObjectWriter.cpp
1045

You've dropped a if (Flags != 0) check.

ncw added a comment.Jan 19 2018, 2:10 AM

Oh, having seen the LLD changes I'm torn about whether it's good to land this one first or not.

What it really comes down to is: are you planning to land #5 or not? All the other changes are based around #5, and if that one doesn't go in, I'll be sad, because it's really the main change I'm trying to achieve to with the whole patch chain. Having a single unique index per Symbol, and a single unique import/export per Symbol actually enables these changes to refer to Symbols by index.

Unless you are planning to land #5 subsequently (or something else that does the same job), I'd be uneasy landing #6 at all - personally.

In D42279#981492, @ncw wrote:

Oh, having seen the LLD changes I'm torn about whether it's good to land this one first or not.

What it really comes down to is: are you planning to land #5 or not? All the other changes are based around #5, and if that one doesn't go in, I'll be sad, because it's really the main change I'm trying to achieve to with the whole patch chain. Having a single unique index per Symbol, and a single unique import/export per Symbol actually enables these changes to refer to Symbols by index.

Unless you are planning to land #5 subsequently (or something else that does the same job), I'd be uneasy landing #6 at all - personally.

I'm still not sure about landing #5. The goal of this patch was to address issue 30 but it should also make you #5 patch simply perhaps?

Not that I think about it though, maybe it time to create a prototype of my "full symbol table" idea so that it can be compared the #5 change.

Perhaps landing this breaking change in the interim is not such a good idea.

ncw added a comment.Jan 19 2018, 10:52 AM

Yeah, I can see why it's scary, because it involves changing the way things are actually modelled. But I find it easier to reason about the linker once #5 has landed.

We have Symbols, written out by Clang (WasmObjectWriter) and LLD (with --relocatable), and in Clang we have the duplicated-import hack where some symbols are written as both an import and an export - but LLD doesn't write out relocatable objects that hack, so there's disagreement between the tools, so we can't compare the outputs to check that "linking" a single object file with --relocatable leaves it unchanged.

And then we have to have special code in LLD (WasmObjectFile) when it reads in the object files to recover the symbol table from the imports and exports - but that's not documented in Linking.md and it doesn't make it clear how the indexes are actually worked out unless I think hard about it. And so Symbols don't have a unique index that's easy to reason about.

It comes down to this confusion between indexes into the array of functions and indexes into the array of symbols - but the symbols in a C file don't match the function bodies in a C file, it's just a lucky fluke for simple files. Once we cleanly separate the concept of function/global bodies from the concept of symbols, the linking model (should) become simpler.

This change is switching from referencing symbols by name to referencing them by index, but we haven't got a straightforward concept yet of symbol index. It's a shame to effectively increase the number of places where we're abusing symbols, which drives up the number of lines of diff in #5 (if it lands after this).

To me, "don't duplicate symbol strings" is very low priority, it just reduces filesize a bit. But getting the concepts straight should benefit the codebase throughout.

That's assuming that #5 does actually conceptually improve things. But #5 isn't even my idea! Having each Symbol represented by one import or export is how it was supposed to be, it's simply that the relocation indexes weren't being calculated correctly - in a sense #5 is conservative change that's just trying to get back to the original, simple, model.

sbc100 abandoned this revision.Jan 19 2018, 10:55 AM