This is an archive of the discontinued LLVM Phabricator instance.

[lld][WebAssembly] Initial support merging string data
ClosedPublic

Authored by sbc100 on Feb 28 2021, 10:03 PM.

Details

Summary

This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.

Unlike the ELF linker this support is currently limited:

  • No support for SHF_MERGE (non-string merging)
  • Always do full tail merging ("lo" can be merged with "hello")
  • Only support single byte strings (p2align 0)

Like the ELF linker merging is only performed at -O1 and above.

This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)

Diff Detail

Event Timeline

sbc100 created this revision.Feb 28 2021, 10:03 PM
sbc100 requested review of this revision.Feb 28 2021, 10:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2021, 10:03 PM

PTAL. This is almost ready to land now.

I've tested it against the ensure emscripten test suite: https://github.com/emscripten-core/emscripten/pull/14046. It improves all our microbenchmarks that measure code size.. there is just one strange code size regression that I'm still investigating but I don't think it will block this

Still looking over this, but had a question more up front

lld/wasm/Driver.cpp
388

this means we optimize by default now?

lld/wasm/InputChunks.h
152

should this say "SegmentPiece" instead of "SectionPiece"?

llvm/include/llvm/MC/MCSectionWasm.h
44

I'm a bit fuzzy now on segments and sections. Is the reason we're calling these "segment flags" here and elswhere, just because we are using segments in the wasm binary to represent them?

sbc100 updated this revision to Diff 341718.Apr 29 2021, 5:16 PM
  • feedback
sbc100 updated this revision to Diff 341722.Apr 29 2021, 5:32 PM
  • added new MC test

looks pretty good overall, a lot of this is making sure I understand everything

lld/wasm/InputChunks.cpp
374

This is dead because of the assertion above.

477

just to be clear, each thread has a different this object? When this function calls splitStrings, the emplaceBack will definitely allocate.

490

is the map checked somewhere else?

lld/wasm/InputChunks.h
42

Is "Merge" a segment that can be merged, and "MergedSegment" after merging? Would "MergeableSegment" better reflect what it means?

lld/wasm/InputFiles.cpp
435

Does clang explicitly pass -O0 to the linker to opt out, or does the user have to do that with -Wl?

lld/wasm/OutputSegment.cpp
42

the name mergeSegments is slightly confusing given (IIUC) that "merge segments" is used elsewhere to refer to segments before merging and here they are already-merged segments.

54

this capture could probably just be ms instead of =

llvm/lib/MC/MCObjectFileInfo.cpp
798

So this sets the section flag as a string section, but we just don't merge them yet?

llvm/lib/MC/MCParser/WasmAsmParser.cpp
107

Why this change? I don't see any new callers of this function, and the error message is worse now.

Edit: actually, does TokError print the token that it read? if so, then maybe it's no worse.

165

This should maybe clarify that it's a section flag (rather than the generic "flag")?

sbc100 added inline comments.Apr 29 2021, 6:00 PM
lld/wasm/Driver.cpp
388

Sorry I should have mentioned that in the description. This doesn't change any existing behaviour since there were now users of config->optimize until now(!). And yes we optimize by default, like the ELF linker. Note that this is different to -lto-O2 which sets the LTO level.

lld/wasm/InputChunks.cpp
477

Again this is just cargo culted over.. I'm not sure what this comments means by "pools" but I think it must be refering to some linker specific pools, as opposed to malloc in general, which splitStrings clearly does.

490

This code comes directly from lld/ELF/InputSection.cpp ... I honestly didn't spend time trying to grok it its just cargo cults over :-/

lld/wasm/OutputSegment.cpp
42

Ah yes, perhaps mergedSegments is better.

42

Actually this also comes from ELF which has std::vector<MergeSyntheticSection *> mergeSections; but they have slightly different class naming going on.

54

I copied this from lld/ELF/OutputSections.cpp. Is there a runtime difference are you suggesting that for readability?

llvm/include/llvm/MC/MCSectionWasm.h
44

Indeed, this flag and the two above apply only to data segments.. maybe I can make that more clear, at least in comments.

llvm/lib/MC/MCObjectFileInfo.cpp
798

Yes, currently this will not get used anywhere, or written the binary format. If you prefer I can drop this part of the change and add it in the followup.

llvm/lib/MC/MCParser/WasmAsmParser.cpp
107

Mostly for consistency with ELFAsmParser.cpp that uses this pattern.

165

I copied these lines directly from ELFAsmParser.cpp so I'm fine with this as is. When the user sees this error it also shows them the location on the line where it happens so it pretty obvious.

sbc100 updated this revision to Diff 341725.Apr 29 2021, 6:02 PM
  • feedback

I think I'm generally fine with keeping things that are direct copies from ELF, although it would be nice if it were even easier to tell which pieces those are (there is one comment in a header; is it just that function though?)

lld/wasm/InputChunks.h
216

Do we care about these lint checks? (I guess lld already has different style than LLVM?)

lld/wasm/OutputSegment.cpp
42

I also had a question about the class naming above.

54

mostly just for readability. I think auto-capturing a simple local variable like that should end up the same in the code.

llvm/lib/MC/MCObjectFileInfo.cpp
798

I don't necessarily object; but you also mentioned above that the handling (or just the naming?) might be different for custom sections compared to custom sections.

sbc100 updated this revision to Diff 342113.Apr 30 2021, 9:05 PM
  • feedback
sbc100 added inline comments.Apr 30 2021, 9:08 PM
lld/wasm/InputChunks.h
42

Again taking my lead from ELF here which has enum Kind { Regular, EHFrame, Merge, Synthetic, Output };

216

We do care yes, but I'm having hard time seeing what is wrong here since we use lowerCamal case for locals and params in lld. i guess it must be the trailing _ .. in which case maybe its false positive because it thinks we are using snake_case which would indeed be wrong. I think we can ignore this one.

lld/wasm/InputFiles.cpp
435

I don't think clang ever injects that linker flag, you would need to use -Wl,

llvm/lib/MC/MCObjectFileInfo.cpp
798

Removed this... in fact I reverted this whole file!

dblaikie added inline comments.Apr 30 2021, 9:14 PM
lld/wasm/InputChunks.h
216

What's with the underscores though?

I don't think we usually do that in the LLVM codebase - I think we generally name the parameter the same as the variable & ideally use the ctor's initializer list to initialize members. (or use this->x = x in a pinch)

Hmm, I guess these members are members of a base class? Perhaps the base class should have a ctor that takes them?

sbc100 added inline comments.Apr 30 2021, 10:54 PM
lld/wasm/InputChunks.h
216

In this case I'm initialize members of that superclass so I can't use the initializer list (I think).. so I need to disambiguate.. In the past I've used the underscore suffix disambiguate, but I'm happy to use the this->x = x method instead. I didn't know that was preferred here.

sbc100 updated this revision to Diff 342143.May 1 2021, 7:10 AM
  • use this->member
dblaikie added inline comments.May 1 2021, 8:58 AM
lld/wasm/InputChunks.h
216

re: base class members: Ah, right, sorry, that's what I was trying to get at when I mentioned "perhaps the base class (aka superclass) should have a ctor that takes them" - if the base class doesn't have a ctor that takes flags and alignment, perhaps you could add one?

sbc100 added inline comments.May 1 2021, 9:09 AM
lld/wasm/InputChunks.h
216

Thanks, that does make it a little cleaner.

sbc100 updated this revision to Diff 342157.May 1 2021, 9:10 AM
  • feedback
dschuff accepted this revision.May 3 2021, 2:38 PM

I think this is pretty much ok now, other than that I would like to make it more clear exactly which parts are more-or-less directly copied from ELF; that way if we need to understand more in the future, we know where to look (and also if there are changes to ELF in the future we would know to look at those to see if they make sense for us too)

lld/wasm/InputChunks.cpp
374

This comment still stands (the else cases is dead, right?)

This revision is now accepted and ready to land.May 3 2021, 2:38 PM
sbc100 updated this revision to Diff 344107.May 10 2021, 10:23 AM
  • feedback
lld/wasm/InputChunks.cpp
374

You are right.. for some reason the ELF code seem have code to handle the case when the parent is null.. but I don't see how that could ever happen, and least with wasm code.. removed the condition.

This revision was landed with ongoing or failed builds.May 10 2021, 1:15 PM
This revision was automatically updated to reflect the committed changes.

reverted for now in 061e071d8c9b98526f35cad55a918a4f1615afd4. repros locally with ninja check-llvm-mc-webassembly

Looks like this breaks tests: http://45.33.8.238/macm1/9183/step_11.txt

Fixing now. Sorry about that. I need to learn to pay attention to the pre-commit builders.

reverted for now in 061e071d8c9b98526f35cad55a918a4f1615afd4. repros locally with ninja check-llvm-mc-webassembly

What is the process for re-landing after revert? Do re-upload to this review?

If it's not a big change relative to the first commit, I'd just reland with the same phab link.

In addition to the precommit bot, I'd also recommend running tests locally.

If it's not a big change relative to the first commit, I'd just reland with the same phab link.

In addition to the precommit bot, I'd also recommend running tests locally.

Of course yes. I do always run tests locally but since most of my changes only touch lld I often limit by test runs to lld-only. Forgot I touched common code with this one.