This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho][nfc] Move liveness-tracking fields into ConcatInputSection
ClosedPublic

Authored by int3 on Jun 9 2021, 10:25 AM.

Details

Summary

These fields currently live in the parent InputSection class,
but they should be specific to ConcatInputSection, since the other
InputSection classes (that contain literals) aren't atomically live or
dead -- rather their component string/int literals should have
individual liveness states. (An upcoming diff will add liveness bits for
StringPieces and fixed-sized literals.)

I also factored out some asserts for isCoalescedWeak() in MarkLive.cpp.
We now avoid putting coalesced sections in the inputSections vector,
so we don't have to check/assert against it everywhere.

Diff Detail

Event Timeline

int3 created this revision.Jun 9 2021, 10:25 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: jfb. · View Herald Transcript
int3 requested review of this revision.Jun 9 2021, 10:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2021, 10:25 AM
thakis added a subscriber: thakis.Jun 10 2021, 5:37 AM
thakis added inline comments.
lld/MachO/MarkLive.cpp
165

Is that correct? Imagine a S_ATTR_LIVE_SUPPORT symbol pointing to a live literal.

thakis accepted this revision.Jun 10 2021, 5:54 AM

This change as-is is fine.

But overall, it feels like things get a lot more complicated because we're not creating real InputSections for each literal in literal sections. Are there so many more literals than normal symboled inputsections? What's the memory / perf hit from just having normal InputSections for each literal?

lld/MachO/MarkLive.cpp
165

Nevermind, this is the referring section, not the referent.

This revision is now accepted and ready to land.Jun 10 2021, 5:54 AM

(Also FYI if you use the "Edit Related Revisions…" button in the upper right on phab, the presubmit bots can correctly handle dependent changes)

int3 added a comment.EditedJun 11 2021, 4:38 PM

But overall, it feels like things get a lot more complicated because we're not creating real InputSections for each literal in literal sections.

I agree, it's quite unfortunate...

Are there so many more literals than normal symboled inputsections? What's the memory / perf hit from just having normal InputSections for each literal?

Good questions! For chromium_framework, here are the relative counts of literals and subsections before deduplication (generated via D104158):

word literals: 145224 (8%)
cstring literals: 353031 (20%)
subsections: 1260720 (72%)

So actual subsections are still by far the largest, but cstrings take up a sizable chunk regardless. I also hacked up a diff that creates one InputSection per string, and even without doing dedup, it's already slower: D104159

The remaining question is... could we trim InputSection's size and close this performance gap? There are definitely opportunities here (e.g. we could replace name, segname, and flags with a pointer to the original section_64 header.) But there's still quite a number of other fields that I doubt can be removed. So while this is not a watertight analysis, I think it's enough to make a case for landing this architectural change.

This revision was landed with ongoing or failed builds.Jun 11 2021, 4:50 PM
This revision was automatically updated to reflect the committed changes.

One of your 4 commits broke the build: http://45.33.8.238/linux/48738/step_4.txt

Please take a look and revert for now if it takes a while to fix.

And consider spreading out landing commits so that it's easier to see which one breaks things :)

int3 added a comment.Jun 11 2021, 5:01 PM

And consider spreading out landing commits so that it's easier to see which one breaks things :)

We don't have internal buildbots that notify us of upstream breakages this quickly, so I usually depend on the public LLVM buildbots, which also typically take an hour+ to get to my diffs. Hence the batching.

Anyway, this one looks like my bad for not having tested with a debug/RelWithAsserts build... running one locally now

int3 added a comment.Jun 11 2021, 5:32 PM

I ended up removing a lot of the asserts since sprinkling casts to ConcatInputSection seemed pretty awkward. I think you added most of these asserts, so lmk if you think we should put them back

thakis added a comment.EditedJun 11 2021, 5:49 PM

You can look at http://45.33.8.238/ , it's public and the Linux bot on it cycles in a little over 3 minutes :)

The asserts are kind of load bearing in that if they fire we'll write invalid output, so I think it'd be nice to keep them. Maybe the cast can go in a helper or something?

But overall, it feels like things get a lot more complicated because we're not creating real InputSections for each literal in literal sections.

I agree, it's quite unfortunate...

Are there so many more literals than normal symboled inputsections? What's the memory / perf hit from just having normal InputSections for each literal?

Good questions! For chromium_framework, here are the relative counts of literals and subsections before deduplication (generated via D104158):

word literals: 145224 (8%)
cstring literals: 353031 (20%)
subsections: 1260720 (72%)

So actual subsections are still by far the largest, but cstrings take up a sizable chunk regardless. I also hacked up a diff that creates one InputSection per string, and even without doing dedup, it's already slower: D104159

The remaining question is... could we trim InputSection's size and close this performance gap? There are definitely opportunities here (e.g. we could replace name, segname, and flags with a pointer to the original section_64 header.) But there's still quite a number of other fields that I doubt can be removed. So while this is not a watertight analysis, I think it's enough to make a case for landing this architectural change.

Thanks for collecting this data!

The ELF's port's -fdata-sections behavior means it has one actual section per string, right?

Since most sections are "normal" subsection sections, we kind of need to optimize sections anyways. But we can do that first and then reconsider if we need the special cases for literals.

hokein added a subscriber: hokein.Jun 14 2021, 7:00 AM

Looks like this change and its follow up 5de7467e9821485f492eb97fafd796e1db4c6bb5 causes a test failure (lld::alignment-too-large.yaml) in ppc buildbot: https://lab.llvm.org/buildbot/#/builders/36/builds/9356.

Please take a look.