This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Implement ICF
ClosedPublic

Authored by gkm on May 27 2021, 4:02 PM.

Details

Summary

ICF = Identical C(ode|OMDAT) Folding

This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only -icf all is supported. In order to support -icf safe, we will need to port address-significance tables (.addrsig directives) to MachO, which will come in later diffs.

check-{llvm,clang,lld} have 0 regressions for lld -icf all vs. baseline ld64.

We only run ICF on __TEXT,__text for reasons explained in the block comment in ConcatOutputSection.cpp.

Here is the perf impact for linking chromium_framekwork on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF:

    N           Min           Max        Median           Avg        Stddev
x  20          4.27          4.44          4.34         4.349   0.043029977
+  20          4.37          4.46         4.405        4.4115   0.025188761
Difference at 95.0% confidence
        0.0625 +/- 0.0225658
        1.43711% +/- 0.518873%
        (Student's t, pooled s = 0.0352566)

Diff Detail

Event Timeline

gkm created this revision.May 27 2021, 4:02 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added subscribers: dang, jfb. · View Herald Transcript
gkm requested review of this revision.May 27 2021, 4:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2021, 4:02 PM
gkm planned changes to this revision.May 27 2021, 4:03 PM
gkm edited the summary of this revision. (Show Details)May 27 2021, 4:07 PM

test/ELF/icf*.s have great coverage. It'd be good to ensure the cases are all covered. There is certainly a lot of labor but the return will be good.

gkm updated this revision to Diff 348433.May 27 2021, 7:39 PM
  • fix build break
  • fix clang-tidy complaint
  • remove obsolete #if 0
gkm updated this revision to Diff 348674.May 29 2021, 10:07 PM
  • new test cases
gkm updated this revision to Diff 348992.Jun 1 2021, 9:26 AM
  • add FIXME comments
  • add log() monitoring of fold rejections
  • segregate equivalence classes on a dedicated vector
  • break-out Writer::foldIdenticalSections() from Writer::createOutputSections()
gkm edited the summary of this revision. (Show Details)Jun 2 2021, 8:47 AM
gkm edited the summary of this revision. (Show Details)Jun 2 2021, 11:01 AM
gkm updated this revision to Diff 349671.Jun 3 2021, 1:18 PM
  • resolved some FIXMEs, updated others
  • integrated with new -dead_strip changes
gkm updated this revision to Diff 350080.Jun 5 2021, 2:21 PM
gkm edited the summary of this revision. (Show Details)
  • Resolve more FIXMEs
  • Update & expand comments
int3 added a comment.Jun 7 2021, 7:16 PM

Haven't dug into the details yet, but a few high-level thoughts:

  • The logging code is fairly extensive. Is it really necessary to commit it? If we are leaving it in, could we perhaps cut out InputSection::defined and symbolicate addresses by doing a linear/binary search for the matching symbol?
  • Relatedly: can we have a perf benchmark between LLD-before-this-diff and LLD-with-this-diff-and-ICF-none? I'm wondering if the increased size of the InputSection will cost us.
  • "Folding cannot occur across output-section boundaries, therefore ConcatOutputSection is the natural place to run ICF." -- is this not the case for ELF and COFF? Personally I would prefer keeping "core data types" like ConcatOutputSection as simple as possible, as well as to mirror the ELF/COFF ports more closely.
lld/MachO/ConcatOutputSection.cpp
386

I was hoping that __cfstring would be handled seamlessly by ICF once it's integrated with literal merging. (And I think the current implementation should be able to handle it.)

lld/MachO/InputFiles.cpp
622–628

why is this change necessary?

lld/MachO/InputSection.h
57

I roughly get why we have two entries here, but a comment explaining it would be nice

gkm marked 2 inline comments as done.Jun 8 2021, 9:48 AM
  • The logging code is fairly extensive. Is it really necessary to commit it? If we are leaving it in, could we perhaps cut out InputSection::defined and symbolicate addresses by doing a linear/binary search for the matching symbol?

I can remove the logging and retain it as a patch if/when needed.

  • Relatedly: can we have a perf benchmark between LLD-before-this-diff and LLD-with-this-diff-and-ICF-none? I'm wondering if the increased size of the InputSection will cost us.

Yes.

  • "Folding cannot occur across output-section boundaries, therefore ConcatOutputSection is the natural place to run ICF." -- is this not the case for ELF and COFF? Personally I would prefer keeping "core data types" like ConcatOutputSection as simple as possible, as well as to mirror the ELF/COFF ports more closely.

ELF & COFF also fold only within output sections, but the segregation algorithm runs on all input sections in one batch, so output-section boundaries are enforced explicitly during segregation. Output-section boundaries need not be checked for MachO, because we have already sorted all input sections into ConcatOutputSections. Perhaps I should just remove the comment because it is a minor point and might serve only to mislead.

lld/MachO/ConcatOutputSection.cpp
386

We can experiment with ICF on only __TEXT,__text and __DATA_CONST,__cfstring.

lld/MachO/InputFiles.cpp
622–628

Not strictly necessary. I saw SEGVs when checking isec->data.data() == nullptr because parseSymbols would advance the initial nullptr for a zero-fill section at every split. The alternative is to designate data.data() as undefined for zero-fill sections.

lld/MachO/InputSection.h
57

It is likely best to refer the reader to the block comment in lld/ELF/ICF.cpp where the segregation algorithm is documented.

gkm marked 3 inline comments as done.Jun 8 2021, 9:49 AM
int3 added a comment.Jun 8 2021, 1:32 PM

Output-section boundaries need not be checked for MachO, because we have already sorted all input sections into ConcatOutputSections

Got it. Still... can we have the ICF code in a separate class / file to mirror the other ports?

gkm edited the summary of this revision. (Show Details)Jun 8 2021, 2:20 PM
gkm edited the summary of this revision. (Show Details)Jun 8 2021, 4:32 PM
gkm added a comment.Jun 8 2021, 8:40 PM

Haven't dug into the details yet, but a few high-level thoughts:

  • The logging code is fairly extensive. Is it really necessary to commit it? If we are leaving it in, could we perhaps cut out InputSection::defined and symbolicate addresses by doing a linear/binary search for the matching symbol?

FYI, defined is used extensively from equalsVariable() during the segregation algorithm. It is integral to the operation of ICF, regardless of logging. Removing defined in favor of reverse symbol-table lookups will severely degrade ICF performance.

int3 added a comment.Jun 8 2021, 9:55 PM

Could you point out the line where it's used in equalsVariable? I'm not sure I see it...

(I do see it being used in a non-logging sense in foldIdentical, though.)

gkm added a comment.Jun 9 2021, 9:12 AM

Could you point out the line where it's used in equalsVariable? I'm not sure I see it...

Brain fart. You are correct, there is no non-logging use.

(I do see it being used in a non-logging sense in foldIdentical, though.)

Yes, that's the place it's really used. Does that use justify the .75% overhead? The alternative is making an isec-to-defined map when isec->icfEquivalenceClass & (1ull << 63) .

gkm updated this revision to Diff 350945.Jun 9 2021, 10:30 AM
  • Relocate things from ConcatOutputSection into new class ICF
int3 added a comment.Jun 9 2021, 10:31 AM

Yes, that's the place it's really used. Does that use justify the .75% overhead? The alternative is making an isec-to-defined map when isec->icfEquivalenceClass & (1ull << 63) .

Do the other LLD backends deal with a similar problem, or is this unique to us/Mach-O's format?

int3 added inline comments.Jun 9 2021, 11:16 AM
lld/MachO/InputSection.cpp
116

There can be more than one symbol pointing to a given section, due to N_ALT_ENTRY. Perhaps another point in favor of having a separate isec-to-defined map...

int3 added a comment.Jun 9 2021, 11:25 AM

nit: Can we put the ICF class in its own file too?

gkm added a comment.Jun 9 2021, 11:38 AM
This comment was removed by gkm.
gkm added a comment.Jun 9 2021, 12:17 PM

Yes, that's the place it's really used. Does that use justify the .75% overhead? The alternative is making an isec-to-defined map when isec->icfEquivalenceClass & (1ull << 63) .

Do the other LLD backends deal with a similar problem, or is this unique to us/Mach-O's format?

ELF and COFF add the member InputSection *repl to their input sections. Among a collection of duplicate sections, the dead ones point to the live one. There is no getting around that. The only thing unique about MachO is that dead sections indirect through a defined symbol to reference the live section. I did this for the sake of debugging. I can remove that indirection though and make dead input sections point directly to the live input section. However, there is no away around adding that one pointer to InputSection so that the dead sections can reference the live section.

int3 added a comment.Jun 9 2021, 4:29 PM

Ah I see. Fair enough, there seems no getting around that. Having a pointer to the canonical InputSection instead of a symbol would solve the problem with N_ALT_ENTRY mentioned earlier, though

gkm updated this revision to Diff 351045.Jun 9 2021, 7:32 PM
  • move ICF code into new files ICF.h and ICF.cpp
  • In InputSection, drop Defined *defined in favor of InputSection *replacement
  • remove expensive ICF debug logging
gkm edited the summary of this revision. (Show Details)Jun 9 2021, 9:12 PM
int3 added a comment.Jun 10 2021, 2:57 PM

Nice work :)

lld/MachO/ConcatOutputSection.cpp
24

needs to be moved to ICF.cpp

lld/MachO/Driver.cpp
691

should we have -no_deduplicate as a synonym for -icf=none?

lld/MachO/ICF.cpp
67 ↗(On Diff #351045)

nit: would it be possible to structure the code so that clang-format doesn't insert all this left-padding?

98–102 ↗(On Diff #351045)

I think this doesn't work with N_ALT_ENTRY either; I think we'll need to compare Defined::value too?

107 ↗(On Diff #351045)

do we need to compare gotIndex too?

180 ↗(On Diff #351045)

ditto regarding N_ALT_ENTRY

230–231 ↗(On Diff #351045)

can Mid ever clash with icfInvalidId?

lld/MachO/InputFiles.cpp
622–628

oh that makes sense. Could you leave a comment to that effect?

lld/MachO/InputSection.cpp
58–59

Passing textOutputSection here just for this check seems pretty awkward to me... how about adding a hasPersonality bit on InputSection that gets set directly by UnwindInfoSection? we could make InputSection::live a 1-bit value avoid taking up extra space

98
106–107

I think this should be right since all the symbol references will now be transferred away from the copy

would be good to include a test with N_ALT_ENTRY to check we get this right

lld/MachO/InputSection.h
59

looks like LLD-ELF uses 32 bits for these hashes, we might want to consider doing the same if there's a perf win (no need to sort it out in this diff though)

lld/MachO/Symbols.cpp
45 ↗(On Diff #351045)

how about adding a method like InputSection::canonical() that returns the replacement as necessary?

lld/MachO/UnwindInfoSection.cpp
164

should get<InputSection *>() ever return null? If rFunc was a symbol relocation, we would have to track the corresponding InputSection it belonged to, right?

lld/MachO/UnwindInfoSection.h
36

nit: this doesn't need to be virtual, only the templated methods need to be

lld/MachO/Writer.cpp
941–942

nit: s/std::to_string/Twine/

lld/test/MachO/icf-scale.s
11
13
14
22

not the most descriptive comment :p

lld/test/MachO/icf.s
20
int3 added inline comments.Jun 10 2021, 3:35 PM
lld/MachO/ICF.cpp
107 ↗(On Diff #351045)

actually... since ICF doesn't fold DylibSymbols, can we not just check for pointer equality of the DylibSymbols here?

gkm updated this revision to Diff 351546.Jun 11 2021, 1:22 PM
gkm marked 21 inline comments as done.
  • revise according to review feedback
  • add some TODOs to lld/test/MachO/icf.s
gkm added inline comments.Jun 11 2021, 1:33 PM
lld/MachO/Driver.cpp
691

Are you proposing that -no_deduplicate be a strict synonym for -icf=none, or that it disable all forms of deduplication, including --deduplicate_literals ?

lld/MachO/ICF.cpp
107 ↗(On Diff #351045)

We already checked (ra.referent == rb.referent) above, so the isa<DylibSymbol>() block is dead code. Good catch!

230–231 ↗(On Diff #351045)

[ FYI, I also did this: s/icfInvalidId/icfUniqueID/ ]

The unique IDs and hashes are all replaced with equivalence-class IDs by the first run of segregate. Although the unique ID space does overlap the equivalence-class ID space, they never coexist. Even so, for sake of debugging years ago, it might have been nice to begin the icfUniqueID space at inputSections.size(). I'll do it anyway, even though it ought have no effect.

lld/MachO/InputSection.cpp
58–59

There were already 3 bools. I rearranged slightly so that now 3+1 bools are adjacent.

I still need to reject S_REGULAR sections that are not in __TEXT,__text. I think I made it a bit less awkward by passing the isText bool rather than textOutputSection.

lld/MachO/InputSection.h
59

I wonder if truncating hashes to 32 bits was a legacy choice from the time when people still ran LLVM on 32-bit hosts. There's also space-savings of 8 bytes per InputSection to consider. I will benchmark to see if there is any benefit for modern 64-bit hosts.

lld/MachO/UnwindInfoSection.cpp
164

These are always section relocs.

int3 accepted this revision.Jun 11 2021, 2:27 PM

Neat! I think we still need a test for N_ALT_ENTRY, but I'm fine with having that in a follow-up (maybe just leave a TODO in icf.s).

lld/MachO/Driver.cpp
691

Good question. ld64 does literal dedup regardless of that flag, but it also runs ICF by default unless -no_deduplicate is passed. If we want to be a drop-in replacement we should match that behavior. But having only *some* optimizations run by default seems bizarre. It would probably be better to disable all optimizations by default and print out a message when -no_deduplicate is passed saying that it's now redundant. I guess we can sort that out in a future diff...

lld/MachO/InputFiles.cpp
622–628

I think you missed this one :)

lld/test/MachO/icf.s
24–26

nit: can we align the RHS of these lines?

This revision is now accepted and ready to land.Jun 11 2021, 2:27 PM
gkm updated this revision to Diff 352487.Jun 16 2021, 11:27 AM
gkm marked 3 inline comments as done.
  • revise according to review feedback
  • add -no_deduplicate option as synonym for -icf none
  • handle isa<DylibSymbol> in equalsVariable()
  • add test for no fold when dylib symbol references differ
gkm updated this revision to Diff 352715.Jun 17 2021, 7:24 AM
  • fix clang-tidy complaint
This revision was landed with ongoing or failed builds.Jun 17 2021, 10:09 AM
Closed by commit rGf27e4548fc42: [lld-macho] Implement ICF (authored by gkm). · Explain Why
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Jun 18 2021, 9:19 AM
thakis added inline comments.
lld/MachO/Options.td
265

Is this an ld64 flag? If not: We put non-ld64-compat flags at the top and give them -- prefixes.

int3 added inline comments.Jun 21 2021, 10:21 AM
lld/MachO/ICF.cpp
109 ↗(On Diff #352777)

I guess we do need those extra tests :)

gkm marked an inline comment as done.Jun 23 2021, 9:58 AM
gkm added inline comments.
lld/MachO/ICF.cpp
109 ↗(On Diff #352777)

Yow! I am having trouble writing assembler to define absolute & relocatable symbols with the same value.

gkm marked an inline comment as done.Jun 23 2021, 8:30 PM