Page MenuHomePhabricator

[ELF] --icf: don't fold text sections with LSDA
ClosedPublic

Authored by MaskRay on Jul 26 2020, 5:03 PM.

Details

Summary

Fix PR36272 and PR46835

A .eh_frame FDE references a text section and (optionally) a LSDA (in
.gcc_except_table). Even if two text sections have identical content and
relocations (e.g. a() and b()), we cannot fold them if their LSDA are different.

void foo();
void a() {
  try { foo(); } catch (int) { }
}
void b() {
  try { foo(); } catch (float) { }
}

Scan .eh_frame pieces with LSDA and disallow referenced text sections to be
folded. If two .gcc_except_table have identical semantics (usually identical
content with PC-relative encoding), we will lose folding opportunity.
For ClickHouse (an exception-heavy application), this can reduce --icf=all efficiency
from 9% to 5%. There may be some percentage we can reclaim without affecting
correctness, if we analyze .eh_frame and .gcc_except_table sections.

gold 2.24 implemented a more complex fix (resolution to
https://sourceware.org/bugzilla/show_bug.cgi?id=21066) which combines the
checksum of .eh_frame CIE/FDE pieces.

Diff Detail

Event Timeline

MaskRay created this revision.Jul 26 2020, 5:03 PM
grimar added inline comments.Jul 27 2020, 3:56 AM
lld/ELF/EhFrame.cpp
215

I'd put this first as it is what you are looking for:

for (char c : aug) {
  if (c == 'L')
    return true;
...
}
lld/ELF/ICF.cpp
464

Did you perform a test of a some project that uses exceptions? I wonder if it is really that rare.

471

It is probably more natural for this to be llvm::DenseSet<size_t>?

E.g:

llvm::DenseSet<size_t> ciesWithLDSA;
...
    if (id == 0 && hasLSDA(piece)) {
      ciesWithLDSA.insert(offset);
      continue;
    }
    uint32_t cieOffset = offset + 4 - id;
    if (!ciesWithLDSA.contains(cieOffset))
      continue;
...
479

The Length can take either 4 or 12 bytes (extended length). Should this be handled?

490

It's a bit strange to see that eqClass[0] is set to xxHash64(s->data() right above,
but handleLSDA might override the value to 1,2...N?

Should the hash be combined into, e.g. like combineRelocHashes does?

lld/ELF/SyntheticSections.cpp
373–374

I think the comment should describe the Defined symbol returned.
A non-null pointer sounds like it can be any !nullptr.

MaskRay updated this revision to Diff 280935.Jul 27 2020, 8:46 AM
MaskRay marked 7 inline comments as done.

Address comments

lld/ELF/EhFrame.cpp
215

Emm. I tend to prefer the order where compilers usually generate the augmentation string.

lld/ELF/ICF.cpp
479

The extended length format .eh_frame is not supported by MC (lib/MC/MCDwarf.cpp:EmitFDE).

The previous function does not handle the extended length as well.

Personally I think it is rarer than DWARF64

"A 8 byte unsigned value indicating the length in bytes of the CIE structure, not
including the Length and Extended Length fields themselves" It is hard to get a single CIE structure larger than 4GiB. This is different from DWARF64 where the FDE field is an absolute offset instead of a PC-relative offset (so the total size of .debug_frame matters)

490

Not necessary. By assigning 1,2,...,N, we don't actually want such sections to be folded into others. The combine operation is redundant.

MaskRay updated this revision to Diff 280976.Jul 27 2020, 10:54 AM

Move iterateFDEWithLSDA to SyntheticSections.cpp

MaskRay updated this revision to Diff 280977.Jul 27 2020, 10:58 AM

Small adjustment

amosbird added a subscriber: amosbird.EditedJul 28 2020, 12:37 AM

I've compared three version of lld by linking clickhouse with -fexceptions and --icf=all. Here are the stripped binary size of the related outputs:

bytesbinary
259837744clickhouse<---- clang12 (patch)
249643904clickhouse<---- clang12 (no-patch)
249643824clickhouse<---- clang10
275773152clickhouse<---- clang12 ( --icf=none)
260764432clickhouse<---- gold 1.16
276295792clickhouse<---- gold 1.16 ( --icf=none)
grimar added a comment.EditedJul 28 2020, 12:45 AM

I've compared three version of lld by linking clickhouse with -fexceptions and --icf=all. Here are the stripped binary size of the related outputs:

bytesbinary
259837744clickhouse<---- clang12 (patch)
249643904clickhouse<---- clang12 (no-patch)

259837744 / 249643904 = 1,04x, i.e +4%

MaskRay updated this revision to Diff 281464.EditedJul 28 2020, 9:26 PM
MaskRay edited the summary of this revision. (Show Details)

Remove the sentence "they are rare"

Note that clang uses a monolithic .gcc_except_table (this actually has a problem with RISC-V -mrelax), two functions with LSDA will likely reference different offsets of .gcc_except_table . We can't ICF them unless we parse .gcc_except_table (which is not great).

ClickHouse is an exception-heavy project. As you can see, the ICF efficiency dropped from 9% to 5%. However, this cost is probably something we have to take. Currently we can't guarantee safety. Many projects don't use much exception. They would not be affected much.

grimar added inline comments.Aug 4 2020, 1:01 AM
lld/ELF/ICF.cpp
490

It is just strange to see that we drop the xxHash64 work that was done, it looks excessive.

Right above we have the following code:

auto *s = cast<InputSection>(sec);
if (isEligible(s))
  sections.push_back(s);`

I've not checked, but have you considered something like not putting those sections into sections list at all?
I.e. make them non-eligible for ICF from the begining?

MaskRay updated this revision to Diff 283053.Aug 4 2020, 4:06 PM

Simplify

MaskRay marked 2 inline comments as done.Aug 4 2020, 4:06 PM
MaskRay added inline comments.
lld/ELF/ICF.cpp
490

Good idea. Simplified.

grimar accepted this revision.Aug 5 2020, 1:10 AM

LGTM I think. Would be very nice to see another opinion though.

This revision is now accepted and ready to land.Aug 5 2020, 1:10 AM
psmith added a comment.Aug 5 2020, 1:17 AM

I agree I think it is best to err on the side of correctness. As I understand it we could do better (merge more sections) but this would involve pulling apart .eh_frame sections to show equivalence. I think that could come later.

MaskRay edited the summary of this revision. (Show Details)Aug 5 2020, 9:16 AM
This revision was automatically updated to reflect the committed changes.
MaskRay marked an inline comment as done.

This probably breaks the build on systems where uint64_t and size_t are different types, such as apparently on macOS:

http://45.33.8.238/mac/18492/step_4.txt

I'm only seeing this on some of my Macs though, so maybe it's dependent on sdk or host clang version. If the error makes sense to you, do something about it, else I'll try installing a newer host clang on that bot later.

This probably breaks the build on systems where uint64_t and size_t are different types, such as apparently on macOS:

http://45.33.8.238/mac/18492/step_4.txt

I'm only seeing this on some of my Macs though, so maybe it's dependent on sdk or host clang version. If the error makes sense to you, do something about it, else I'll try installing a newer host clang on that bot later.

Already fixed by 279e4cf78262dfef560631d643b85aa2032ac566

This probably breaks the build on systems where uint64_t and size_t are different types, such as apparently on macOS:

http://45.33.8.238/mac/18492/step_4.txt

I'm only seeing this on some of my Macs though, so maybe it's dependent on sdk or host clang version. If the error makes sense to you, do something about it, else I'll try installing a newer host clang on that bot later.

Already fixed by 279e4cf78262dfef560631d643b85aa2032ac566

I will make a follow-up patch changing these size_t to uint64_t. They are so error-prone.