This is an archive of the discontinued LLVM Phabricator instance.

[InstrProfiling] Use llvm.compiler.used instead of llvm.used for ELF
ClosedPublic

Authored by MaskRay on Feb 26 2021, 2:58 PM.

Details

Summary

Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.

The interconnected __llvm_prf_{cnts,data} sections (in comdat for ELF)
are similar to D97432: __profd_ is not directly referenced, so
__profd_ may be discarded while __profc_ is retained, breaking the
interconnection. We currently conservatively add all such sections to
llvm.used and let the linker do GC for ELF.

In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some check-profile tests check for GC).
Use llvm.compiler.used to retain the current GC behavior.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 26 2021, 2:58 PM
MaskRay requested review of this revision.Feb 26 2021, 2:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 26 2021, 2:58 PM

So to summarize, this patch is to enable compiler passes to respect comdat semantics while still enabling linker to do GC (after using GNU_RETAIN)?

So to summarize, this patch is to enable compiler passes to respect comdat semantics while still enabling linker to do GC (after using GNU_RETAIN)?

The patch makes the ELF behavior the same (the metadata sections can be GCed) before or after D97448.

The comments and the paragraph in this patch are mainly to summarize the current situation (why we use llvm.compiler.used/llvm.used) :)

davidxl accepted this revision.Feb 26 2021, 4:01 PM

lgtm for ELF. Anything to do for COFF?

This revision is now accepted and ready to land.Feb 26 2021, 4:01 PM
MaskRay added a comment.EditedFeb 26 2021, 4:11 PM

lgtm for ELF. Anything to do for COFF?

I think the existing behavior may be a bit problematic for COFF, so let @rnk know and add some tests demonstrating the behavior:)

Perhaps COFF should switch to comdat and llvm.compiler.used as well. Without comdat, because currently llvm.used does not retain sections defining PrivateLinkage/InternalLinkage symbols, the GC result may be weird.

This revision was landed with ongoing or failed builds.Feb 26 2021, 4:14 PM
This revision was automatically updated to reflect the committed changes.
rnk added a comment.Mar 1 2021, 11:38 AM

Wait, are you telling me that __profd/__profc data can participate in linker GC, and that is not a problem? I have actually had to optimize the processing of /INCLUDE: directives for COFF (D78845), because I assumed that it was critical that __profd be treated as a GC root. But, on ELF, they were never GC roots, and coverage/PGO has worked just fine there. So, really, all along, llvm.used was only being used to block LLVM IPO transforms like globalopt, not to establish a GC root. And, as you point out, we don't get the GC root behavior for internal __profd/__profc globals, and coverage/PGO seems to work fine.

What about MachO? Should all platforms not treat __profd as a GC root?

rnk added a comment.Mar 1 2021, 11:41 AM

Switching to compiler.used will save ~8% of PGO object file size, although these numbers are stale: https://crbug.com/1058040#c14 I filed an issue to prototype an alternative GC root object file format similar to .llvm_addrsig in an effort to capture those object file savings, and you are telling me that none of that was ever necessary!

MaskRay added a comment.EditedMar 1 2021, 11:53 AM
In D97585#2595131, @rnk wrote:

Wait, are you telling me that __profd/__profc data can participate in linker GC, and that is not a problem?

Yes on ld.lld, since D96636 . The GC is more useful with D96757, which moved metadata sections which were not within a comdat any into a comdat noduplicates group.

I have actually had to optimize the processing of /INCLUDE: directives for COFF (D78845), because I assumed that it was critical that __profd be treated as a GC root.

Yes. I think getting rid of the local linkage limitation is important for COFF GC.

But, on ELF, they were never GC roots, and coverage/PGO has worked just fine there.

Yes. For some sections (__llvm_prf_nm __llvm_prf_vnds), they work just because linkers have such a rule:

__start_/__stop_ references from a live input section retains all non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections.

-z start-stop-gc can drop the rule, so we need D97649 for robustness.

That said, I am still not very confident with GCness for the compressed __llvm_prf_nm. I think for PGO the section is optional, so whether we retain or discard it probably does not matter much. For coverage mapping it might be a problem.
But this patch and D97649 do not change the status quo.

So, really, all along, llvm.used was only being used to block LLVM IPO transforms like globalopt, not to establish a GC root. And, as you point out, we don't get the GC root behavior for internal __profd/__profc globals, and coverage/PGO seems to work fine.

What about MachO? Should all platforms not treat __profd as a GC root?

Since MachO does not have comdat. I think it has to be conservative and treats every metadata section not directly or indirectly referenced by code as GC roots (llvm.used).

All binary formats with ELF section group compatible semantics ("if section A in a section group is retained, all members in the group are retained as well") should be able to use llvm.compiler.used.
I believe COFF can do it, but as you said, /INCLUDE: needs fixes to support local linkage GlobalObjects.

Correct me if I'm wrong, but I assumed we could use the same approach that
was suggested in https://bugs.llvm.org/show_bug.cgi?id=49155 for
__llvm_prf_nm.

That is, instead of emitting a single per-TU symbol with all the names, we
would emit one symbol for each name, all having the same section name. When
lowering, all those symbols get combined into a SHF_MERGE & SHF_STRINGS
section. We could then put each symbol into a group with the other
per-function symbols.

I tried this with coverage mapping locally and it's working well. The
problem is with COFF and Mach-O since they AFAIK don't have an equivalent
of SHF_COMPRESSED so we would lose the benefit of compression there.

We could continue using the current scheme for COFF and Mach-O and only use
the proposed approach (that is string merging with compression handled
transparently by the backend & the linker) for ELF, but it'd mean more
divergence between the formats which may be undesirable.