Page MenuHomePhabricator

[InstrProfiling] Use !associated metadata for counters, data and values
AcceptedPublic

Authored by phosek on Mar 25 2020, 2:16 PM.

Details

Summary

C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.

The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.

Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

this broke libprofile tests for i386: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/27806 , also https://ci.chromium.org/p/chromium/builders/try/linux_upload_clang/966? (which has more failures, but look like the same cause)

Can you take a look, and revert if it takes a while to fix?

bsaleil added a subscriber: bsaleil.EditedJun 9 2020, 3:11 PM

Hello,
This patch also broke some of our PowerPC buildbots (e.g. http://lab.llvm.org:8011/builders/clang-ppc64le-rhel/builds/4263).
Can you please push a fix or revert ?
Thanks.

aeubanks added a subscriber: aeubanks.EditedJun 9 2020, 8:09 PM

This is still breaking 37 tests, even after the first attempted fix forward...
ninja check-profile

This revision is now accepted and ready to land.Jun 10 2020, 2:35 AM

I apologize about the late response, I didn't get any email from Phabricator for some reason :/

I'll try to reproduce and address the issue on i386, but I don't have access to any PowerPC machine, @bsaleil is there any way to get access to get access to your PowerPC bots?

I looked into these issue:

phosek updated this revision to Diff 270031.Jun 10 2020, 9:56 PM

I've updated the patch to handle the _IO_stdin_used symbol in the test. I couldn't reproduce this locally, but it's possible that this is due to different libc version on the bot.

I've updated the patch to handle the _IO_stdin_used symbol in the test. I couldn't reproduce this locally, but it's possible that this is due to different libc version on the bot.

I still can reproduce with the latest patch on my workstation

Exit Code: 1

Command Output (stderr):
--
/usr/bin/ld: __llvm_prf_data has both ordered and unordered sections
/usr/bin/ld: final link failed: bad value
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)

I've updated the patch to handle the _IO_stdin_used symbol in the test. I couldn't reproduce this locally, but it's possible that this is due to different libc version on the bot.

I still can reproduce with the latest patch on my workstation

Exit Code: 1

Command Output (stderr):
--
/usr/bin/ld: __llvm_prf_data has both ordered and unordered sections
/usr/bin/ld: final link failed: bad value
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)

This is the other issue I mentioned which also affects Chromium bots. The root cause is the lack of support for the extended semantics of SHF_LINK_ORDER that !associated metadata rely on in bfd.ld. I'm trying to figure out what's the best way forward. We could just disable all affected profile tests when bfd.ld is used as the linker, but that's pretty drastic. Another option would be to gate this feature on a backend flag; targets that use gold or lld as their linker could turn this on by default.

phosek updated this revision to Diff 270263.Jun 11 2020, 4:42 PM

I still can reproduce with the latest patch on my workstation

Exit Code: 1

Command Output (stderr):
--
/usr/bin/ld: __llvm_prf_data has both ordered and unordered sections
/usr/bin/ld: final link failed: bad value
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)

This is the other issue I mentioned which also affects Chromium bots. The root cause is the lack of support for the extended semantics of SHF_LINK_ORDER that !associated metadata rely on in bfd.ld. I'm trying to figure out what's the best way forward. We could just disable all affected profile tests when bfd.ld is used as the linker, but that's pretty drastic. Another option would be to gate this feature on a backend flag; targets that use gold or lld as their linker could turn this on by default.

I've found a workaround for the bfd.ld issue. The problem is that __llvm_prf_data input sections have sh_addralign=8 but size is 4mod8. bfd.ld is considering the alignment padding insertions to be "unordered sections" in the input. Inserting artificial padding in the __llvm_prf_data input section seems to work around the issue. @vitalybuka can you test this locally on your machine please?

Most failures are fixed, thanks! But I still get one failure with this patch:
LLVM Profile Warning: Unable to merge profile data: source profile file is not compatible.
LLVM Profile Error: Profile Merging of file /usr/local/google/home/aeubanks/repos/llvm-project3/build_cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo.profraw/default_15822682845097088506_0.profraw failed: File exists
LLVM Profile Error: Failed to write file "/usr/local/google/home/aeubanks/repos/llvm-project3/build_cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo.profraw/default_15822682845097088506_0.profraw": File exists
warning: /usr/local/google/home/aeubanks/repos/llvm-project3/build_cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo.profraw/default_15822682845097088506_0.profraw: Unsupported instrumentation profile format version
error: No profiles could be merged.

@davidxl Do you have any preference between the current solution which uses the artificial padding in __llvm_prf_data section to workaround the BFD ld bug, and a backend feature flag that could be set to enable this feature?

@davidxl Do you have any preference between the current solution which uses the artificial padding in __llvm_prf_data section to workaround the BFD ld bug, and a backend feature flag that could be set to enable this feature?

The latter might be preferable because there's another bug in trunk BFD ld which causes to enter infinite loop, see D72904 for more details.

@phosek, we just updated the clang-ppc64le-rhel bot to lld 10.0.0, is that recent enough for SHF_LINK_ORDER support you need ?

phosek updated this revision to Diff 283655.Aug 6 2020, 10:43 AM

@davidxl does this look good to you? This feature is now gated on -counter-associated-metadata flag (I'd welcome suggestions for better name if you have some) because we require recent lld. We might be able to set that flag in the frontend when we know that the latest lld is being used in the future.

phosek updated this revision to Diff 283711.Aug 6 2020, 12:58 PM
pcc added a subscriber: pcc.Aug 6 2020, 1:05 PM

@davidxl does this look good to you? This feature is now gated on -counter-associated-metadata flag (I'd welcome suggestions for better name if you have some) because we require recent lld. We might be able to set that flag in the frontend when we know that the latest lld is being used in the future.

Can you find a better way of communicating this infomation than via an llvm::cl flag (e.g. use a function attribute)? The flag won't be compatible with LTO.

phosek added a comment.Aug 6 2020, 2:29 PM
In D76802#2200926, @pcc wrote:

@davidxl does this look good to you? This feature is now gated on -counter-associated-metadata flag (I'd welcome suggestions for better name if you have some) because we require recent lld. We might be able to set that flag in the frontend when we know that the latest lld is being used in the future.

Can you find a better way of communicating this infomation than via an llvm::cl flag (e.g. use a function attribute)? The flag won't be compatible with LTO.

Do I understand your suggestion correctly that we would set function attribute when -fprofile-*-generate -fuse-lld is used that would enable the use of metadata in the backend?

pcc added a comment.Aug 6 2020, 2:37 PM
In D76802#2200926, @pcc wrote:

@davidxl does this look good to you? This feature is now gated on -counter-associated-metadata flag (I'd welcome suggestions for better name if you have some) because we require recent lld. We might be able to set that flag in the frontend when we know that the latest lld is being used in the future.

Can you find a better way of communicating this infomation than via an llvm::cl flag (e.g. use a function attribute)? The flag won't be compatible with LTO.

Do I understand your suggestion correctly that we would set function attribute when -fprofile-*-generate -fuse-lld is used that would enable the use of metadata in the backend?

Yes, that's what I had in mind.

pirama added a subscriber: pirama.Jan 8 2021, 1:56 PM
In D76802#2201170, @pcc wrote:
In D76802#2200926, @pcc wrote:

@davidxl does this look good to you? This feature is now gated on -counter-associated-metadata flag (I'd welcome suggestions for better name if you have some) because we require recent lld. We might be able to set that flag in the frontend when we know that the latest lld is being used in the future.

Can you find a better way of communicating this infomation than via an llvm::cl flag (e.g. use a function attribute)? The flag won't be compatible with LTO.

Do I understand your suggestion correctly that we would set function attribute when -fprofile-*-generate -fuse-lld is used that would enable the use of metadata in the backend?

Yes, that's what I had in mind.

I was thinking that maybe a better approach would be to provide a new flag akin to -fbinutils-version introduced D85474, for example -flld-version. We would then emit the metadata unconditionally. In the backend, we would then check if LLD version is greater than 12 or if binutils version is greater than 2.36, and if not we would avoid using SHF_LINK_ORDER in this case effectively ignoring the metadata. During LTO, LLD would pass its own version to the backend via TargetOptions. WDYT?

In D76802#2201170, @pcc wrote:
In D76802#2200926, @pcc wrote:

@davidxl does this look good to you? This feature is now gated on -counter-associated-metadata flag (I'd welcome suggestions for better name if you have some) because we require recent lld. We might be able to set that flag in the frontend when we know that the latest lld is being used in the future.

Can you find a better way of communicating this infomation than via an llvm::cl flag (e.g. use a function attribute)? The flag won't be compatible with LTO.

Do I understand your suggestion correctly that we would set function attribute when -fprofile-*-generate -fuse-lld is used that would enable the use of metadata in the backend?

Yes, that's what I had in mind.

I was thinking that maybe a better approach would be to provide a new flag akin to -fbinutils-version introduced D85474, for example -flld-version. We would then emit the metadata unconditionally. In the backend, we would then check if LLD version is greater than 12 or if binutils version is greater than 2.36, and if not we would avoid using SHF_LINK_ORDER in this case effectively ignoring the metadata. During LTO, LLD would pass its own version to the backend via TargetOptions. WDYT?

Alternatively, we could assume that LLD version always matches LLVM and avoid -flld-version, but we'll need some way to communicate to the backend that LLD is being used as the linker. @MaskRay do you have any thoughts on this?

MaskRay added a comment.EditedJan 22 2021, 12:24 PM

Alternatively, we could assume that LLD version always matches LLVM and avoid -flld-version, but we'll need some way to communicate to the backend that LLD is being used as the linker. @MaskRay do you have any thoughts on this?

I am happy with the direction here (doing something to decrease binary sizes and help garbage collection).

I think this is probably fine. ELF is stable and has less evolution on binary formats. Some case studies:

  • SHF_LINK_ORDER: I think when Clang started to emit this for sanitizers (!associated), Clang required nearly bleeding-edge LLD
  • R_X86_64_REX_GOTPCRELX: GCC with ~2015 GNU as will emit R_X86_64_REX_GOTPCRELX by default. LLD had supported it for a few years before I switched Clang default. (There was still few things I did not catch beforehand: (LLD lacked relocation overflow check) + (in an edge-case clang code generation, LLD could wrongly relax the relocation))

I had read through all the previous discussions half a year ago but my memory might be outdated, so apologies in advance if I missed context...

Newer LLD SHF_LINK_ORDER support had landed ~August 2020 which should be in LLD 11.0.0 (some may be tin LLD 10.0.0, I don't check).
If you switch the binary format feature used now, the change will go into Clang 12.0.0, i.e. Clang 12 + LLD 11 should be good.
Clang 12 + LLD 10 is unsupported. This looks quite good to me.
(Note: for PGO+LTO users they need to ensure that LLD is newer than Clang. Newer Clang may emit bitcode files not recognizable by older LLD. The recent event happened just one or two month ago when a new loop related metadata was added.)

One problem is whether we want to make -fuse-ld=lld (currently a pure linker stage option) affect code generation. If I remember it correctly @pcc had concerns with it but from looking at the previous comment seems that he is fine? :)

I remember I have changed some comdat thing and would like to verify the effect of this patch myself.

Alternatively, we could assume that LLD version always matches LLVM and avoid -flld-version, but we'll need some way to communicate to the backend that LLD is being used as the linker. @MaskRay do you have any thoughts on this?

I am happy with the direction here (doing something to decrease binary sizes and help garbage collection).

I think this is probably fine. ELF is stable and has less evolution on binary formats. Some case studies:

  • SHF_LINK_ORDER: I think when Clang started to emit this for sanitizers (!associated), Clang required nearly bleeding-edge LLD
  • R_X86_64_REX_GOTPCRELX: GCC with ~2015 GNU as will emit R_X86_64_REX_GOTPCRELX by default. LLD had supported it for a few years before I switched Clang default. (There was still few things I did not catch beforehand: (LLD lacked relocation overflow check) + (in an edge-case clang code generation, LLD could wrongly relax the relocation))

I had read through all the previous discussions half a year ago but my memory might be outdated, so apologies in advance if I missed context...

Newer LLD SHF_LINK_ORDER support had landed ~August 2020 which should be in LLD 11.0.0 (some may be tin LLD 10.0.0, I don't check).
If you switch the binary format feature used now, the change will go into Clang 12.0.0, i.e. Clang 12 + LLD 11 should be good.
Clang 12 + LLD 10 is unsupported. This looks quite good to me.
(Note: for PGO+LTO users they need to ensure that LLD is newer than Clang. Newer Clang may emit bitcode files not recognizable by older LLD. The recent event happened just one or two month ago when a new loop related metadata was added.)

One problem is whether we want to make -fuse-ld=lld (currently a pure linker stage option) affect code generation. If I remember it correctly @pcc had concerns with it but from looking at the previous comment seems that he is fine? :)

The problem I can see is that -fuse-ld=lld is usually only passed in linker flags (see for example https://source.chromium.org/chromium/chromium/src/+/master:build/config/compiler/BUILD.gn;l=314), but unless LTO is being used, we'd need to know which linker is going to be used during compilation, that is we'd need it to be passed as a compiler flag.

Since binutils 2.36 was released yesterday, we could also just consider emitting SHF_LINKER_ORDER unconditionally which would avoid this issue altogether. The only concern there is if someone tried to use new Clang with older binutils, but it sounds like with sanitizers we've ignored that concern it wasn't a problem, so maybe it won't be a problem here either?

I remember I have changed some comdat thing and would like to verify the effect of this patch myself.

phosek added a subscriber: gulfem.Jan 25 2021, 10:43 AM

The problem I can see is that -fuse-ld=lld is usually only passed in linker flags (see for example https://source.chromium.org/chromium/chromium/src/+/master:build/config/compiler/BUILD.gn;l=314), but unless LTO is being used, we'd need to know which linker is going to be used during compilation, that is we'd need it to be passed as a compiler flag.

Even for LTO, -fuse-ld=lld should have minimum behavior difference when using LLVMgold.so.

Since binutils 2.36 was released yesterday, we could also just consider emitting SHF_LINKER_ORDER unconditionally which would avoid this issue altogether. The only concern there is if someone tried to use new Clang with older binutils, but it sounds like with sanitizers we've ignored that concern it wasn't a problem, so maybe it won't be a problem here either?

It might cause friction to PGO+non-LLD users... I think it might be premature for LLVM 13.0.0. 14.0.0 may be fine.

You may add tests to linkage.ll (see D84723).

With the current path, opt -counter-associated-metadata=1 < linkage.ll -mtriple=x86_64-unknown-linux -instrprof -S | llc -filetype=obj crashes due to an empty metadata node.

phosek updated this revision to Diff 319215.Jan 25 2021, 11:06 PM
phosek updated this revision to Diff 319219.Jan 25 2021, 11:24 PM

It might cause friction to PGO+non-LLD users... I think it might be premature for LLVM 13.0.0. 14.0.0 may be fine.

Do you think we should go with the current solution then and change the default to enable -counter-associated-metadata in LLVM 14.0.0?

You may add tests to linkage.ll (see D84723).

With the current path, opt -counter-associated-metadata=1 < linkage.ll -mtriple=x86_64-unknown-linux -instrprof -S | llc -filetype=obj crashes due to an empty metadata node.

I forgot to upload the latest patch which uses sh_link=self instead of sh_link=0, it should be working now.

MaskRay added a comment.EditedJan 26 2021, 4:08 PM
  • __llvm_prf_cnts does not need to have a self link SHF_LINK_ORDER. __llvm_prf_data linking to __llvm_prf_cnts suffices. This is on the premise that only __llvm_prf_data may reference __llvm_prf_cnts
  • Please update InstrProfiling/icall.ll to test __profvp_ (-vp-static-alloc=true). __llvm_prf_vals should link to __llvm_prf_cnts.
  • From the current organization of tests, I think linkage.ll is a better place than a separate associated.ll. The tests there are more comprehensive (needs-comdat and non-needs-comdat cases). You can add the new associated mode to linkage.ll.

-fbinutils-version is now available. You can condition the associated feature on 2.36 now.

phosek updated this revision to Diff 319493.Jan 27 2021, 1:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2021, 1:16 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
  • __llvm_prf_cnts does not need to have a self link SHF_LINK_ORDER. __llvm_prf_data linking to __llvm_prf_cnts suffices. This is on the premise that only __llvm_prf_data may reference __llvm_prf_cnts

Discussed with @MaskRay over chat, this is necessary because __llvm_prf_cnts is a C identifier.

  • Please update InstrProfiling/icall.ll to test __profvp_ (-vp-static-alloc=true). __llvm_prf_vals should link to __llvm_prf_cnts.

Done.

  • From the current organization of tests, I think linkage.ll is a better place than a separate associated.ll. The tests there are more comprehensive (needs-comdat and non-needs-comdat cases). You can add the new associated mode to linkage.ll.

Done.

-fbinutils-version is now available. You can condition the associated feature on 2.36 now.

Done.

@MaskRay does this look good to you?

Looks quite good.

@MaskRay does this look good to you?

Looks like GNU ld has an infinite loop problem w.r.t self-link SHF_LINK_ORDER: https://sourceware.org/bugzilla/show_bug.cgi?id=27259

The reason is that the C identifier sections are considered as GC roots. Adding SHF_LINK_ORDER is a trick to defeat the GC root semantics.

Hope GNU ld can implement the rule. For now, only -fbinutils-version=none can use this.

About counter-associated-metadata: perhaps we should just call it linkorder-counter. !associated while being metadata, is actually similar to , comdat (it cannot be freely dropped) and its semantics is tightly coupled with SHF_LINK_ORDER. linkorder conveys more information than associated (associated can mean many things).

Please also comment (and update the description) that self-link may be used. This is subtle. The reason is that the ELF section names (e.g. __llvm_prf_cnts) are C identifiers and considered GC roots in the absence of the SHF_LINK_ORDER flag.

phosek updated this revision to Diff 319762.Jan 27 2021, 9:45 PM
phosek added a reviewer: MaskRay.

Looks quite good.

@MaskRay does this look good to you?

Looks like GNU ld has an infinite loop problem w.r.t self-link SHF_LINK_ORDER: https://sourceware.org/bugzilla/show_bug.cgi?id=27259

Thanks for checking this.

The reason is that the C identifier sections are considered as GC roots. Adding SHF_LINK_ORDER is a trick to defeat the GC root semantics.

Hope GNU ld can implement the rule. For now, only -fbinutils-version=none can use this.

I've updated the logic to reflect that.

About counter-associated-metadata: perhaps we should just call it linkorder-counter. !associated while being metadata, is actually similar to , comdat (it cannot be freely dropped) and its semantics is tightly coupled with SHF_LINK_ORDER. linkorder conveys more information than associated (associated can mean many things).

Done.

Please also comment (and update the description) that self-link may be used. This is subtle. The reason is that the ELF section names (e.g. __llvm_prf_cnts) are C identifiers and considered GC roots in the absence of the SHF_LINK_ORDER flag.

Done.

phosek updated this revision to Diff 320398.Jan 31 2021, 10:48 PM

@MaskRay I saw on https://sourceware.org/bugzilla/show_bug.cgi?id=27259 that the infinite loop fix has been cherry picked into 2.36 branch so I updated the condition to -fbinutils-version=2.36. Is this OK to land?

The !associated metadata may be attached to a global object declaration with a single argument that references another global object. This metadata prevents discarding of the global object in linker GC unless the referenced object is also discarded.

"prevents discarding" is not accurate in the context. __llvm_prf_* cannot be discarded because C identifier name input sections are GC roots.

Add "C identifier name input sections are GC roots. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics." in appropriate place.

Furthermore, when a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. This is not possible today because there's metadata to guide the linker. This approach is also used by other instrumentations like sanitizers.

This C identifier name GC currently only works with LLD but does no harm to gold/GNU ld.

Do you think __llvm_prf_* can link to the text section so that the self-link trick is not needed? (https://sourceware.org/bugzilla/show_bug.cgi?id=27259)

phosek updated this revision to Diff 320562.Feb 1 2021, 12:25 PM
phosek edited the summary of this revision. (Show Details)

"prevents discarding" is not accurate in the context. __llvm_prf_* cannot be discarded because C identifier name input sections are GC roots.

Add "C identifier name input sections are GC roots. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics." in appropriate place.

Done, I have revised the change description.

This C identifier name GC currently only works with LLD but does no harm to gold/GNU ld.

Do you think __llvm_prf_* can link to the text section so that the self-link trick is not needed? (https://sourceware.org/bugzilla/show_bug.cgi?id=27259)

I associated the counter with the function symbol which seems to be working and we no longer need the self-link trick.

MaskRay accepted this revision.Feb 1 2021, 12:34 PM

Thanks!

shayba added a subscriber: shayba.Feb 1 2021, 2:45 PM
MaskRay added inline comments.
compiler-rt/test/profile/instrprof-gc-sections.c
31

This fails on my machine (cc @akuegel). If I use diff -u:

--- /tmp/RelA/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.nocode.syms   2021-02-01 22:14:22.850222346 -0800
+++ /tmp/RelA/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.code.syms     2021-02-01 22:14:22.854222358 -0800
@@ -3,6 +3,5 @@
 __prof_cnts_sect_data
 __prof_data_sect_data
 __prof_nms_sect_data
-__prof_orderfile_sect_data
 __prof_vnodes_sect_data
 lprofDirMode

At the worst we can temporarily disable it (if others report issues as well).

phosek added inline comments.Feb 1 2021, 10:52 PM
compiler-rt/test/profile/instrprof-gc-sections.c
31

What's your build setup? I ran check-profile on my Linux workstation just now but haven't seen this issue.

TWeaver added a subscriber: TWeaver.Feb 2 2021, 5:23 AM

Hiya,

this patch (probably) broke the build bot at:

http://lab.llvm.org:8011/#/builders/53/builds/1184

phosek reopened this revision.Feb 2 2021, 12:21 PM
This revision is now accepted and ready to land.Feb 2 2021, 12:21 PM
phosek updated this revision to Diff 321015.Feb 2 2021, 11:19 PM
This revision was landed with ongoing or failed builds.Feb 2 2021, 11:20 PM
This revision was automatically updated to reflect the committed changes.
aeubanks added inline comments.Feb 3 2021, 4:12 PM
compiler-rt/test/profile/instrprof-gc-sections.c
31

This also fails on my machine:
$ cmake -GNinja -DLLVM_ENABLE_PROJECTS='clang;compiler-rt;lld' -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_LLD=ON ..
$ ninja check-profile

phosek reopened this revision.Feb 8 2021, 10:04 AM
This revision is now accepted and ready to land.Feb 8 2021, 10:04 AM