This is an archive of the discontinued LLVM Phabricator instance.

[InstrProfiling] Use ELF section groups for counters, data and values
ClosedPublic

Authored by phosek on Feb 16 2021, 12:41 AM.

Details

Summary

start_/stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that noduplicates COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Diff Detail

Event Timeline

phosek created this revision.Feb 16 2021, 12:41 AM
phosek requested review of this revision.Feb 16 2021, 12:41 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 16 2021, 12:41 AM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript

This is an alternative to D76802 which uses zero-flag ELF section groups instead of SHF_LINK_ORDER. While testing D76802, we keep running into issues in real-world code, but this variant seems to be working.

davidxl added inline comments.Feb 16 2021, 11:36 AM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
832–833

maybe this:

bool UseSectGroup = (NeedComdat || TT.isOSBinFormatELF());

if (UseSectGroup) {

auto GroupName = ..
Comdat *C = M->getOrInsertComdat(GroupName);
 if (!NeedComdat)
    C->setSelectionKind(Comdat::NoDuplicates);
 GV->setComdat(C);

}

phosek updated this revision to Diff 324122.Feb 16 2021, 3:16 PM
phosek marked an inline comment as done.

Can this be handled by other linkers (gold, bfd)?

Can this be handled by other linkers (gold, bfd)?

They handle this without problems (I tested it locally) which is another advantage over D76802. However, they aren't yet able to GC those sections so this change is currently a no-op for those linkers. They would need to make a change similar to D96753. There's currently an ongoing discussion about that.

davidxl accepted this revision.Feb 16 2021, 4:37 PM

Looks good to me. Wait for maskray's comments.

This revision is now accepted and ready to land.Feb 16 2021, 4:37 PM

Does this mean that for non-COMDAT __llvm_prf_* sections, we now need one section header (.group) for -ffunction-sections use cases? sizeof(Elf64_Shdr)=64 the size overhead can be quite significant.

This is an alternative to D76802 which uses zero-flag ELF section groups instead of SHF_LINK_ORDER. While testing D76802, we keep running into issues in real-world code, but this variant seems to be working.

Would be good to know the root cause. SHF_LINK_ORDER is the perfect alternative to COFF comdat nodeduplicates here as both solutions don't have space overhead.

Good point about the section header. The objective of the patch is to reduce binary image size (loadable sections) though the object size may increase. It would be good to know the object size overhead because some of our large binaries may get hit during link time (of instr binary).

Would be good to know the root cause. SHF_LINK_ORDER is the perfect alternative to COFF comdat nodeduplicates here as both solutions don't have space overhead.

The current issue we're stuck on is in the build of zstd (as discussed privately):

ld.lld: error: host_arm64-profile/obj/third_party/zstd/libzstd.a(libzstd.zstd_decompress.c.o):(__llvm_prf_cnts): sh_link points to discarded section host_arm64-profile/obj/third_party/zstd/libzstd.a(libzstd.zstd_decompress.c.o):(.text.ZSTD_getDDict)

Good point about the section header. The objective of the patch is to reduce binary image size (loadable sections) though the object size may increase. It would be good to know the object size overhead because some of our large binaries may get hit during link time (of instr binary).

I'll do a full build of Fuchsia with this change to measure the impact.

I tested this change in the Fuchsia build. The total size of all .o files without this change is 31665652 (30.2GB) and with this change it increases to 32287840 (30.8GB) so the increase is about 1.9%. In terms of impact on the final system image (by enabling --gc-sections on instrumented binaries), the size of boot image decreases from 116M to 63M so roughly 46% reduction, and the size of system image decreases from 5.8GB to 4.2GB which is roughly 28% reduction.

Nice reduction. Is the total .o size quoted from instrumentation build (with --gc-sections)?

Nice reduction. Is the total .o size quoted from instrumentation build (with --gc-sections)?

In both cases the .o files are built with -fprofile-instr-generate -ffunction-sections -fdata-sections and --gc-sections for linking.

MaskRay accepted this revision.Feb 18 2021, 12:18 AM

I have checked that the .o size increase should not be a concern for my internal users. So, LG, but the description needs some clarification.

C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded.

Sorry, my previous wording is inaccurate. It should be: __start_/__stop_ references retain C identifier name sections.
This is different from being GC roots.

The ELF section group semantics ensures that group members are retained or discarded as a unit. When a function symbol is discarded, this allows allows linker to discard counters, data and values associated with that function symbol as well.

I don't find incoming or outgoing relocations with __llvm_prf_names, but it appears to be a parallel table to __llvm_prf_cnts/__llvm_prf_data.
The idea is that __llvm_prf_cnts can be properly handled with the traditional GC semantics (text sections keep it live).
We use a section group to let __llvm_prf_cnts retain the associated __llvm_prf_names (and probably also __llvm_prf_vals).
Hope @davidxl can elaborate on the usage of these sections to make this paragraph accurate.

llvm_prf_names are cross function for the purpose of compression. This section is not loadable, so it is does not really matter that much.

phosek added a subscriber: rnk.Feb 18 2021, 10:47 AM

@rnk based on your comment in https://groups.google.com/g/llvm-dev/c/4Bxvjt_rcCg/m/9TsRuI4FAgAJ, I think this should work for COFF as well, is that correct?

phosek updated this revision to Diff 325338.Feb 21 2021, 2:03 PM
phosek edited the summary of this revision. (Show Details)

I plan on improving how __llvm_prf_names is constructed using the approach suggested in PR49155 in a follow up change.

This revision was landed with ongoing or failed builds.Feb 21 2021, 4:13 PM
This revision was automatically updated to reflect the committed changes.

Hi Petr,

This commit broke AArch64 LLD bot, logs are available here:

http://lab.llvm.org:8011/#/builders/53/builds/1366

Cheers,
Yvan

Thanks for the heads up. I've landed
https://github.com/llvm/llvm-project/commit/97184ab99c46e35ae94f828ee90f5d6af2c47e11
which should have addressed that issue, but now the bot is failing with a
different issue that looks like a stale output. I'm still looking into it,
but if I cannot find a culprit quickly, I'll revert the change.

phosek reopened this revision.Feb 22 2021, 11:18 AM
This revision is now accepted and ready to land.Feb 22 2021, 11:18 AM
phosek updated this revision to Diff 325550.Feb 22 2021, 1:19 PM

Check the size of __llvm_prf_cnts section rather than its content.

This revision was landed with ongoing or failed builds.Feb 22 2021, 2:00 PM
This revision was automatically updated to reflect the committed changes.

instrprof-gc-sections.c is failing for me:

FAIL: Profile-x86_64 :: instrprof-gc-sections.c (1 of 1)
******************** TEST 'Profile-x86_64 :: instrprof-gc-sections.c' FAILED ********************
Script:
--
: 'RUN: at line 3';      /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/./bin/clang   -m64  -ldl  -fprofile-instr-generate=/usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profraw -fuse-ld=lld -fcoverage-mapping -mllvm -enable-name-compression=false -DCODE=1 -ffunction-sections -fdata-sections -Wl,--gc-sections -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c
: 'RUN: at line 4';    /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp
: 'RUN: at line 5';   llvm-profdata merge -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profdata /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profraw
: 'RUN: at line 6';   llvm-profdata show --all-functions /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profdata | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PROF
: 'RUN: at line 7';   llvm-cov show /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp -instr-profile /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profdata | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=COV
: 'RUN: at line 8';   llvm-nm /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=NM
: 'RUN: at line 9';   llvm-readelf -x __llvm_prf_names /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PRF_NAMES
: 'RUN: at line 10';   llvm-size -A /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PRF_CNTS
: 'RUN: at line 12';      /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/./bin/clang   -m64  -ldl  -fprofile-instr-generate=/usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto.profraw -fuse-ld=lld -fcoverage-mapping -mllvm -enable-name-compression=false -DCODE=1 -ffunction-sections -fdata-sections -Wl,--gc-sections -flto -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c
: 'RUN: at line 13';    /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto
: 'RUN: at line 14';   llvm-profdata merge -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto.profdata /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto.profraw
: 'RUN: at line 15';   llvm-profdata show --all-functions /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto.profdata | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PROF
: 'RUN: at line 16';   llvm-cov show /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto -instr-profile /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto.profdata | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=COV
: 'RUN: at line 17';   llvm-nm /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=NM
: 'RUN: at line 18';   llvm-readelf -x __llvm_prf_names /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PRF_NAMES
: 'RUN: at line 19';   llvm-size -A /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.lto | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PRF_CNTS
: 'RUN: at line 30';      /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/./bin/clang   -m64  -ldl  -fprofile-generate=/usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo.profraw -fuse-ld=lld -DCODE=1 -ffunction-sections -fdata-sections -Wl,--gc-sections -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c
: 'RUN: at line 31';    /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo
: 'RUN: at line 32';   llvm-profdata merge -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo.profdata /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo.profraw
: 'RUN: at line 33';   llvm-profdata show --all-functions /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo.profdata | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PGO
: 'RUN: at line 34';   llvm-nm /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.pgo | FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=NM
--
Exit Code: 1

Command Output (stderr):
--
+ : 'RUN: at line 3'
+ /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/./bin/clang -m64 -ldl -fprofile-instr-generate=/usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profraw -fuse-ld=lld -fcoverage-mapping -mllvm -enable-name-compression=false -DCODE=1 -ffunction-sections -fdata-sections -Wl,--gc-sections -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c
+ : 'RUN: at line 4'
+ /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp
+ : 'RUN: at line 5'
+ llvm-profdata merge -o /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profdata /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profraw
+ : 'RUN: at line 6'
+ llvm-profdata show --all-functions /usr/local/google/home/aeubanks/repos/llvm-project2/build/cmake/projects/compiler-rt/test/profile/Profile-x86_64/Output/instrprof-gc-sections.c.tmp.profdata
+ FileCheck /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c -check-prefix=PROF
/usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c:53:15: error: PROF-NEXT: is not on the line after the previous match
// PROF-NEXT: Instrumentation level: Front-end
              ^
<stdin>:10:1: note: 'next' match was here
Instrumentation level: Front-end
^
<stdin>:5:19: note: previous match ended here
 Function count: 1
                  ^
<stdin>:6:1: note: non-matching line after previous match is here
 foo:
^

Input file: <stdin>
Check file: /usr/local/google/home/aeubanks/repos/llvm-project2/compiler-rt/test/profile/instrprof-gc-sections.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
         1: Counters:
         2:  main:
         3:  Hash: 0x0000000000000018
         4:  Counters: 1
         5:  Function count: 1
         6:  foo:
         7:  Hash: 0x0000000000000000
         8:  Counters: 1
         9:  Function count: 0
        10: Instrumentation level: Front-end
next:53     !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: match on wrong line
        11: Functions shown: 2
        12: Total functions: 2
        13: Maximum function count: 1
        14: Maximum internal block count: 0
>>>>>>

--

********************
********************
Failed Tests (1):
  Profile-x86_64 :: instrprof-gc-sections.c


Testing Time: 0.12s
  Failed: 1

cmake invocation:

cmake -G Ninja -DLLVM_LOCAL_RPATH=/usr/local/google/home/aeubanks/repos/chromium/src/third_party/llvm-build-tools/gcc-10.2.0-trusty/lib64 '-DCOMPILER_RT_TEST_COMPILER_CFLAGS=--gcc-toolchain=/usr/local/google/home/aeubanks/repos/chromium/src/third_party/llvm-build-tools/gcc-10.2.0-trusty -Wl,-rpath,/usr/local/google/home/aeubanks/repos/chromium/src/third_party/llvm-build-tools/gcc-10.2.0-trusty/lib64 -Wl,-rpath,/usr/local/google/home/aeubanks/repos/chromium/src/third_party/llvm-build-tools/gcc-10.2.0-trusty/lib32' -DLLVM_ENABLE_LIBXML2=FORCE_ON -DLLVM_TARGETS_TO_BUILD=X86 '-DLLVM_ENABLE_PROJECTS=clang;lld;compiler-rt' -DCMAKE_C_FLAGS= -DCMAKE_CXX_FLAGS= -DCMAKE_EXE_LINKER_FLAGS= -DCMAKE_SHARED_LINKER_FLAGS= -DCMAKE_MODULE_LINKER_FLAGS= -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_C_COMPILER=/usr/local/google/home/aeubanks/repos/chromium/src/third_party/llvm-build-tools/gcc-10.2.0-trusty/bin/gcc -DCMAKE_CXX_COMPILER=/usr/local/google/home/aeubanks/repos/chromium/src/third_party/llvm-build-tools/gcc-10.2.0-trusty/bin/g++  ../../llvm

Looks like lld isn't getting built when I run ninja check-profile. Manually building it then running check-profile works.

instrprof-gc-sections.c is failing for me:

Do you have lld installed on your host? This looks like as if the test was using an older lld and not the one from your build.

instrprof-gc-sections.c is failing for me:

Do you have lld installed on your host? This looks like as if the test was using an older lld and not the one from your build.

lld-available may be a bit tricky. I can never tell when lld-available or lld should be used:( Perhaps disable the test temporarily (would be better for @aeubanks to do that as he can test the configuration)

Looks like lld isn't getting built when I run ninja check-profile. Manually building it then running check-profile works.

I think the problem is that depending on the order in which projects are processed, TARGET lld on https://github.com/llvm/llvm-project/blob/06e25d56451977ef5b7052282faacfe3d42acb65/compiler-rt/test/profile/CMakeLists.txt#L8 might return FALSE in case compiler-rt is processed before lld so this is not sufficient. What's not clear to me is how does this work for other runtimes like ASan which use the same pattern, see https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/asan/CMakeLists.txt#L29 (or maybe it doesn't and nobody noticed).

I'll try to come up with a fix if you're not in a hurry, otherwise I can revert the change and reland it later.

I'm ok for now, I was trying to investigate a bunch of other failing check-profile tests on our bots when this separate one came up locally :)

instrprof-gc-sections.c is failing for me:

Do you have lld installed on your host? This looks like as if the test was using an older lld and not the one from your build.

lld-available may be a bit tricky. I can never tell when lld-available or lld should be used:( Perhaps disable the test temporarily (would be better for @aeubanks to do that as he can test the configuration)

lld-available is set based on COMPILER_RT_HAS_LLD (https://github.com/llvm/llvm-project/blob/06e25d56451977ef5b7052282faacfe3d42acb65/compiler-rt/CMakeLists.txt#L595) which may be TRUE but TARGET lld may still return FALSE depending on the order in which projects are processed.