This is an archive of the discontinued LLVM Phabricator instance.

[InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling
ClosedPublic

Authored by MaskRay on Jun 4 2021, 1:55 PM.

Details

Summary

On ELF, the D103372 optimization can apply to more cases. There are two
prerequisites for making __profd_ private:

  • __profc_ keeps __profd_ live under compiler/linker GC
  • __profd_ is not referenced by code

The first is satisfied because all counters/data are in a section group (either
comdat any or comdat noduplicates). The second requires that the function
does not use value profiling.

Regarding the second point: __profd_ may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR)
clang is 4.2% smaller (1-169620032/177066968).

Diff Detail

Event Timeline

MaskRay created this revision.Jun 4 2021, 1:55 PM
MaskRay requested review of this revision.Jun 4 2021, 1:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2021, 1:55 PM
rnk added a subscriber: aeubanks.Jun 8 2021, 12:38 PM

We are observing some crashes in coverage, but we haven't root caused them yet: crbug.com/1216811
So I guess we need to keep waiting for the dust to settle. @aeubanks was able to repro the crash there, but it's not deterministic.

We are observing some crashes in coverage, but we haven't root caused them yet: crbug.com/1216811
So I guess we need to keep waiting for the dust to settle. @aeubanks was able to repro the crash there, but it's not deterministic.

This ended up being unrelated

I think the previous Instrumentation/InstrProfiling.cpp changes can be considered stable now. So, ping:)

davidxl added inline comments.Jun 17 2021, 1:36 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
933

Is the lld specific behavior, or it applies to gold and ld.bfd?

MaskRay marked an inline comment as done.Jun 17 2021, 1:42 PM
MaskRay added inline comments.
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
933

It is generic for ELF, applicable to all the linkers.

Other binary formats may have limitation.

rnk accepted this revision.Jun 17 2021, 1:49 PM

lgtm

llvm/test/Transforms/PGOProfile/indirect_call_profile.ll
57

I don't think we need to append hashes to symbols if they will ultimately be private. Seems like future work, though.

This revision is now accepted and ready to land.Jun 17 2021, 1:49 PM
davidxl accepted this revision.Jun 17 2021, 1:51 PM

lgtm.

This revision was automatically updated to reflect the committed changes.
MaskRay marked an inline comment as done.

This patch breaks a two stage instrumented build::

$ cmake \
    -B build/stage1 \
    -G Ninja \
    -Wno-dev \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DLLVM_CCACHE_BUILD=ON \
    -DLLVM_ENABLE_PROJECTS="clang;compiler-rt;lld" \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DLLVM_USE_LINKER=lld \
    llvm

$ ninja -C build/stage1

$ export PATH=$PWD/build/stage1/bin:$PATH

$ cmake \
    -B build/stage2 \
    -G Ninja \
    -Wno-dev \
    -DCLANG_TABLEGEN=$(command -v clang-tblgen) \
    -DCMAKE_AR=$(command -v llvm-ar) \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=$(command -v clang) \
    -DCMAKE_CXX_COMPILER=$(command -v clang++) \
    -DCMAKE_RANLIB=$(command -v llvm-ranlib) \
    -DLLVM_BUILD_INSTRUMENTED=IR \
    -DLLVM_BUILD_RUNTIME=OFF \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_ENABLE_TERMINFO=OFF \
    -DLLVM_ENABLE_WARNINGS=OFF \
    -DLLVM_TABLEGEN=$(command -v llvm-tblgen) \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DLLVM_USE_LINKER=$(command -v ld.lld) \
    llvm

$ ninja -C build/stage2
...
ld.lld: error: relocation refers to a discarded section: __llvm_prf_data
>>> defined in utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced 232 more times

ld.lld: error: relocation refers to a discarded section: __llvm_prf_data
>>> defined in utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced 8 more times
...

This patch breaks a two stage instrumented build::

$ cmake \
    -B build/stage1 \
    -G Ninja \
    -Wno-dev \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DLLVM_CCACHE_BUILD=ON \
    -DLLVM_ENABLE_PROJECTS="clang;compiler-rt;lld" \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DLLVM_USE_LINKER=lld \
    llvm

$ ninja -C build/stage1

$ export PATH=$PWD/build/stage1/bin:$PATH

$ cmake \
    -B build/stage2 \
    -G Ninja \
    -Wno-dev \
    -DCLANG_TABLEGEN=$(command -v clang-tblgen) \
    -DCMAKE_AR=$(command -v llvm-ar) \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=$(command -v clang) \
    -DCMAKE_CXX_COMPILER=$(command -v clang++) \
    -DCMAKE_RANLIB=$(command -v llvm-ranlib) \
    -DLLVM_BUILD_INSTRUMENTED=IR \
    -DLLVM_BUILD_RUNTIME=OFF \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_ENABLE_TERMINFO=OFF \
    -DLLVM_ENABLE_WARNINGS=OFF \
    -DLLVM_TABLEGEN=$(command -v llvm-tblgen) \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DLLVM_USE_LINKER=$(command -v ld.lld) \
    llvm

$ ninja -C build/stage2
...
ld.lld: error: relocation refers to a discarded section: __llvm_prf_data
>>> defined in utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced 232 more times

ld.lld: error: relocation refers to a discarded section: __llvm_prf_data
>>> defined in utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced by AsmWriterEmitter.cpp
>>>               utils/TableGen/CMakeFiles/llvm-tblgen.dir/AsmWriterEmitter.cpp.o:((anonymous namespace)::AsmWriterEmitter::run(llvm::raw_ostream&))
>>> referenced 8 more times
...

Thanks for reporting the issue. Reverted in 5798be84580be233e4cf34c08ceec8f79e80502e (which has detailed descriptions)
I made a mistake in my stage-2 testing so I wasn't actually testing stage-2 so failed to catch the static value profiler counter issue...

MaskRay reopened this revision.Jun 18 2021, 12:26 AM
This revision is now accepted and ready to land.Jun 18 2021, 12:26 AM
MaskRay updated this revision to Diff 352926.Jun 18 2021, 12:31 AM
MaskRay retitled this revision from [InstrProfiling] Make __profd_ unconditionally private for ELF to [InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling.
MaskRay edited the summary of this revision. (Show Details)

I need to add a test case, but this is the sketch.

I am not sure whether this value profiling detection is correct.

MaskRay requested review of this revision.Jun 18 2021, 12:31 AM

A stage-2 -DLLVM_BUILD_INSTRMENTED=IR clang is 4.2% smaller (1-169620032/177066968) with this patch.

MaskRay updated this revision to Diff 353089.Jun 18 2021, 1:41 PM
MaskRay edited the summary of this revision. (Show Details)

add test

MaskRay edited the summary of this revision. (Show Details)Jun 18 2021, 1:42 PM
MaskRay added inline comments.Jun 18 2021, 1:46 PM
llvm/test/Transforms/PGOProfile/indirect_call_profile.ll
57

It doesn't. __profd_foo3.[[FOO3_HASH:[0-9]+]] has no harm because such a private linkage variable will not need a symbol table entry.

davidxl added inline comments.Jun 18 2021, 1:54 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
936

Add a comment here about NS == 0 condition?

MaskRay updated this revision to Diff 353094.Jun 18 2021, 1:58 PM

improve a comment

davidxl added inline comments.Jun 18 2021, 2:24 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
896

should this code by guarded by whether value profile is enabled?

932–937

What is the root cause of the failure when NS == 0 condition is not used?

MaskRay added inline comments.Jun 18 2021, 2:40 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
896

For performance?

The code below iterates over the array unconditionally

for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
  Int16ArrayVals[Kind] = ConstantInt::get(Int16Ty, PD.NumValueSites[Kind]);

so I think having another loop should be fine.

932–937

5798be84580be233e4cf34c08ceec8f79e80502e has the message.

Say a.cc has a prevailing comdat and b.cc's is non-prevailing.

b.cc's linkonce_odr comdat function may get inlined into a function not in a.cc.
It is a linker error for the prevailing function to reference a discarded STB_LOCAL symbol (__profd_ is non-prevailing and thus discarded)

If we don't change the linkage, __profd_ is STB_GLOBAL STV_HIDDEN. It is fine to reference such a non-local symbol, which will bind to a.cc's copy.

davidxl added inline comments.Jun 18 2021, 2:51 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
896

Basically in that case, there is no need to create the variable in the first place.

932–937

When ValueProfileStaticAlloc is false, NS will also be zero, but the prf_data will still be referenced -- will it create the same linker error?

If that is the case, I think a better condition for ELF should be :

if (ValueProfileStaticAlloc && NS == 0 || !DataReferencedByCode) ...

!DataReferencedByCode is a bigger hammer to catch all.

MaskRay added inline comments.Jun 18 2021, 3:11 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
896

The loop has 2 iterations, so I think should be fine.

for (uint32_t Kind = 0; Kind <= 1; ++Kind)

We could use zeroinitializer but that would require that we have other means to know @llvm.instrprof.value.profile isn't used.

932–937

computeNumValueSiteCounts visits every @llvm.instrprof.value.profile instruction.

If computeNumValueSiteCounts is called, NS will be non-zero. So I guess it is a good indicator, regardless of ValueProfileStaticAlloc.

Correction. With the way NS is unconditionally computed, using NS==0 check is fine, but I think it is cleaner (easier to read) perhaps to check ValuesVar nullness when static allocation is on or fallback to global option check.

On second thought, NS == 0 is a reasonablly good indicator.

MaskRay updated this revision to Diff 353114.Jun 18 2021, 3:32 PM

Use ValuesVar instead of NS

See my prior comment -- your previous revision is sound and LGTM. If we go with this version, we do need to examine whether static allocation is on or not. Sorry for the trouble.

MaskRay updated this revision to Diff 353118.Jun 18 2021, 3:57 PM

fix condition and improve test

davidxl accepted this revision.Jun 18 2021, 4:02 PM

lgtm.

This revision is now accepted and ready to land.Jun 18 2021, 4:02 PM
MaskRay updated this revision to Diff 353121.Jun 18 2021, 4:04 PM

switch to the NS==0 approach after seeing See my prior comment -- your previous revision is sound and LGTM. If we go with this version, we do need to examine whether static allocation is on or not. Sorry for the trouble.
(did not notice in my previous test update...)

The LGTM remains the same.

MaskRay updated this revision to Diff 353122.Jun 18 2021, 4:12 PM

minor adjustment to comment and test

Thanks for review:) I feel that I understand value profiling better now.

I have a plan to use label differences (i.e. PC-relative relocations on ELF) (D104556) to save sizeof(Elf64_Rela) * |__profd_| = 24 * instrumented_functions bytes.

This revision was landed with ongoing or failed builds.Jun 18 2021, 5:01 PM
This revision was automatically updated to reflect the committed changes.