Change CountersPtr in __profd_ to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that __profc_ is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
.quad a-b (64-bit label difference) can lower to a 32-bit PC-relative relocation.

    # ELF: R_X86_64_PC64 (PC-relative)
    .quad .L__profc_foo-.L__profd_foo

    # Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
    .quad l___profc_foo-l___profd_foo

    # COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
    # the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
    # As compensation, we truncate CountersDelta in the header so that
    # __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
    .quad .L__profc_foo-.L__profd_foo

(Note: link.exe sorts .lprfc before .lprfd even if the object writer
has .lprfd before .lprfc.)
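
To illustrate why the header's CountersDelta has to be truncated on COFF, here is a minimal sketch in C that models addresses as plain integers. All function names are hypothetical and do not correspond to the runtime or reader code. It assumes a consumer that subtracts the header delta from the stored per-record difference to get a byte offset into the counters section.

```c
#include <assert.h>
#include <stdint.h>

/* Full 64-bit label difference, as a 64-bit PC-relative relocation
   (e.g. R_X86_64_PC64) resolves it. Wraps modulo 2^64 when the
   counters sit below the data record. */
static uint64_t label_diff64(uint64_t counters_addr, uint64_t data_addr) {
  return counters_addr - data_addr;
}

/* COFF-style result: a 32-bit relocation fills only the low half of the
   8-byte field, so the high half stays zero (no sign extension). */
static uint64_t label_diff_coff(uint64_t counters_addr, uint64_t data_addr) {
  uint64_t stored = (uint64_t)(uint32_t)(counters_addr - data_addr);
  assert(stored <= UINT32_MAX); /* high 32 bits are always zero */
  return stored;
}

/* Recover the counters address from the record's own address plus a
   full 64-bit stored difference. */
static uint64_t resolve_counters(uint64_t data_addr, uint64_t stored_diff) {
  return data_addr + stored_diff;
}

/* Hypothetical consumer step: byte offset of a record's counters from the
   start of the counters section, given the delta kept in the header. */
static uint64_t counter_byte_offset(uint64_t stored_diff,
                                    uint64_t counters_delta) {
  return stored_diff - counters_delta;
}
```

With the counters section at 0x1000 and the data section at 0x2000 (counters first, as link.exe lays them out), a record whose counters start at 0x1010 stores 0xFFFFF010 on COFF. Subtracting a header delta truncated the same way (0xFFFFF000) yields the expected offset 0x10, whereas subtracting the untruncated 64-bit delta does not.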
With this change, a stage 2 (-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR)
ld -pie linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.

    % readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
    R_X86_64_JUMP_SLO 331
    R_X86_64_TPOFF64 2
    R_X86_64_RELATIVE 476059  # was: 607712
    R_X86_64_64 2616
    R_X86_64_GLOB_DAT 31

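
As a rough cross-check of the relocation counts above, the following sketch estimates how many bytes of dynamic relocation data the dropped R_X86_64_RELATIVE entries account for. It assumes unpacked RELA-format entries (24 bytes each on x86-64: r_offset, r_info, and r_addend at 8 bytes apiece). The helper name is hypothetical, and the result is not claimed to account for the whole size reduction.

```c
#include <stdint.h>

/* Size of one Elf64_Rela entry: r_offset + r_info + r_addend. */
enum { ELF64_RELA_ENTRY_SIZE = 24 };

/* Bytes of relocation data saved when the entry count drops from
   `before` to `after`. Assumes before >= after. */
static uint64_t rela_bytes_saved(uint64_t before, uint64_t after) {
  return (before - after) * ELF64_RELA_ENTRY_SIZE;
}
```

For the counts reported here, rela_bytes_saved(607712, 476059) covers 131653 entries, or 3159672 bytes.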
Bump the raw profile format version to 6. (Last bump happened in 2019-10.)
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
I'd suggest moving this into a local variable in the instrumentation code. The less code there is in this header, the better.