Page MenuHomePhabricator

[InstrProfiling] Make CountersPtr in __profd_ relative
AcceptedPublic

Authored by MaskRay on Jun 18 2021, 11:05 AM.

Details

Reviewers
davidxl
rnk
vsk
Summary

Change CountersPtr in __profd_ to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that __profc_ is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
.quad a-b (64-bit label difference) can lower to a 32-bit PC-relative relocation.

# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo

(Note: link.exe sorts .lprfc before .lprfd even if the object writer
has .lprfd before .lprfc.)

With this change, a stage 2 (-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR)
ld -pie linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.

% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059  # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31

Bump the raw profile format version to 6. (Last bump happened in 2019-10.)

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

Diff Detail

Unit TestsFailed

TimeTest
50 msx64 windows > LLVM.tools/llvm-profdata::nocompress.test
Script: -- : 'RUN: at line 11'; not c:\ws\w2\llvm-project\premerge-checks\build\bin\llvm-profdata.exe show C:\ws\w2\llvm-project\premerge-checks\llvm\test\tools\llvm-profdata/Inputs/compressed.profraw -o C:\ws\w2\llvm-project\premerge-checks\build\test\tools\llvm-profdata\Output\nocompress.test.tmp 2>&1 | c:\ws\w2\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w2\llvm-project\premerge-checks\llvm\test\tools\llvm-profdata\nocompress.test

Event Timeline

MaskRay created this revision.Jun 18 2021, 11:05 AM
MaskRay requested review of this revision.Jun 18 2021, 11:05 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 18 2021, 11:05 AM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript
MaskRay edited the summary of this revision. (Show Details)Jun 18 2021, 5:52 PM
MaskRay updated this revision to Diff 353139.Jun 18 2021, 6:11 PM
MaskRay edited the summary of this revision. (Show Details)

update

MaskRay edited the summary of this revision. (Show Details)Jun 18 2021, 6:12 PM
MaskRay updated this revision to Diff 353167.Jun 18 2021, 10:26 PM
MaskRay edited the summary of this revision. (Show Details)

make win64 work

davidxl added inline comments.Jun 21 2021, 2:51 PM
compiler-rt/lib/profile/InstrProfilingMerge.c
126

The code to compute the relative counter offset is pretty confusing. Perhaps add more comments on the math behind it.

rnk added a comment.Jun 21 2021, 3:00 PM

On COFF, it would be more traditional to make these image-relative 32-bit offsets, so they would come out as .long __profc_foo@IMGREL. If you grep for IMGREL in test/, you can see various exception handling and RTTI tables use that structure.

I seem to recall this was also an issue for swift, which wanted label difference relocations.

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

If we're doing a format change, we should think about this a bit. Previously I was considering removing this field altogether and trying to store this information in a new section, but I don't think it's worth it after D102818.

MaskRay updated this revision to Diff 353502.Jun 21 2021, 3:20 PM

Add more comments to InstrProfilingMerge.c

vsk added a comment.Jun 21 2021, 3:33 PM

llvm has had a long-standing policy of keeping details of the .profraw format private between libprofile and InstrProfReader, opting to only commit to backwards-compat for the indexed .profdata format. I'm concerned that breaking with that practice would send the wrong message and commit llvm to maintaining an extra compat layer for a long time.

Why/how does the Linux PGO patch "care" about the .profraw format?

davidxl added inline comments.Jun 21 2021, 3:40 PM
llvm/lib/ProfileData/InstrProfReader.cpp
366

!=5 condition seems weird.

442

See Vedant's comment -- raw profile compatibility is a non-goal.

MaskRay added a comment.EditedJun 21 2021, 3:46 PM

On COFF, it would be more traditional to make these image-relative 32-bit offsets, so they would come out as .long __profc_foo@IMGREL. If you grep for IMGREL in test/, you can see various exception handling and RTTI tables use that structure.

I seem to recall this was also an issue for swift, which wanted label difference relocations.

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

If we're doing a format change, we should think about this a bit. Previously I was considering removing this field altogether and trying to store this information in a new section, but I don't think it's worth it after D102818.

Hmm, IMGREL (IMAGE_REL_AMD64_ADDR32NB) looks useful and is an alternative solution to PC-relative relocations in ELF / relocation subtraction in Mach-O.
But leveraging it seems to need more code in the runtime and will make COFF vs non-COFF different in IR, runtime, and llvm-profdata....

llvm has had a long-standing policy of keeping details of the .profraw format private between libprofile and InstrProfReader, opting to only commit to backwards-compat for the indexed .profdata format. I'm concerned that breaking with that practice would send the wrong message and commit llvm to maintaining an extra compat layer for a long time.

Why/how does the Linux PGO patch "care" about the .profraw format?

Because the Linux kernel cannot statically link against compiler-rt; we MUST emulate the interfaces provided by compiler-rt within the Linux kernel's runtime.

llvm has had a long-standing policy of keeping details of the .profraw format private between libprofile and InstrProfReader, opting to only commit to backwards-compat for the indexed .profdata format. I'm concerned that breaking with that practice would send the wrong message and commit llvm to maintaining an extra compat layer for a long time.

Why/how does the Linux PGO patch "care" about the .profraw format?

Because the Linux kernel cannot statically link against compiler-rt; we MUST emulate the interfaces provided by compiler-rt within the Linux kernel's runtime.

One way to solve this is to use a matching version of llvm-profdata tool and export indexed profile data only.

vsk added a comment.EditedJun 21 2021, 4:16 PM

llvm has had a long-standing policy of keeping details of the .profraw format private between libprofile and InstrProfReader, opting to only commit to backwards-compat for the indexed .profdata format. I'm concerned that breaking with that practice would send the wrong message and commit llvm to maintaining an extra compat layer for a long time.

Why/how does the Linux PGO patch "care" about the .profraw format?

Because the Linux kernel cannot statically link against compiler-rt; we MUST emulate the interfaces provided by compiler-rt within the Linux kernel's runtime.

One way to solve this is to use a matching version of llvm-profdata tool and export indexed profile data only.

I don't have much experience with the Linux build system, so apologies if this is naive, but why is it exactly that it can't statically link libprofile? This is something we do for the Darwin kernel (there's a stripped-down runtime target called libclang_rt.cc_kext_x86_64_osx.a the kernel links).

llvm has had a long-standing policy of keeping details of the .profraw format private between libprofile and InstrProfReader, opting to only commit to backwards-compat for the indexed .profdata format. I'm concerned that breaking with that practice would send the wrong message and commit llvm to maintaining an extra compat layer for a long time.

Why/how does the Linux PGO patch "care" about the .profraw format?

Because the Linux kernel cannot statically link against compiler-rt; we MUST emulate the interfaces provided by compiler-rt within the Linux kernel's runtime.

One way to solve this is to use a matching version of llvm-profdata tool and export indexed profile data only.

I don't have much experience with the Linux build system, so apologies if this is naive, but why is it exactly that it can't statically link libprofile? This is something we do for the Darwin kernel (there's a stripped-down runtime target called libclang_rt.cc_kext_x86_64_osx.a the kernel links).

Does it use buffer API? It does not support value profiling.

There may be other reasons.

vsk added a comment.Jun 21 2021, 4:24 PM

llvm has had a long-standing policy of keeping details of the .profraw format private between libprofile and InstrProfReader, opting to only commit to backwards-compat for the indexed .profdata format. I'm concerned that breaking with that practice would send the wrong message and commit llvm to maintaining an extra compat layer for a long time.

Why/how does the Linux PGO patch "care" about the .profraw format?

Because the Linux kernel cannot statically link against compiler-rt; we MUST emulate the interfaces provided by compiler-rt within the Linux kernel's runtime.

One way to solve this is to use a matching version of llvm-profdata tool and export indexed profile data only.

I don't have much experience with the Linux build system, so apologies if this is naive, but why is it exactly that it can't statically link libprofile? This is something we do for the Darwin kernel (there's a stripped-down runtime target called libclang_rt.cc_kext_x86_64_osx.a the kernel links).

Does it use buffer API? It does not support value profiling.

There may be other reasons.

I see. Yes, the Darwin kernel uses the buffer API, but not value profiling.

I don't have much experience with the Linux build system, so apologies if this is naive, but why is it exactly that it can't statically link libprofile? This is something we do for the Darwin kernel (there's a stripped-down runtime target called libclang_rt.cc_kext_x86_64_osx.a the kernel links).

The kernel neither links against compiler_rt no libgcc_s, so there's at least precedence for not doing so. Also, I'm not sure of the implications of statically linking the code under two different licenses.

MaskRay added a comment.EditedJun 21 2021, 4:53 PM

Not keeping version compatibility for the raw profile format would actually make my life easier.
But I was CCed on the Linux kernel news https://www.phoronix.com/scan.php?page=news_item&px=Clang-PGO-For-Linux-Next so I paid additional care in this patch.

As we can see the compatibility code is actually minimal: just if (relativeCounterPtr()) and the associated function.
There comments are longer than the code.

How about we making a one-time exception as a courtesy to the initial check-in of the CONFIG_PGO_CLANG feature in the Linux kernel?
There can be a bunch of users testing the PGO functionality.
(Well, you may argue that why I couldn't postpone this patch for some extended time, 1 month? I had this in mind for a while (when I changed XRay and learned a bit about PGO last year I had a plan) but I would probably have postponed this work further if I had not seen that some folks are proposing the MIP profile format.)

The summary can be clarified that this is a one-time exception.
I can set up a calendar reminder for myself to drop the compatibility code in llvm-project after some time (@nickdesaulniers @kees may decide on this).
For example, 2 months?

One time exception sounds fine to me.

MaskRay updated this revision to Diff 353519.Jun 21 2021, 5:57 PM
MaskRay edited the summary of this revision. (Show Details)

Mention the one-time exception for .profraw compatibility

Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2021, 5:57 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
vsk added a comment.Jun 23 2021, 9:53 AM

A one-time exception to the .profraw compatibility policy sounds reasonable to me.

A little more context: llvm has historically rev'd the .profraw format with abandon to deliver performance/size improvements (as David & co. did with name compression) and new functionality (value profiling, continuous sync mode, ...). That will keep happening. The freedom to rev the raw format comes from keeping it private between the profile reader library and the runtime. Some alternatives to hardcoding details of a particular version of the .profraw format were mentioned earlier in the thread: if the Linux PGO support can pursue one of these (or find some other solution), I think it will it prevent headaches down the road.

rnk added a comment.Jun 23 2021, 12:36 PM

Hmm, IMGREL (IMAGE_REL_AMD64_ADDR32NB) looks useful and is an alternative solution to PC-relative relocations in ELF / relocation subtraction in Mach-O.
But leveraging it seems to need more code in the runtime and will make COFF vs non-COFF different in IR, runtime, and llvm-profdata....

We can go forward with what we have, but the signextIfWin64 helper is pretty subtle. In some ways, I think adding __ImageBase is clearer:

#ifdef _WIN32
extern "C" char __ImageBase;
#endif
uintptr_t rebaseRelativePtr(void *D, void *P) {
#ifdef _WIN32
  return (uintptr_t)&__ImageBase + (uintptr_t)P;
#else
  return (uintptr_t)D + (uintptr_t)P;
#endif
}

It's not quite this easy, and the instrumentation side changes have to basically copy or refactor this code:
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/MicrosoftCXXABI.cpp#L548

We aren't worried about profraw format compatibility on Windows, so I think we can change this later at any time if we like.

Hmm, IMGREL (IMAGE_REL_AMD64_ADDR32NB) looks useful and is an alternative solution to PC-relative relocations in ELF / relocation subtraction in Mach-O.
But leveraging it seems to need more code in the runtime and will make COFF vs non-COFF different in IR, runtime, and llvm-profdata....

We can go forward with what we have, but the signextIfWin64 helper is pretty subtle. In some ways, I think adding __ImageBase is clearer:

#ifdef _WIN32
extern "C" char __ImageBase;
#endif
uintptr_t rebaseRelativePtr(void *D, void *P) {
#ifdef _WIN32
  return (uintptr_t)&__ImageBase + (uintptr_t)P;
#else
  return (uintptr_t)D + (uintptr_t)P;
#endif
}

It's not quite this easy, and the instrumentation side changes have to basically copy or refactor this code:
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/MicrosoftCXXABI.cpp#L548

We aren't worried about profraw format compatibility on Windows, so I think we can change this later at any time if we like.

I think you mean that INSTR_PROF_RAW_HEADER(uint64_t, CountersDelta, (uintptr_t)CountersBegin - (uintptr_t)DataBegin) (in InstrProfiling.cpp, so used by both -fprofile-generate and -fprofile-instr-generate) needs to adopt the scheme in https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/MicrosoftCXXABI.cpp#L548
Yes, seems quite a bit of complexity.

Is it possible to ask MSVC to add the 64-bit IMAGE_REL_AMD64_REL64 and IMAGE_REL_ARM64_REL64?
Newer compiler metadata techniques can benefit from using the same mechanism for all three major binary formats (PE-COFF/ELF/Mach-O).

rnk added a comment.Jun 23 2021, 1:51 PM

Is it possible to ask MSVC to add the 64-bit IMAGE_REL_AMD64_REL64 and IMAGE_REL_ARM64_REL64?
Newer compiler metadata techniques can benefit from using the same mechanism for all three major binary formats (PE-COFF/ELF/Mach-O).

Swift wanted the same thing, so I think the answer is yes, we should ask.

Is it possible to ask MSVC to add the 64-bit IMAGE_REL_AMD64_REL64 and IMAGE_REL_ARM64_REL64?
Newer compiler metadata techniques can benefit from using the same mechanism for all three major binary formats (PE-COFF/ELF/Mach-O).

Swift wanted the same thing, so I think the answer is yes, we should ask.

What would the benefit of that be, as the difference itself can only ever be 32 bit? Is it only for consistency with other binary formats? I guess getting the value sign extended to a 64 bit value is a bit useful too.

Swift wanted the same thing, so I think the answer is yes, we should ask.

What would the benefit of that be, as the difference itself can only ever be 32 bit? Is it only for consistency with other binary formats? I guess getting the value sign extended to a 64 bit value is a bit useful too.

Yes, consistency among binary formats. A 64-bit label difference .quad A-B is directly expressable on Mach-O and RISC-V (R_RISCV_SUB*) and can be converted to PC-relative on ELF.

For example, mips64 supports 64-bit PC-relative since D80390 (GNU as still doesn't allow it) => this allowed an XRay clean-up D87977.

MaskRay updated this revision to Diff 356271.Fri, Jul 2, 1:50 PM
MaskRay edited the summary of this revision. (Show Details)

Drop version 5 compatibility because the Linux kernel patch is suspended.

rnk accepted this revision.Fri, Jul 2, 2:42 PM

I'm happy with this.

compiler-rt/include/profile/InstrProfData.inc
79

I'd suggest moving this into a local variable in the instrumentation code. The less code there is in this header, the better.

This revision is now accepted and ready to land.Fri, Jul 2, 2:42 PM
MaskRay updated this revision to Diff 356310.Fri, Jul 2, 7:05 PM
MaskRay marked an inline comment as done.

Move ConstantExpr::getSub from .inc to InstrProfiling.cpp

MaskRay marked 3 inline comments as done.Fri, Jul 2, 7:20 PM

@davidxl @vsk 😃

I am happy to postpone this after 13.0.0 is branched - so that we can adjust the format while only bumping the version once.