Page MenuHomePhabricator

[LLD][LLVM] CG Graph profile using relocations
ClosedPublic

Authored by ayermolo on Jun 10 2021, 5:17 PM.

Details

Summary

Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not. In non-visible behavior indices point to wrong symbols. The visible behavior indices point outside of Symbol table: "invalid symbol index".

This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols. The Frequency (Weight) will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.

With this approach post processing tools that handle relocations correctly work for this section also. Tools can add/remove symbols and as long as they handle relocation sections with this approach information stays correct.

Doing a quick experiment with clang-13.
The size went up from 107KB to 322KB, aggregate of all the input sections. Size of clang-13 binary is ~118MB. For users of -fprofile-use/-fprofile-sample-use the size of object files will go up slightly, it will not impact final binary size.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

small description test fix

ayermolo updated this revision to Diff 352874.Jun 17 2021, 4:44 PM

small fix

ayermolo updated this revision to Diff 352881.Jun 17 2021, 5:22 PM

somehow missed a test.

jhenderson added inline comments.Jun 21 2021, 1:09 AM
llvm/lib/MC/MCELFStreamer.cpp
511

This hasn't been addressed properly: the = at the end of the comment in my inline edit is important, as clang-format interprets such a pattern differently (note that if you add the equals sign and then reformat, the space between the comment and 8 will disappear.

llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test
130
161
211
llvm/test/tools/yaml2obj/ELF/call-graph-profile-section.yaml
83–84

I think you can drop this entire test case, since the YAML no longer refers to symbols for the call graph section.

102–103

Ditto: we no longer refer to symbols directly in the YAML for this section, so drop this case.

llvm/tools/llvm-readobj/ELFDumper.cpp
6732–6733

Get rid of the unnecessary "with relocations".

6740

The style is no capital for the first letter of warnings.

6745
6747

The return type of Rel.getSymbol(...) is already uint32_t, so no need for the trailing return type to be explicitly specified.

ayermolo updated this revision to Diff 353451.Jun 21 2021, 12:16 PM

Addressed comments.

ayermolo marked 8 inline comments as done.Jun 21 2021, 12:32 PM
ayermolo added inline comments.
llvm/lib/MC/MCELFStreamer.cpp
511

ah, sorry missed the =.

llvm/tools/llvm-readobj/ELFDumper.cpp
6747

Also modified lambda in lld driver to follow the same style.

jhenderson accepted this revision.Jun 22 2021, 12:29 AM

No more comments from me - I'm happy with the changes outside LLD, although haven't looked in depth at those within the LLD code. I also don't use the CG profile stuff or pretend to really follow it, so @MaskRay should sign off on that side of things.

This revision is now accepted and ready to land.Jun 22 2021, 12:29 AM
ayermolo marked 2 inline comments as done.Jun 22 2021, 10:06 AM

No more comments from me - I'm happy with the changes outside LLD, although haven't looked in depth at those within the LLD code. I also don't use the CG profile stuff or pretend to really follow it, so @MaskRay should sign off on that side of things.

Thanks! @MaskRay any further feedback?

Mostly looks good.

lld/ELF/Driver.cpp
866–875

This can happen so assert is not appropriate. You can use if (...) fatal

llvm/include/llvm/MC/MCAsmBackend.h
88 ↗(On Diff #353454)

Delete

llvm/lib/MC/MCELFStreamer.cpp
498

Is it expected for architectures that have _NONE to have the mapping? . .Case("BFD_RELOC_NONE", ELF::R_<arch>_NONE)

Yes. In GNU as, BFD_RELOC_NONE is generic and can be used on all targets. I have added BFD_RELOC_NONE to many LLVM targets. Note: .llvm.call-graph-profile is used by -fprofile-use=. The targets which don't support BFD_RELOC_NONE are not used by people for PGO. (If they do use PGO, adding the support should be trivial anyway.)

It probably makes sense to save the return value in a variable and assert it is None. I find that an unknown BFD_RELOC_NONE does not cause an error.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
677 ↗(On Diff #353454)

This function and its friends can all be deleted.

llvm/tools/llvm-readobj/ELFDumper.cpp
6752

I != Size is probably more common in LLVM code

MaskRay added a comment.EditedJun 22 2021, 10:33 PM

You can mention that this does increase -fprofile-use= object file sizes a bit but probably don't matter.

For instrumentation PGO users, -fprofile-generate= object files are much larger than -fprofile-use=, so -fprofile-use= slightly size increases don't matter.
For sample PGO users, this may increase the peak object file sizes a bit, but doesn't matter as your measurement for clang-13 shows.

Test Plan:
ninja check all, manual rtesting

This should be deleted when committing.

The size went up from 107KB to 322KB, aggregate of all the input sections.

How large is your clang-13? ~100MiB? Then this increase looks neligible.

ayermolo added inline comments.Jun 23 2021, 10:37 AM
llvm/lib/MC/MCELFStreamer.cpp
498

ok, will remove getNoneRelcationName(), and use BFD_RELOC_NONE here. Thanks for the background.

ayermolo updated this revision to Diff 354059.Jun 23 2021, 1:07 PM
ayermolo marked 5 inline comments as done.

addressign feedback

ayermolo edited the summary of this revision. (Show Details)Jun 23 2021, 1:08 PM
ayermolo marked an inline comment as done.
ayermolo added inline comments.
lld/ELF/Driver.cpp
866–875

What is general guideline for fatal vs assert in llvm? I looked at Coding Standard, and didn't see anything (might have missed it)

ayermolo updated this revision to Diff 354062.Jun 23 2021, 1:16 PM
ayermolo marked an inline comment as done.

somehow random file made it in to diff, probably clang format related.

MaskRay added inline comments.Jun 23 2021, 1:29 PM
lld/ELF/Driver.cpp
866–875

My personal guideline is:

assert is for ensuring an invariant doesn't break - it shouldn't break even for malicious user input.

If you know some user input would crash, prefer a proper error reporting mechanism, in lld errorOrWarn/fatal/error. In LLVM internal code you can find many report_fatal_error(similar to assert, just with diagnostic even when assertions is disabled). Many are lazy examples which are not good references.

867

drop trailing period

Don't capitalize

MaskRay accepted this revision.Jun 23 2021, 1:31 PM

LG baring the comment

ayermolo updated this revision to Diff 354069.Jun 23 2021, 1:35 PM

Addressed last comment.

jrtc27 added inline comments.Jun 23 2021, 1:36 PM
lld/ELF/InputFiles.cpp
681

What if it's SHT_REL? This won't fail nicely, it'll give an error about sh_entsize not matching. You should either handle SHT_REL properly (i.e. support it) or not even attempt to get an Elf_Rela for it.

ayermolo updated this revision to Diff 354087.Jun 23 2021, 3:18 PM

Changed lld to handle call graph as RELA only, and emits warning for REL.

ayermolo added inline comments.Jun 23 2021, 3:20 PM
lld/ELF/InputFiles.cpp
681

Something like this?

Patch description is non-descriptive, does not provide any motivation as to why this is wanted e.g.

jrtc27 added inline comments.Jun 23 2021, 3:21 PM
lld/ELF/InputFiles.cpp
681

Why not error?

ayermolo added inline comments.Jun 23 2021, 3:27 PM
lld/ELF/InputFiles.cpp
681

With error it will prevent binary from being linked. I thought warning would be more appropriate so that result will still be a valid binary minus this optimization. I can change to error if you and @MaskRay think that is more appropriate.

ayermolo added inline comments.Jun 23 2021, 3:31 PM
lld/ELF/InputFiles.cpp
681

With error it will prevent binary from being linked. I thought warning would be more appropriate so that result will still be a valid binary minus this optimization. I can change to error if you and @MaskRay think that is more appropriate.

readCallGraphsFromObjectFiles will also need to be modified, otherwise it will fatal there due to obj->cgProfileRela.size() != obj->cgProfile.size() * 2

ayermolo edited the summary of this revision. (Show Details)Jun 23 2021, 3:33 PM
MaskRay requested changes to this revision.Jun 23 2021, 3:36 PM
MaskRay added inline comments.
lld/ELF/InputFiles.cpp
681

Why doesn't SHT_REL work?

A warning is appropriate, as it is an optional optimization anyway.

This revision now requires changes to proceed.Jun 23 2021, 3:36 PM

For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not.

The description needs to mention what would happen with the old format.

We propose to change this section to use relocations. The Frequency will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.

We propose => This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols.

With this approach post processing tools that handle relocations correctly work for this section also.

I think you might want to say adding/removing symbols?

ayermolo added inline comments.Jun 23 2021, 4:12 PM
lld/ELF/InputFiles.cpp
681

I didn't think SHT_REL is a common case/used. For X86 only I386 and IAMCU use REL, looking at some of the other architectures it's either ON by default or in case of MIPS enabled for 64 bit.

bool HasRelocationAddend = TT.isArch64Bit();
  return std::make_unique<MipsELFObjectWriter>(OSABI, HasRelocationAddend,
                                                IsN64);

I didn't want to over-complicate the implementation un-necessary, but might have been wrong about it. Should I add support?

MaskRay added inline comments.Jun 23 2021, 4:17 PM
lld/ELF/InputFiles.cpp
681

32-bit arm uses REL so it is not that uncommon. x86-32 does matter less these days and I don't know who want to use -fprofile-use for it. You can leave REL as-is. I can fix it afterwards.

The description needs to be fixed, though.

ayermolo edited the summary of this revision. (Show Details)Jun 23 2021, 4:52 PM
ayermolo edited the summary of this revision. (Show Details)
ayermolo updated this revision to Diff 354110.Jun 23 2021, 4:55 PM

Modified readCallGraphsFromObjectFiles to continue if cgProfileRela is empty. In line with just printing warning if relocaiton is REL.

MaskRay accepted this revision.Jun 23 2021, 7:21 PM
This revision is now accepted and ready to land.Jun 23 2021, 7:21 PM
ayermolo updated this revision to Diff 354150.Jun 23 2021, 9:02 PM

rebased on latest.

@ayermolo is oof for a few days and asked me to commit this for him. I will do that shortly.

This revision was automatically updated to reflect the committed changes.

You can mention that this does increase -fprofile-use= object file sizes a bit but probably don't matter.

For instrumentation PGO users, -fprofile-generate= object files are much larger than -fprofile-use=, so -fprofile-use= slightly size increases don't matter.
For sample PGO users, this may increase the peak object file sizes a bit, but doesn't matter as your measurement for clang-13 shows.

Test Plan:
ninja check all, manual rtesting

This should be deleted when committing.

The size went up from 107KB to 322KB, aggregate of all the input sections.

How large is your clang-13? ~100MiB? Then this increase looks neligible.

I've noticed some increases in object size over here. What's the percentage increase you're expecting here in object file sizes?

You can mention that this does increase -fprofile-use= object file sizes a bit but probably don't matter.

For instrumentation PGO users, -fprofile-generate= object files are much larger than -fprofile-use=, so -fprofile-use= slightly size increases don't matter.
For sample PGO users, this may increase the peak object file sizes a bit, but doesn't matter as your measurement for clang-13 shows.

Test Plan:
ninja check all, manual rtesting

This should be deleted when committing.

The size went up from 107KB to 322KB, aggregate of all the input sections.

How large is your clang-13? ~100MiB? Then this increase looks neligible.

I've noticed some increases in object size over here. What's the percentage increase you're expecting here in object file sizes?

So original implementation was 16-bytes (2x4 bytes for index, 8 bytes for Weight). After this @MaskRay changed it to REL implementation. So that's 40 bytes (2x16 bytes for index in relocation form + 8 bytes for Weight).
When I measured with RELA approach on clang-13 it went from 107KB to 322KB in 118MB binary. With REL, current implementation, that's around 230KB out of 118MB. Now this is final binary size so proportion, in aggregate, of object files will be smaller.