This is an archive of the discontinued LLVM Phabricator instance.

[LLD] Adding support for RELA for CG Profile.
ClosedPublic

Authored by ayermolo on Jun 30 2021, 11:28 AM.

Details

Summary

This is a follow up to https://reviews.llvm.org/D104080, and https://github.com/llvm/llvm-project/commit/ca3bdb57fa1ac98b711a735de048c12b5fdd8086#diff-e64a48fabe31db213a631fdc5f2acb51bdddf3f16a8fb2928784f4c579229585. The implementation of call graph profile was changed from a black box section to relocation approach. This was done to be compatible with post processing tools like strip/objcopy, and llvm equivalent. When they are invoked on object file before the final linking step with this new approach the symbol indices correctness is preserved.

The GNU binutils tools change the REL section to RELA section, unlike llvm tools. For example when strip -S is run on the ELF object files, as an intermediate step before linking. To preserve compatibility this patch extends implementation in LLD and ELFDumper to support both REL and RELA sections for call graph profile.

Diff Detail

Event Timeline

ayermolo created this revision.Jun 30 2021, 11:28 AM
ayermolo requested review of this revision.Jun 30 2021, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2021, 11:28 AM
ayermolo updated this revision to Diff 355643.Jun 30 2021, 11:32 AM

removing header include that sneaked in somehow

I have changed MC to unconditionally emit REL, so this change is not needed.

I have changed MC to unconditionally emit REL, so this change is not needed.

Unfortunately when running strip -S on ELF object files it converts REL to RELA.

Before:

[85] .llvm.call-graph-profile LLVM_CALL_GRAPH_PROFILE 0000000000000000 00db01 000168 08   E 92   0  1
[86] .rel.llvm.call-graph-profile REL  0000000000000000 01ec18 0005a0 10     92  85  8

After:

[76] .llvm.call-graph-profile LLVM_CALL_GRAPH_PROFILE 0000000000000000 0056aa 000168 08   E  0   0  1
[77] .rela.llvm.call-graph-profile RELA 0000000000000000 00dd18 000870 18   I 81  76  8

Supporting usage model of running strip -S on object files before linking was original motivation for this changing CG Call graph to use relocations.

Thanks for the fix. Please add a test case of what a stripped binary would look like (yaml2obj probably easiest way to do this) to make sure this works. Also are there llvm-readobj changes that need to be made?

lld/ELF/Driver.cpp
63

Needed?

858–871

I don't think you need this vector, the getIndex lambda can be generalized as a templated lambda/function based on ArrayRef

906

Add a warning + bailout if a binary every has both for some weird reason?

lld/ELF/InputFiles.h
251–252

Fix up comments to reflect that there is "always" here now.

ayermolo added inline comments.Jun 30 2021, 11:43 AM
lld/ELF/Driver.cpp
858–871

I wanted to doing a rel/rela check on each invocation of helper function.

I feel uneasy with the additional complexity. I am not sure we want this just to make strip -S happy. Can you use llvm-strip -S?

MaskRay requested changes to this revision.Jun 30 2021, 11:48 AM
This revision now requires changes to proceed.Jun 30 2021, 11:48 AM

I feel uneasy with the additional complexity. I am not sure we want this just to make strip -S happy. Can you use llvm-strip -S?

Although llvm-strip does have the "correct/expected" behavior I think it's a too strict requirement on the users. Specifically I am thinking of large build systems with multiple projects. Right now it will be an assert

if (obj->cgProfileRel.size() != obj->cgProfile.size() * 2)
   fatal("number of relocations doesn't match Weights");

Which in context is not a descriptive error message.
Owners of projects will need to dig in to LLD source code, and change their workflow.

Also strip and objcopy are standard GNU tools that I think used interchangeably with llvm tools, and it's worth for LLD to continue to support them, and not force users in to using llvm tools.

I understand that this ads complexity, but I think the benefit of enabling support for more tools and reducing work on users of lld out-weights it.

Why is strip -S is part of the step running on intermediate object files?

The usually operation is to do strip only on linked images. If you want to have smaller intermediate object files, you may use -gsplit-dwarf or -gsplit-dwarf=single.

Why is strip -S is part of the step running on intermediate object files?

The usually operation is to do strip only on linked images. If you want to have smaller intermediate object files, you may use -gsplit-dwarf or -gsplit-dwarf=single.

Good question. My understanding of the usage model after talking to someone knowledgable in build system is that there are two issues. First we have a large set of pre-build libraries that various projects link against and they are build with monolithic debug information. Second is how caching of object files works within the build system. It allows flexibility for projects to re-use objects with and without debug information. Second part can be mitigated if build system supported dwo files, but right now it doesn't.
There is another potential issue. For example if there is some post processing tool for object files/binaries that doesn't support split dwarf. As an example until recently BOLT project only supported monolithic debug information.

First, .llvm.call-graph-profile is only emitted by -fprofile-use= and -fprofile-sample-use=, instrumentation PGO and sample PGO.
The use case is very specific. For sample PGO/BOLT there are other requirements from an external tool converting linux-perf data to a profile format recognized by llvm-project.
I can imagine that BOLT may have more requirements on LLVM tooling.
So I don't see why requiring llvm-strip will be another hindrance.
Also keep in mind that GNU binutils doesn't recognize the section type SHT_LLVM_*. (Folks have added dumping support to llvm-readobj.)

Second is how caching of object files works within the build system. It allows flexibility for projects to re-use objects with and without debug information.

If linker input size is not a concern, you may use ld -S instead of strip -S on .o files.

I don't think the additional complexity is all that much, especially given there's an actual use-case for it. It'd be different if the case was some hypothetical situation, but it isn't. Not all systems have all of LLVM installed as their toolchain, so forcing people away from using GNU tools seems like the wrong approach to me.

First, .llvm.call-graph-profile is only emitted by -fprofile-use= and -fprofile-sample-use=, instrumentation PGO and sample PGO.
The use case is very specific. For sample PGO/BOLT there are other requirements from an external tool converting linux-perf data to a profile format recognized by llvm-project.
I can imagine that BOLT may have more requirements on LLVM tooling.
So I don't see why requiring llvm-strip will be another hindrance.
Also keep in mind that GNU binutils doesn't recognize the section type SHT_LLVM_*. (Folks have added dumping support to llvm-readobj.)

Second is how caching of object files works within the build system. It allows flexibility for projects to re-use objects with and without debug information.

If linker input size is not a concern, you may use ld -S instead of strip -S on .o files.

Linker input size is a concern. But also moving all the object files (gigs worth) around from machines where they are cached to ones where final linking step is happening.
I brought up BOLT just in context of debug fission, and that it's not always possible to enable it. Depending on the work flow. Not as an argument for this patch. Sorry for confusion.
Speaking of distributed build environment. As James pointed out, they might not have full llvm toolchain installed. I understand that this ads complexity, but it's very small and it enables maximum compatibility of LLD with standard tools that is used in production.

I filed a binutils bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=28035 Perhaps you can chime in?
The support needs a comment that this is specifically for GNU strip.

That said, my main objection is that we now waste additional 16 bytes for InputFile. Can it be avoided?

lld/ELF/Driver.cpp
864

drop braces

First, .llvm.call-graph-profile is only emitted by -fprofile-use= and -fprofile-sample-use=, instrumentation PGO and sample PGO.
The use case is very specific.

Not sure if I follow this. If what you meant is we don't need to have good support for it since this call graph profile is not used by everyone... I'm afraid I don't agree: 1. We used this broadly for hundreds of workloads. 2. Toolchains have many features most of which aren't used by many, but as long as those who use them are willing to maintain and improve it, why would we want to block such work just because it's not a mainstream feature?

For sample PGO/BOLT there are other requirements from an external tool converting linux-perf data to a profile format recognized by llvm-project.
I can imagine that BOLT may have more requirements on LLVM tooling.
So I don't see why requiring llvm-strip will be another hindrance.

This is more of a build system issue - requiring all gnu tools to be replaced by llvm tools in any large organization is going to be very difficult. I think it's reasonable to support some level of compatibility if the complexity isn't high.

In fact, there're many precedences for that. In LLD, we have a list of silently ignored switches for compatibility. We could insist build system changes for all instead of accommodate them in toolchain, but we favored compatibility.

You mentioned extra complexity, while I agree that if possible simpler implementation and tighter contract is better, but the trade off here between very minor complexity and compatibility seem not very different from those silently ignored switches.

Also keep in mind that GNU binutils doesn't recognize the section type SHT_LLVM_*. (Folks have added dumping support to llvm-readobj.)

Second is how caching of object files works within the build system. It allows flexibility for projects to re-use objects with and without debug information.

If linker input size is not a concern, you may use ld -S instead of strip -S on .o files.

I don't think the additional complexity is all that much, especially given there's an actual use-case for it. It'd be different if the case was some hypothetical situation, but it isn't. Not all systems have all of LLVM installed as their toolchain, so forcing people away from using GNU tools seems like the wrong approach to me.

Exactly. As mentioned above, we have precedence for supporting some level of compatibility. Insisting on simplicity and closed toolchain would do more harm than good.

First, .llvm.call-graph-profile is only emitted by -fprofile-use= and -fprofile-sample-use=, instrumentation PGO and sample PGO.
The use case is very specific.

Not sure if I follow this. If what you meant is we don't need to have good support for it since this call graph profile is not used by everyone...

No.

I'm afraid I don't agree: 1. We used this broadly for hundreds of workloads. 2. Toolchains have many features most of which aren't used by many, but as long as those who use them are willing to maintain and improve it, why would we want to block such work just because it's not a mainstream feature?

I meant: some features have some particular requirement. We don't want to add support for something nobody uses.
For this point, I saw this patch so I knew you are going to use it. If it didn't add 16-bytes to each InputFile I would be ok with it.
However, it has such overhead so I need to balance the needs with the costs and want to ensure you would not add the cost if you could fix the build system internally.

I don't think the additional complexity is all that much, especially given there's an actual use-case for it. It'd be different if the case was some hypothetical situation, but it isn't. Not all systems have all of LLVM installed as their toolchain, so forcing people away from using GNU tools seems like the wrong approach to me.

Exactly. As mentioned above, we have precedence for supporting some level of compatibility. Insisting on simplicity and closed toolchain would do more harm than good.

See my previous comment. My main concern is the size overhead to InputFile. I am thinking we are paying too much for a relatively less useful feature.

The fatal issue can be addressed by, e.g. emitting a warning instead or ignoring the section completely.

I filed a binutils bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=28035 Perhaps you can chime in?
The support needs a comment that this is specifically for GNU strip.

That said, my main objection is that we now waste additional 16 bytes for InputFile. Can it be avoided?

We only waste 16 bytes if someone uses bin utilities vs llvm one. Some users might be OK with that waste vs trying to switch large fleet of servers to use llvm.
Thank you for creating a bug report, but even if it's fixed in the issue there is that servers are using older version of binutils.

To add. From pervious patch. With RELA it was 322KB. Each entry is 56 bytes ( 8 + 24 + 24), so thats 5750 entries. With REL (8 + 16 + 16), that will be 40 bytes per entry, and translates to 230KB.
The binary is ~113MB. So with RELA it's about 0.28%, with REL it's 0.2%.

My main concern has always been: I am not sure ArrayRef<Elf_Rel> cgProfileRel; ArrayRef<Elf_Rela> cgProfileRela; is the correct design.

Previously I was ok because one ArrayRef just added 16 bytes. Now you are taking extra 16 bytes from each InputFile. Given the tiny benefit of this call graph profile optimization, I am not sure this is the good traceoff.

Can't you just not cache cgProfileRela?

ayermolo updated this revision to Diff 356823.Jul 6 2021, 2:56 PM

Re-did lld part so that relocation sections are not cached.

My main concern has always been: I am not sure ArrayRef<Elf_Rel> cgProfileRel; ArrayRef<Elf_Rela> cgProfileRela; is the correct design.

Previously I was ok because one ArrayRef just added 16 bytes. Now you are taking extra 16 bytes from each InputFile. Given the tiny benefit of this call graph profile optimization, I am not sure this is the good traceoff.

Can't you just not cache cgProfileRela?

Oh. I see. Something like this? I guess downside is that linking time will go up slightly because we have to scan through all the sections again.

You also need a yaml2obj test case. This is adding new functionality. Any such patch needs a test case.

Oh. I see. Something like this? I guess downside is that linking time will go up slightly because we have to scan through all the sections again.

Yes, this is close. See my inline comment. The downside is likely negligible if you benchmark it... sizeof(InputSection) is ~184 bytes. I would be unhappy if the relatively minor feature takes 15% of its size.

lld/ELF/Driver.cpp
864

Not addressed, but see below: with inlining you can ignore this.

876

section index

886

< => !=

891

just inline getIndices here

lld/ELF/InputFiles.cpp
579–580

Replace cgProfile with cgProfileSectionIndex. Let processRelocationsCGSection (which probably should be renamed to something else, e.g. processCallGraphRelocations) bail out if cgProfileSectionIndex is 0 (SHN_UNDEF). You can also delete one loop there.

You also need a yaml2obj test case. This is adding new functionality. Any such patch needs a test case.

Oh. I see. Something like this? I guess downside is that linking time will go up slightly because we have to scan through all the sections again.

Yes, this is close. See my inline comment. The downside is likely negligible if you benchmark it... sizeof(InputSection) is ~184 bytes. I would be unhappy if the relatively minor feature takes 15% of its size.

Wanted to make sure it's on correct path, before adding tests, and elfDumper needs to be modified.

ayermolo updated this revision to Diff 357062.Jul 7 2021, 1:10 PM

Updated ELFDumper and a test that uses yaml2obj, plus some other cleanup/feedback incorporation.

MaskRay added inline comments.Jul 7 2021, 1:52 PM
lld/ELF/InputFiles.h
253

uint32_t

MaskRay added inline comments.Jul 7 2021, 1:52 PM
lld/ELF/Driver.cpp
901

This consumes too much stack space. Perhaps SmallVector<uint32_t, 32>

lld already uses SmallVector<unsigned, 32> so using 32 will not increase binary size.

906

Delete the unneeded blank line

llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test
256

Add a test comment: ## GNU strip may convert SHT_REL to SHT_RELA. Test we can handle SHT_RELA.

259

Delete elf-cg-profile RUN lines. The aliases are tested by dedicated tests, no need for duplicating.

289

Delete if unused or make it match the reality

296

The offsets should match the reality: they relocate the values (0x0, 0x8, 0x10, ...)

llvm/tools/llvm-readobj/ELFDumper.cpp
6727

Drop else since we are using the early return pattern.

6733

Add a comment: MC unconditionally produces SHT_REL but GNU strip may convert the format to SHT_RELA (https://sourceware.org/bugzilla/show_bug.cgi?id=28035)

I don't expect they will fix this any time soon, but good to have a reference to justify additional complexity here.

ayermolo updated this revision to Diff 357095.Jul 7 2021, 3:24 PM

addressing comments

ayermolo edited the summary of this revision. (Show Details)Jul 7 2021, 3:31 PM
MaskRay accepted this revision.Jul 7 2021, 3:33 PM

LG, @jhenderson may have more comments

llvm/tools/llvm-readobj/ELFDumper.cpp
6778

drop parens beside !=

This revision is now accepted and ready to land.Jul 7 2021, 3:33 PM

Looks like if we run strip -S it changes rel to rela. Changed so that LLD, and ELFDumper, supports both REL and RELA for CG Profile.

Looks like => Make this certain. https://sourceware.org/bugzilla/show_bug.cgi?id=28035

Looks like if we run strip -S it changes rel to rela. Changed so that LLD, and ELFDumper, supports both REL and RELA for CG Profile.

Looks like => Make this certain. https://sourceware.org/bugzilla/show_bug.cgi?id=28035

My bad. Just my "verbal tic" :)

ayermolo edited the summary of this revision. (Show Details)Jul 7 2021, 3:37 PM
ayermolo updated this revision to Diff 357097.Jul 7 2021, 3:40 PM

Addressing () comment.

Test case(s) for the LLD portion of this patch?

lld/ELF/Driver.cpp
858–903

A quick skim of the area shows // is used for comenting function-level comments.

886
lld/ELF/InputFiles.h
24

Do you need to add this include? My understanding of the style guide is that you don't need to add it, unless the code doesn't compile without (it is likely pulled in by other headers).

llvm/tools/llvm-readobj/ELFDumper.cpp
6705–6706

Style is overwhelmingly // for functions here. These don't need doxygen comments, since they're not part of a public interface.

6733

clang-format

6739–6741

Test case needed for this path.

ayermolo updated this revision to Diff 357355.Jul 8 2021, 2:10 PM

addressing comments

Test case(s) for the LLD portion of this patch?

There doesn't seem to be any tests for LLD, even for old implementation. What would be best way? feed llvm bit code in to llvm-mc, run lld on it and check final binary symbol table for ordering? Add something to LLD to output from internal data structure?

lld/ELF/InputFiles.h
24

ugh, sneaked in somehow.

ayermolo updated this revision to Diff 357358.Jul 8 2021, 2:15 PM

missed an update

MaskRay added a comment.EditedJul 8 2021, 2:16 PM

Test case(s) for the LLD portion of this patch?

There doesn't seem to be any tests for LLD, even for old implementation. What would be best way? feed llvm bit code in to llvm-mc, run lld on it and check final binary symbol table for ordering? Add something to LLD to output from internal data structure?

See test/ELF/cgprofile-obj.s for .cg_profile directives. The old implementation only tested RELA, not REL. The current state is only REL. Now we need RELA for GNU objcopy/strip compatibility, we need yaml2obj tests because assemblers don't emit RELA.

Please mark resolved comments as "done" (press submit).

Test case(s) for the LLD portion of this patch?

There doesn't seem to be any tests for LLD, even for old implementation. What would be best way? feed llvm bit code in to llvm-mc, run lld on it and check final binary symbol table for ordering? Add something to LLD to output from internal data structure?

See test/ELF/cgprofile-obj.s for .cg_profile directives. The old implementation only tested RELA, not REL. The current state is only REL. Now we need RELA for GNU objcopy/strip compatibility, we need yaml2obj tests because assemblers don't emit RELA.

Please mark resolved comments as "done" (press submit).

Ah I see it. Not sure how I missed it.

ayermolo marked 24 inline comments as done.Jul 8 2021, 2:25 PM
ayermolo updated this revision to Diff 357396.Jul 8 2021, 5:52 PM

Added LLD test.

jhenderson added inline comments.Jul 9 2021, 12:34 AM
lld/ELF/Driver.cpp
858–903
886

Ping? This hasn't been addressed...

lld/test/ELF/cgprofile-rela-obj.test
1 ↗(On Diff #357396)

Add a comment (using '##' for comment markers) to this test to explain the purpose of this test.

I think you can also drop -obj from the test name.

4 ↗(On Diff #357396)

You don't need to specify the entry point explicitly. If you really want to suppress the warning, I'd just rename one of the symbols to _start.

8 ↗(On Diff #357396)

Nit: add a blank line between the RUN and YAML blocks.

I also have a personal preference for the flow of the test to be:

# RUN commands

# CHECK directives

YAML
llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test
209
307
ayermolo updated this revision to Diff 357601.Jul 9 2021, 12:36 PM
ayermolo marked 4 inline comments as done.

Addressing comments

lld/ELF/Driver.cpp
886

Sorry missed it.

lld/test/ELF/cgprofile-rela-obj.test
4 ↗(On Diff #357396)

Hmm. Even with that still got warning: cannot find entry symbol _start; defaulting to 0x201120
Was probably doing something wrong. Just left as Aa.

MaskRay added inline comments.Jul 9 2021, 12:49 PM
lld/test/ELF/cgprofile-rela-obj.test
4 ↗(On Diff #357396)

_start needs to be non-local. You can find the pattern in other tests.

.globl _start
_start:
ayermolo updated this revision to Diff 357614.Jul 9 2021, 1:18 PM

Added _start

@MaskRay Gave it another shot. Got it to work this time. I guess global symbol needs to be at the end of Symbols in yaml. Previously I was getting linker error.

MaskRay added inline comments.Jul 9 2021, 2:11 PM
lld/test/ELF/cgprofile-rela.test
33

Delete unused AddressAlign: 0x10

73

Add Offset: 0x0

121

Delete STT_SECTION symbols not referenced by relocations.

ayermolo updated this revision to Diff 357666.Jul 9 2021, 4:36 PM

cleaned up cgprofile-rela.text

ayermolo updated this revision to Diff 357667.Jul 9 2021, 4:39 PM
ayermolo marked 7 inline comments as done.

Missed comment fix.

ayermolo marked an inline comment as done.Jul 9 2021, 4:40 PM
jhenderson added inline comments.Jul 12 2021, 4:05 AM
lld/test/ELF/cgprofile-rela.test
2

I'd rewrite the comment like this:

"Under some circumstances, GNU tools strip/objcopy change REL to RELA. Test that LLD can handle call graph profile data relocated with RELA relocations."

33

I assume your aim is for 2-byte sections? If so, you can replace these lines with Size: 2, which I think expresses the intent better.

64

I believe the Link: .symtab is implicit for relocation sections. You can remove it.

65

Similarly: I think the address align field has a default value for reloc sections. No need to explicitly specify it.

llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test
209

Two typos.

307
ayermolo updated this revision to Diff 357987.Jul 12 2021, 10:48 AM

Addressing rela test comments.

ayermolo marked 5 inline comments as done.Jul 12 2021, 10:54 AM
MaskRay added inline comments.Jul 12 2021, 11:00 AM
lld/test/ELF/cgprofile-rela.test
2

You can attach https://sourceware.org/bugzilla/show_bug.cgi?id=28035 after "change REL to RELA"

ayermolo updated this revision to Diff 358042.Jul 12 2021, 12:55 PM

Added link to bugzilla to the test.

MaskRay accepted this revision.Jul 12 2021, 6:41 PM
jhenderson accepted this revision.Jul 13 2021, 12:42 AM

LGTM, except for one remaining typo fix.

llvm/test/tools/llvm-readobj/ELF/call-graph-profile.test
307

Ping this typo fix.

ayermolo updated this revision to Diff 358326.Jul 13 2021, 10:17 AM

Missed a typo.

ayermolo marked 2 inline comments as done.Jul 13 2021, 10:18 AM

Thanks for review. Will commit later today.

Thanks for review. Will commit later today.

Please update the summary (and commit message) about the use scenario.

ayermolo edited the summary of this revision. (Show Details)Jul 13 2021, 12:04 PM
This revision was automatically updated to reflect the committed changes.