This is an archive of the discontinued LLVM Phabricator instance.

[llvm][Instrumentation/MC] Add Call Graph Profile pass and object file emission.
ClosedPublic

Authored by Bigcheese on Mar 27 2018, 5:11 PM.

Details

Summary

This patch adds support for generating a call graph profile from Branch Frequency Info and embedding that profile into an object file to pass to the linker. Linker support for this feature (without reading from the object file) is being reviewed at https://reviews.llvm.org/D36351 and appears to be nearing completion.

Generating the CG Profile

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:

!llvm.module.flags = !{!0} 
 
!0 = !{i32 5, !"CG Profile", !1} 
!1 = !{!2, !3, !4} ; List of edges
!2 = !{!"a", !"b", i64 32} ; Edge from a to b with a weight of 32
!3 = !{!"freq", !"a", i64 11} 
!4 = !{!"freq", !"b", i64 20}

Object FIle Representation

At codegen time this is emitted into the ELF file a pair of symbol indices and a weight. In assembly it looks like:

.cg_profile a, b, 32 
.cg_profile freq, a, 11 
.cg_profile freq, b, 20

When writing an ELF file these are put into a SHT_LLVM_CALL_GRAPH_PROFILE (0x6fff4c02) section as (uint32_t, uint32_t, uint64_t) tuples as (from symbol index, to symbol index, weight).

Diff Detail

Repository
rL LLVM

Event Timeline

Bigcheese created this revision.Mar 27 2018, 5:11 PM

I am not sure what the semantics of a "parent" revision are.

The intention is that we should first finish the lld patch that reads the callgraph from a text file before we look at the llvm changes.

Bigcheese updated this revision to Diff 143031.Apr 18 2018, 7:29 PM

Rebase.

I'm also updating the lld patch that reads this, will have that up shortly.

Bigcheese updated this revision to Diff 143035.Apr 18 2018, 8:17 PM

Previous update was missing tests. Fixed.

The part MC and Object part of this patch LGTM.

The IR level probably needs someone more familiar with metadata as a reviewer.

If you commit the MC and Object parts this should be sufficient to unblock the lld part, right?

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
133 ↗(On Diff #143035)

use an explicit type

144 ↗(On Diff #143035)

Explicit type

147 ↗(On Diff #143035)

Should the metadata be referring to symbols by name?

tools/llvm-readobj/ELFDumper.cpp
751 ↗(On Diff #143035)

We should probably have these values cached in the dumper. It is OK for that to be a followup.

The MC and Object parts are enough for lld, yes.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
147 ↗(On Diff #143035)

Hmm, it may make more sense to refer to the symbols by GlobalValue instead, but that just delays the conversion to string.

The MC and Object parts are enough for lld, yes.

OK. Please commit the MC and Object parts first. We can get the metadata bits reviewed afterwards.

This and D45850 made 2 - 2.5% performance improvement to a large real world application already built with PGO + full LTO. Any chance to push this anytime soon?

pcc added a subscriber: pcc.May 24 2018, 3:20 PM

Can you document the assembly and object file extensions in docs/Extensions.rst?

Do we need to do anything about the fact that the new section contains symbol table indexes (see the discussion on the proposed SHT_LLVM_ADDRSIG section)? I don't think so since an incorrect index will not result in a miscompile but rather a suboptimality.

With that the MC/Object part LGTM to me as well, but it will probably require a rebase at this point. The IR part also looks good but I'd like another pair of eyes on it. Can you split out the IR change as Rafael suggested?

Bigcheese edited reviewers, added: compnerd, pcc; removed: espindola.May 24 2018, 3:25 PM

Adding some reviewers that last touched metadata.

In writing the documentation I ran into an issue defining the precise semantics of .cg_profile when it includes a symbol which has not previously been referenced. In an updated patch I'm treating it as if .weak <symbol> was written if <symbol> has not yet been used. This leads to the following behavior:

.section .test,"aw",@progbits
a:    .word b

    .cg_profile a, b, 32
    .cg_profile freq, a, 11
    .cg_profile late, late2, 20
    .cg_profile .L.local, b, 42

    .globl late
late:
late2:
late3:
.L.local:

a: local
late3: local
b: global
freq: weak
late: global
late2: weak
.L.local: not in symbol table

Note that a and late2 disagree about what binding they have based on if they came before or after .cg_profile. Also .L.local generates a null symbol index because it doesn't get put in the symbol table.

Bigcheese updated this revision to Diff 149005.May 29 2018, 4:45 PM
  • Remove IR parts, those will be in a separate patch.
  • Update Extensions.rst
  • Add assembly only test
pcc added inline comments.May 29 2018, 5:36 PM
docs/Extensions.rst
326 ↗(On Diff #149005)

This seems a little weird. In addition to the cases you mentioned, it would also seem to imply that code like this:

.cg_profile foo, bar, 42

.globl foo
foo:
call bar

will cause bar to become STB_WEAK. I guess the idea behind setting the binding to STB_WEAKrather than STB_GLOBAL is that you don't want to cause an undefined symbol error in the case where all other object files use STB_WEAK for the symbol and one of those object files contains a relocation to the symbol? In that case, maybe we should only end up setting the binding to weak if we don't emit any relocations for the symbol (i.e. isUsedInReloc() is false)?

lib/MC/ELFObjectWriter.cpp
987 ↗(On Diff #149005)

You can move this case under SHT_SYMTAB_SHNDX.

test/MC/ELF/cgprofile.s
9 ↗(On Diff #149005)

Using a symbol index of 0 seems like valid behaviour, but maybe it would be a little better to point to the section symbol instead?

Bigcheese marked an inline comment as done.May 29 2018, 6:04 PM
Bigcheese added inline comments.
docs/Extensions.rst
326 ↗(On Diff #149005)

That would require delaying processing of .cg_profile entries until the end, but given the above issue, that's probably worth it. Another option is to error if the symbol doesn't already exist.

test/MC/ELF/cgprofile.s
9 ↗(On Diff #149005)

Yeah, that would match the standard behavior of .L.

Bigcheese updated this revision to Diff 149378.May 31 2018, 4:02 PM

I've changed the behavior to better match the existing semantics.

  • Creating symbols is delayed until the end of the file.
  • Referencing an undefined temporary symbol is an error.
pcc accepted this revision.Jun 1 2018, 7:16 PM

LGTM

test/MC/ELF/cgprofile.s
1 ↗(On Diff #149378)

Can you change this test to test the -elf-cg-profile feature that you're adding to llvm-readobj?

This revision is now accepted and ready to land.Jun 1 2018, 7:16 PM
This revision was automatically updated to reflect the committed changes.