This is an archive of the discontinued LLVM Phabricator instance.

[LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO
Closed · Public

Authored by bd1976llvm on Jun 23 2022, 9:39 AM.

Details

Summary

CGProfilePass needs to run during FullLTO compilation at link time to emit the .llvm.call-graph-profile section into the compiled LTO object file. Currently, it runs only during the initial FullLTO-prelink compilation stage (which produces the bitcode files consumed by the linker), so the section is not produced.

ThinLTO is not affected because:

  • For ThinLTO-prelink compilation, CGProfilePass is not run because ThinLTO-prelink passes are added via buildThinLTOPreLinkDefaultPipeline. Normal and FullLTO-prelink passes are both added via buildPerModuleDefaultPipeline, which uses its LTOPreLink parameter to customise its behaviour for the FullLTO-prelink differences.
  • The ThinLTO backend compilation phase adds CGProfilePass (see buildModuleOptimizationPipeline).

Adjust when the pass is run so that the .llvm.call-graph-profile section is produced correctly for FullLTO.
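The phase logic described in this summary can be captured in a small model. This is a hypothetical sketch, not LLVM code: the Phase enum, kAllPhases, and the runsCGProfilePass/emitsObjectFile helpers are illustrative names, and the mapping is taken from the summary above rather than from PassBuilderPipelines.cpp.

```cpp
#include <cassert>

// Hypothetical model (not LLVM code) of the compilation phases discussed in
// the summary. Names are illustrative only.
enum class Phase {
  Normal,           // ordinary non-LTO compile (buildPerModuleDefaultPipeline)
  FullLTOPreLink,   // buildPerModuleDefaultPipeline with LTOPreLink = true
  FullLTOPostLink,  // link-time compilation of the merged FullLTO bitcode
  ThinLTOPreLink,   // buildThinLTOPreLinkDefaultPipeline
  ThinLTOBackend    // ThinLTO backend (buildModuleOptimizationPipeline)
};

constexpr Phase kAllPhases[] = {Phase::Normal, Phase::FullLTOPreLink,
                                Phase::FullLTOPostLink, Phase::ThinLTOPreLink,
                                Phase::ThinLTOBackend};

// Whether the phase runs CGProfilePass once this change is applied,
// according to the summary.
bool runsCGProfilePass(Phase p) {
  switch (p) {
  case Phase::Normal:
  case Phase::FullLTOPostLink:
  case Phase::ThinLTOBackend:
    return true;
  case Phase::FullLTOPreLink:
  case Phase::ThinLTOPreLink:
    return false;
  }
  return false;
}

// Whether the phase emits an object file (which can carry a
// .llvm.call-graph-profile section) rather than bitcode for the linker.
bool emitsObjectFile(Phase p) {
  return p == Phase::Normal || p == Phase::FullLTOPostLink ||
         p == Phase::ThinLTOBackend;
}

// True if CGProfilePass runs in exactly the phases that emit object files.
bool cgProfileRunsExactlyWhereObjectsAreEmitted() {
  for (Phase p : kAllPhases)
    if (runsCGProfilePass(p) != emitsObjectFile(p)) return false;
  return true;
}
```

The invariant the model expresses is that, after this change, the call-graph metadata is produced in the same phase that lowers it to an object file, rather than in a phase that only emits bitcode.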

Fixes #56185


Event Timeline

bd1976llvm created this revision. Jun 23 2022, 9:39 AM
Herald added a project: Restricted Project. Jun 23 2022, 9:39 AM
bd1976llvm requested review of this revision. Jun 23 2022, 9:39 AM
nikic resigned from this revision. Jun 24 2022, 12:35 AM

(Not familiar with CGProfilePass and its requirements)

bd1976llvm added a subscriber: nikic (edited). Jun 28 2022, 6:56 PM

> (Not familiar with CGProfilePass and its requirements)

This change should be relatively simple to understand with a bit more background. For Profile Guided Layout (PGL), the compiler must emit the .llvm.call-graph-profile (SHT_LLVM_CALL_GRAPH_PROFILE) section into the ELF object file that LLD consumes. To do this, CGProfilePass attaches metadata recording the weighted call graph, which is later lowered to the contents of the .llvm.call-graph-profile section. This currently works for normal compilation and ThinLTO. For FullLTO, however, the pass runs in the pre-link stage, so the weighted call-graph metadata is present in the bitcode input to LLD. Because the metadata is attached too early, it does not survive LTO compilation through to lowering, and the .llvm.call-graph-profile section is missing from the LTO object file. This change makes CGProfilePass run at the right point during compilation of the LTO object file, so that the LTO object file contains the .llvm.call-graph-profile section, which LLD can consume to drive PGL.
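As a rough illustration of the "weighted call graph" mentioned above, the sketch below aggregates per-call-site profile counts into (caller, callee) edge weights. It is a simplified, hypothetical model: CallSite, Edge, saturatingAdd, and buildWeightedCallGraph are made-up names, it works on strings rather than LLVM IR, and it ignores details such as indirect-call value profiles.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical call site with a profile-derived execution count.
struct CallSite {
  std::string caller;
  std::string callee;
  std::uint64_t count;  // e.g. profile count of the block containing the call
};

using Edge = std::pair<std::string, std::string>;  // (caller, callee)

// Adds b to a, clamping at the maximum value instead of wrapping around.
std::uint64_t saturatingAdd(std::uint64_t a, std::uint64_t b) {
  const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
  return (a > max - b) ? max : a + b;
}

// Sums the counts of all call sites sharing a (caller, callee) pair.
// Call sites that never executed do not create an edge.
std::map<Edge, std::uint64_t> buildWeightedCallGraph(
    const std::vector<CallSite>& sites) {
  std::map<Edge, std::uint64_t> weights;
  for (const CallSite& cs : sites) {
    if (cs.count == 0) continue;
    std::uint64_t& w = weights[Edge{cs.caller, cs.callee}];
    w = saturatingAdd(w, cs.count);
  }
  return weights;
}
```

Summing repeated call sites yields one weighted edge per (caller, callee) pair, which is the kind of information a linker can use to place hot callers and callees near each other. The saturating add is a design choice in this sketch so that very hot edges clamp rather than wrap.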

More background in this talk: https://youtu.be/F-lbgspxv1c

Here, "LTO object file" means the ELF object file created by compiling the bitcode inputs at link time.

Added some reviewers from D112160, where PGL was recently made to work for Mach-O LLD. It seems likely (to me!) that Mach-O is also affected by this issue.

MaskRay accepted this revision (edited). Jun 29 2022, 2:22 PM

Please expand uncommon abbreviations: PGL, FLTO, and TLTO.

> TLTO is not affected as the passes are added (correctly) via a different path

Be more precise about why ThinLTO is unaffected: buildModuleOptimizationPipeline called by it runs CGProfilePass.
Also mention the LTOPreLink change for ThinLTOPreLink.

llvm/lib/Passes/PassBuilderPipelines.cpp, lines 1273–1274

Perhaps move it after GlobalDCEPass/ConstantMergePass, similar to buildLTODefaultPipeline.

GlobalDCEPass may discard some functions, and CGProfilePass does not need to run on those, though the speed-up is almost certainly negligible.
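The ordering argument above can be illustrated with a toy model in which dead-function removal is plain reachability from a set of roots. This is a hypothetical sketch: Module, removeDeadFunctions, and functionsToScan are made-up names; removeDeadFunctions is only a stand-in for GlobalDCEPass (the real pass has more elaborate liveness rules), and functionsToScan stands in for the per-function work of CGProfilePass.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each function name maps to the functions it calls.
using Module = std::map<std::string, std::vector<std::string>>;

// Simplified stand-in for GlobalDCEPass: keeps only the functions reachable
// from the given roots (e.g. externally visible entry points).
Module removeDeadFunctions(const Module& m,
                           const std::vector<std::string>& roots) {
  std::set<std::string> live;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string f = worklist.back();
    worklist.pop_back();
    // Skip functions not defined in the module and ones already visited.
    if (m.count(f) == 0 || !live.insert(f).second) continue;
    for (const std::string& callee : m.at(f)) worklist.push_back(callee);
  }
  Module result;
  for (const auto& entry : m)
    if (live.count(entry.first) != 0) result.insert(entry);
  return result;
}

// Stand-in for CGProfilePass work: one item per function whose call sites
// would have to be scanned.
std::size_t functionsToScan(const Module& m) { return m.size(); }
```

Running the stand-in DCE first means an unreachable function is never scanned, which mirrors the small (and, as noted above, likely negligible) saving from placing CGProfilePass after GlobalDCEPass.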

This revision is now accepted and ready to land. Jun 29 2022, 2:22 PM
MaskRay added inline comments. Jun 29 2022, 2:29 PM
llvm/test/Other/new-pm-defaults.ll, line 12

CHECK-CGPP isn't clear. Clarify it. Possibly add a comment explaining what it does.

bd1976llvm retitled this revision from [PGL][LTO] Make PGL work with FLTO (PR #56185) to [LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO (PR #56185). Jun 30 2022, 6:41 AM
bd1976llvm edited the summary of this revision.
bd1976llvm edited the summary of this revision. Jun 30 2022, 10:56 AM
bd1976llvm updated this revision to Diff 441463 (edited). Jun 30 2022, 11:16 AM

Addressed review comments.

Also, the description and earlier comments have now been expanded, and the uncommon acronyms have been removed.

MaskRay accepted this revision (edited). Jun 30 2022, 12:04 PM

Thanks for the updated summary.

> https://github.com/llvm/llvm-project/issues/56185

You can write "Fixes https://github.com/llvm/llvm-project/issues/56185" (or "Fixes #56185"; "Fix" also works) so that the pushed commit closes the issue automatically.

Since the body mentions the issue, you can remove (PR #56185) from the subject.

> ThinLTO compilation at link time adds the CGProfilePass (see: buildModuleOptimizationPipeline).

"at link time" is incorrect for distributed ThinLTO, where backend compilation happens at clang -fthinlto-index=... time. You may just say "ThinLTO backend compilation phase".

bd1976llvm retitled this revision from [LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO (PR #56185) to [LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO. Jun 30 2022, 2:58 PM
bd1976llvm edited the summary of this revision.
bd1976llvm marked 2 inline comments as done.
bd1976llvm added inline comments.
llvm/lib/Passes/PassBuilderPipelines.cpp, lines 1273–1274

Seems like a reasonable improvement. It seems to work with the programs I tried it on locally. Thanks!

llvm/test/Other/new-pm-defaults.ll, line 12

I noticed that I can just use CHECK-DEFAULT (as is used for the similar "RelLookupTableConverterPass" pass) :)

bd1976llvm marked 2 inline comments as done. Jun 30 2022, 3:00 PM

> "at link time" is incorrect for distributed ThinLTO which happens at clang -fthinlto-index=... time. You may just say "ThinLTO backend compilation phase"

Thanks for spotting that. I have adopted your suggestion.