The CGProfilePass needs to run at link time during FullLTO compilation to emit the .llvm.call-graph-profile section into the compiled LTO object file. Currently, it runs only during the initial FullLTO-prelink compilation stage (which produces the bitcode files consumed by the linker), so the section is not produced.
ThinLTO is not affected because:
- For ThinLTO-prelink compilation, the CGProfilePass is not run because ThinLTO-prelink passes are added via buildThinLTOPreLinkDefaultPipeline. Normal and FullLTO-prelink passes are both added via buildPerModuleDefaultPipeline, which uses the LTOPreLink parameter to customise its behaviour for the FullLTO-prelink differences.
- The ThinLTO backend compilation phase adds the CGProfilePass (see: buildModuleOptimizationPipeline).
Adjust when the pass is run so that the .llvm.call-graph-profile section is produced correctly for FullLTO.
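
As a rough illustration of the phases involved, here is a toy C++ model (not LLVM code). The `Phase` enum and all function names are hypothetical. The "current" rule restates the behaviour described above; the "adjusted" rule is only an assumed reading of the intent, namely that the pass should run in exactly the phases that emit an object file.

```cpp
// Toy model only: Phase and the functions below are hypothetical names,
// not part of LLVM's PassBuilder API.
enum class Phase {
  Normal,          // non-LTO compile, emits an object file
  FullLTOPreLink,  // emits bitcode for the linker
  FullLTOPostLink, // link-time compile, emits the LTO object file
  ThinLTOPreLink,  // emits bitcode for the linker
  ThinLTOBackend   // backend compile, emits an object file
};

// Phases whose output is an object file, where the section can be emitted.
constexpr bool emitsObjectFile(Phase P) {
  return P == Phase::Normal || P == Phase::FullLTOPostLink ||
         P == Phase::ThinLTOBackend;
}

// Behaviour as described above: Normal and FullLTO-prelink share a pipeline
// that runs the pass, and the ThinLTO backend adds it as well.
constexpr bool runsCGProfileCurrently(Phase P) {
  return P == Phase::Normal || P == Phase::FullLTOPreLink ||
         P == Phase::ThinLTOBackend;
}

// Assumed intent of the adjustment: run the pass wherever an object file
// is emitted, and nowhere else.
constexpr bool runsCGProfileAdjusted(Phase P) { return emitsObjectFile(P); }

// The mismatch being fixed: FullLTO post-link emits an object file but
// does not currently run the pass.
static_assert(emitsObjectFile(Phase::FullLTOPostLink) &&
                  !runsCGProfileCurrently(Phase::FullLTOPostLink),
              "FullLTO post-link currently misses the pass");
static_assert(runsCGProfileAdjusted(Phase::FullLTOPostLink),
              "adjusted rule covers FullLTO post-link");
```

In this model, ThinLTO behaves the same under both rules, which matches the claim that ThinLTO is not affected.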
Fixes #56185
Perhaps move it after GlobalDCEPass/ConstantMergePass, similar to buildLTODefaultPipeline.
GlobalDCEPass may discard some functions, and CGProfilePass does not need to run on those. The speed-up is almost assuredly negligible, though.