This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Profile guided code size optimization (continued).
Needs ReviewPublic

Authored by hjyamauchi on Sep 3 2019, 11:23 AM.

Details

Reviewers
davidxl
Summary

This follows up D59514.

Enable more of the existing size optimizations for colder code under PGO.

Up to ~11% code size savings and up to ~2.5% performance improvement (due to
reduced icache misses) for large internal apps.

This depends on D67377, D69409, D69580, D70095, D70510, D70988 and D71149.

Event Timeline

hjyamauchi created this revision.Sep 3 2019, 11:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 3 2019, 11:23 AM
Herald added subscribers: jsji, tpr, MaskRay and 15 others. · View Herald Transcript

Can you split the patches into smaller ones -- at least 3 : ProfileSummary API changes, TTI API changes, and the rest (may need to be split too, but we will see).

Split the PSI changes into https://reviews.llvm.org/D67377.

It looks like the TTI part can't be separated due to dependencies and would pull in the whole thing:

The TTI files (TargetTransformInfo.h, TargetTransformInfoImpl.h, BasicTTIImpl.h, TargetTransformInfo.cpp) would pull in TargetLowering.h for the isSuitableForJumpTable change, which then would pull in SizeOpts.h/cpp for the llvm::shouldOptimizeForSize change. In addition, TargetLowering.h would also pull in SwitchLoweringUtils.cpp for the isSuitableForJumpTable change, which would pull in isSuitableForJumpTable.h for findJumpTables change, which would pull in SelectionDAGBuilder.cpp and IRTranslator.cpp for the findJumpTables changes,...etc.

Did you think of a different way to split the TTI part?

Update to sync with D67377.

This patch should add test cases for each affected codegen passes.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
118 ↗(On Diff #221824)

Mark as unused args?

hjyamauchi marked an inline comment as done.

Added tests.

hjyamauchi added inline comments.Oct 10 2019, 2:31 PM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
118 ↗(On Diff #221824)

Done (note according to the comment for LLVM_ATTRIBUTE_UNUSED, a cast-to-void is preferred for unused variables.)

Fix the summary.

hjyamauchi edited the summary of this revision. (Show Details)Oct 10 2019, 3:42 PM

SizeOpts and MachineSizeOpts changes can also be extracted into its own patch.

After this is done, the TTI change has one fewer dependency and we might figure out a way to isolate that part too..

llvm/lib/CodeGen/MachineSizeOpts.cpp
28 ↗(On Diff #224495)

I've looked at SizeOpts.h, SizeOpts.cpp, MachineSizeOpts.cpp files. I think we should refactorize the code using template to avoid duplicate logic.

This should be done similarly to BlockFrequencyInfoImpl

template <typename FuncT, typename BFIT> bool shouldOptimizeForSize(FuncT *F, ....) {

}

isColdBlock should probably refactorized in the similar way.

The code can be broken down and contributed in the following order:

  1. SizeOpts related change
  2. TargetLowering change (isSuitableForJumpTable)
  3. TargetTransformation related changes -- depending on 1) and 2)
  4. SwitchLoweringUtils (findJumpTable depends on isSuitableForJumpTable in 2)
  5. the rest of the changes (can be further broken down per-pass).
hjyamauchi marked an inline comment as done.Oct 16 2019, 2:15 PM

Split the SizeOpts part into https://reviews.llvm.org/D69070

With SizeOpts change split, is it ready to split TargetLowering changes out?

With SizeOpts change split, is it ready to split TargetLowering changes out?

We will see.

wenlei added a subscriber: wenlei.Oct 28 2019, 10:57 PM

Updated the summary.

hjyamauchi edited the summary of this revision. (Show Details)Oct 29 2019, 10:14 AM

I think the following allows us to split the TargetLowering/TTI/SwitchLoweringUtils part.

The TargetLowering (isSuitableForJumpTable) change pulls in BasicTTIImpl.h (getEstimatedNumberOfCaseClusters) and SwitchLoweringUtils.h/SwitchLoweringUtils.cpp (findJumpTables).
The getEstimatedNumberOfCaseClusters change pulls in TargetTransformInfo.h, TargetTransformInfo.cpp, TargetTransformInfoImpl.h, InlineCost.cpp.
The findJumpTables change pulls in IRTranslator.cpp and part of SelectionDAGBuilder.cpp.

Split the TargetInstrInfo part into https://reviews.llvm.org/D69737.

hjyamauchi updated this revision to Diff 228159.Nov 6 2019, 3:33 PM

Updated after dropping the TargetInstrInfo part (D69737).

Decreased the number of files by quite a few.

PTAL.

hjyamauchi updated this revision to Diff 228257.Nov 7 2019, 8:53 AM

Dropped two test files that were for the dropped D69737.

May I suggest another split? DAG.shouldOptForSize change can be split out as a NFC.

Rebased past D70095. PTAL.

hjyamauchi updated this revision to Diff 231981.Dec 3 2019, 1:48 PM

Rebased past D67120.

Split the code gen pass instrumentation into D71072.

D71149 is a replacement for reverted D71072.

hjyamauchi added a comment.EditedDec 10 2019, 10:40 AM

Split into D71288 to enable in the code gen passes for cold code.