Page MenuHomePhabricator

[PGO] Profile guided code size optimization (continued).
Needs ReviewPublic

Authored by yamauchi on Sep 3 2019, 11:23 AM.

Details

Reviewers
davidxl
Summary

This follows up D59514.

Enable more of the existing size optimizations for colder code under PGO.

Up to ~11% code size savings and up to ~2.5% performance improvement (due to
reduced icache misses) for large internal apps.

This depends on D67377, D69409, D69580, D70095, D70510, D70988 and D71149.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
yamauchi updated this revision to Diff 224495.Oct 10 2019, 3:38 PM

Fix the summary.

yamauchi edited the summary of this revision. (Show Details)Oct 10 2019, 3:42 PM

SizeOpts and MachineSizeOpts changes can also be extracted into its own patch.

After this is done, the TTI change has one fewer dependency and we might figure out a way to isolate that part too..

llvm/lib/CodeGen/MachineSizeOpts.cpp
28 ↗(On Diff #224495)

I've looked at SizeOpts.h, SizeOpts.cpp, MachineSizeOpts.cpp files. I think we should refactorize the code using template to avoid duplicate logic.

This should be done similarly to BlockFrequencyInfoImpl

template <typename FuncT, typename BFIT> bool shouldOptimizeForSize(FuncT *F, ....) {

}

isColdBlock should probably refactorized in the similar way.

The code can be broken down and contributed in the following order:

  1. SizeOpts related change
  2. TargetLowering change (isSuitableForJumpTable)
  3. TargetTransformation related changes -- depending on 1) and 2)
  4. SwitchLoweringUtils (findJumpTable depends on isSuitableForJumpTable in 2)
  5. the rest of the changes (can be further broken down per-pass).
yamauchi marked an inline comment as done.Oct 16 2019, 2:15 PM

Split the SizeOpts part into https://reviews.llvm.org/D69070

With SizeOpts change split, is it ready to split TargetLowering changes out?

With SizeOpts change split, is it ready to split TargetLowering changes out?

We will see.

wenlei added a subscriber: wenlei.Oct 28 2019, 10:57 PM
yamauchi updated this revision to Diff 226921.Oct 29 2019, 9:48 AM

Rebased past D69409.

yamauchi updated this revision to Diff 226929.Oct 29 2019, 10:14 AM

Updated the summary.

yamauchi edited the summary of this revision. (Show Details)Oct 29 2019, 10:14 AM

I think the following allows us to split the TargetLowering/TTI/SwitchLoweringUtils part.

The TargetLowering (isSuitableForJumpTable) change pulls in BasicTTIImpl.h (getEstimatedNumberOfCaseClusters) and SwitchLoweringUtils.h/SwitchLoweringUtils.cpp (findJumpTables).
The getEstimatedNumberOfCaseClusters change pulls in TargetTransformInfo.h, TargetTransformInfo.cpp, TargetTransformInfoImpl.h, InlineCost.cpp.
The findJumpTables change pulls in IRTranslator.cpp and part of SelectionDAGBuilder.cpp.

yamauchi updated this revision to Diff 227489.Nov 1 2019, 11:04 AM

Rebased past D69580.

Split the TargetInstrInfo part into https://reviews.llvm.org/D69737.

yamauchi updated this revision to Diff 228159.Nov 6 2019, 3:33 PM

Updated after dropping the TargetInstrInfo part (D69737).

Decreased the number of files by quite a few.

PTAL.

yamauchi updated this revision to Diff 228257.Nov 7 2019, 8:53 AM

Dropped two test files that were for the dropped D69737.

May I suggest another split? DAG.shouldOptForSize change can be split out as a NFC.

yamauchi updated this revision to Diff 230688.Nov 22 2019, 10:49 AM

Rebased past D70095. PTAL.

yamauchi edited the summary of this revision. (Show Details)Nov 22 2019, 10:49 AM
yamauchi updated this revision to Diff 231981.Dec 3 2019, 1:48 PM

Rebased past D67120.

yamauchi updated this revision to Diff 232368.Dec 5 2019, 9:36 AM

Rebase past D70988.

Split the code gen pass instrumentation into D71072.

D71149 is a replacement for reverted D71072.

yamauchi added a comment.EditedDec 10 2019, 10:40 AM

Split into D71288 to enable in the code gen passes for cold code.