Page MenuHomePhabricator

[PGO] Profile guided code size optimization.
ClosedPublic

Authored by yamauchi on Mar 18 2019, 2:21 PM.

Details

Summary

Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html

Note it doesn't currently touch loop passes.

Diff Detail

Repository
rL LLVM

Event Timeline

yamauchi created this revision.Mar 18 2019, 2:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2019, 2:21 PM

Missing unit tests for the feature.

lib/Analysis/ProfileSummaryInfo.cpp
337 ↗(On Diff #191176)

BB can still be 'not cold' even though the function entry is cold.

Or perhaps you should assert BB == nullptr and use

PSI->isFunctionColdInCallGraph() interface.

lib/Transforms/Scalar/LoopLoadElimination.cpp
538 ↗(On Diff #191176)

is BFI null check needed?

lib/Transforms/Scalar/LoopUnrollPass.cpp
206 ↗(On Diff #191176)

Do you need to check BFI here? It is already checked in shouldOptimizeForSize

yamauchi updated this revision to Diff 191591.Wed, Mar 20, 2:46 PM
yamauchi marked 3 inline comments as done.

Addressed comments. Tests added.

Look fine in general. Easwaran can help with review of pass pipeline changes

lib/Transforms/Scalar/LoopLoadElimination.cpp
536 ↗(On Diff #191591)

auto *HeaderBB = L->getHeader()->getParent();
bool OptForSize = HeaderBB->optForSize() || (PSI && BFI && PSI->shouldOptimizeForSize(HeaderBB));

yamauchi updated this revision to Diff 191773.Thu, Mar 21, 1:42 PM
yamauchi marked an inline comment as done.

Address comment.

eraman added inline comments.Thu, Mar 21, 4:38 PM
lib/Analysis/ProfileSummaryInfo.cpp
62 ↗(On Diff #191773)

Perhaps better to default to false, do robust performance testing and flip the option. You've mentioned code size reduction in one binary in the description, but good to validate there are no performance regressions in spec.

331 ↗(On Diff #191591)

This function looks very odd to me. Why not separate it into shouldOptimizeForSize(Function *, BFI) and shouldOptimizeForSize(BasicBlock*,...)? All but one uses of this function passes a BB in which case the function is irrrelevant.

lib/Transforms/Vectorize/LoopVectorize.cpp
7139 ↗(On Diff #191773)

Comment out of sync with the code

7223 ↗(On Diff #191773)

Augment the comment to make it in-sync with the code

yamauchi updated this revision to Diff 192382.Tue, Mar 26, 3:52 PM
yamauchi marked 5 inline comments as done and an inline comment as not done.

address comments.

yamauchi added inline comments.Tue, Mar 26, 3:54 PM
lib/Analysis/ProfileSummaryInfo.cpp
62 ↗(On Diff #191773)

Here's SPEC data.

400.perlbench
base: 512.81 ±0.564 (8 trials, 7.8 robust values)
test: 512.16 ±0.420 (8 trials, 7.1 robust values)
diff: -0.13% ±0.124%
Possible significant effect (p=0.047)

401.bzip2
base: 532.43 ±1.145 (8 trials, 7.9 robust values)
test: 532.81 ±0.962 (8 trials, 8.0 robust values)
diff: +0.07% ±0.254%
Not significant (p=0.567)

403.gcc
base: 313.08 ±1.356 (8 trials, 7.8 robust values)
test: 312.63 ±1.183 (8 trials, 7.5 robust values)
diff: -0.14% ±0.519%
Not significant (p=0.558)

429.mcf
base: 239.39 ±3.018 (8 trials, 8.0 robust values)
test: 239.41 ±2.067 (8 trials, 8.0 robust values)
diff: +0.01% ±1.386%
High variance between trials

445.gobmk
base: 560.88 ±0.613 (8 trials, 8.0 robust values)
test: 560.46 ±0.143 (7 trials, 6.0 robust values)
diff: -0.07% ±0.103%
Not significant (p=0.142)

456.hmmer
base: 602.40 ±0.140 (8 trials, 7.7 robust values)
test: 602.45 ±0.074 (8 trials, 7.5 robust values)
diff: +0.01% ±0.024%
Not significant (p=0.441)

458.sjeng
base: 607.02 ±0.957 (8 trials, 7.5 robust values)
test: 606.24 ±0.599 (8 trials, 7.5 robust values)
diff: -0.13% ±0.167%
Not significant (p=0.123)

462.libquantum
base: 291.97 ±4.636 (8 trials, 8.0 robust values)
test: 290.80 ±3.281 (8 trials, 7.6 robust values)
diff: -0.40% ±1.765%
High variance between trials

464.h264ref
base: 795.35 ±0.658 (8 trials, 7.9 robust values)
test: 795.05 ±0.345 (8 trials, 8.0 robust values)
diff: -0.04% ±0.085%
Not significant (p=0.354)

471.omnetpp
base: 274.20 ±5.967 (8 trials, 6.1 robust values)
test: 271.89 ±7.348 (8 trials, 6.6 robust values)
diff: -0.85% ±3.043%
High variance between trials

473.astar
base: 426.80 ±0.802 (8 trials, 7.8 robust values)
test: 425.75 ±0.526 (8 trials, 6.1 robust values)
diff: -0.25% ±0.202%
Possible significant effect (p=0.021)

483.xalancbmk
base: 250.64 ±1.038 (8 trials, 8.0 robust values)
test: 250.19 ±0.530 (8 trials, 6.9 robust values)
diff: -0.18% ±0.422%
Not significant (p=0.380)

lib/Transforms/Scalar/LoopLoadElimination.cpp
538 ↗(On Diff #191176)

It is now.

lib/Transforms/Scalar/LoopUnrollPass.cpp
206 ↗(On Diff #191176)

It is now.

Easwaran and David, more comments?

eraman added inline comments.Tue, Apr 2, 2:39 PM
lib/Analysis/ProfileSummaryInfo.cpp
335 ↗(On Diff #192382)

This check is not needed as isFunctionColdInCallGraph first checks if summary is available.

I am not convinced this shouldOptimizeForSize belongs to ProfileSummaryInfo but I don't have any good suggestions. One possibility is to keep the flag in PassBuilder and pass it to individual passes which then check the boolean and isFunctionColdInCallGraph. Or perhaps add a pass that annotates OptForSize attribute on functions based on profile data? (that will still not help with the BB version of this function). Thoughts?

yamauchi marked an inline comment as done.Wed, Apr 3, 10:16 AM
yamauchi added inline comments.
lib/Analysis/ProfileSummaryInfo.cpp
335 ↗(On Diff #192382)

This check is not needed as isFunctionColdInCallGraph first checks if summary is available.

True. I'd prefer having it here to make it clearer that we only do this with PGO. But it can go either way here.

I am not convinced this shouldOptimizeForSize belongs to ProfileSummaryInfo but I don't have any good suggestions. One possibility is to keep the flag in PassBuilder and pass it to individual passes which then check the boolean and isFunctionColdInCallGraph. Or perhaps add a pass that annotates OptForSize attribute on functions based on profile data? (that will still not help with the BB version of this function). Thoughts?

I agree - this may not belong to ProfileSummaryInfo. I think I put it there because it's tied to the existence of PSI and might potentially need access to some PSI-internal data or have opportunities for code sharing.

How about having a separate file like ProfileGuidedSizeOpt.h/cpp which hosts the flag and shouldOptimizeForSize, and more? I think that would have an advantage of being able to share the common code in a single place (calling isFunctionColdInCallGraph, etc.) and being less intrusive (not needing to change many pass constructors, etc.), and being able to handle the block-level profiles (unlike the OptForSize attribute).

I'm not sure which directory it could be in, as that would have an implication on the cross-directory dependencies (eg. if we want to use PSI/BB/MBB/BFI/MBFI as a parameter type in it, it must be somewhere above IR/Analysis/CodeGen yet accessible from Transform/CodeGen.). Is there some directory where this sort of files can be in?

...it must be somewhere above IR/Analysis/CodeGen yet accessible from Transform/CodeGen....

Having scanned the LLVMBuild.txt files for the dependencies, I think one workable way would be a split between two parts, something like:

1 lib/Transforms/Utils/SizeOpts - the flag and the 'shouldOptimizeFor' for the IR passes.
2 lib/CodeGen/MachineSizeOpts - the 'shouldOptimizeFor' for the CodeGen/Machine IR passes.

Note 2 depends on 1. 1 is sufficient for this patch.

Thoughts?

yamauchi updated this revision to Diff 193941.Fri, Apr 5, 12:33 PM

Moved the code out of ProfileSummaryInfo to separate files. PTAL.

eraman added inline comments.Wed, Apr 10, 12:33 PM
lib/Transforms/Utils/SizeOpts.cpp
23 ↗(On Diff #193941)

In a follow-up patch, it might be good to make this work for the non-profile case and call this whereever F->optForSize() is used.

27 ↗(On Diff #193941)

Instead of the asserts, return false if any one of them is null. This will avoid doing if (PSI && BFI && shouldOptimizeForSize(...)) in the callers.

test/Transforms/ConstantHoisting/ARM/const-addr-no-neg-offset.ll
49 ↗(On Diff #193941)

Perhaps add a CHECK: [[CONST2:%const_mat[0-9]*]] = add i32 %const, -4 to check the size optimization doesn't happen when -pgso is not used?

yamauchi marked 4 inline comments as done.Thu, Apr 11, 10:37 AM
yamauchi added inline comments.
lib/Transforms/Utils/SizeOpts.cpp
23 ↗(On Diff #193941)

Yes, it might be good.

yamauchi updated this revision to Diff 194715.Thu, Apr 11, 10:37 AM
yamauchi marked an inline comment as done.

Addressed comments.

eraman accepted this revision.Thu, Apr 11, 10:55 AM

LGTM after making the hasProfileSummary check consistent in both flavors of shouldOptimizeForSize

lib/Transforms/Utils/SizeOpts.cpp
34 ↗(On Diff #194715)

The previous flavor of this function doesn't check !PSI->hasProfileSummary(). In both the versions, it is not necessary as isColdBlock and isFunctionColdInCallGraph return false if there is no summary. But if you choose to add the check, then do it in both the flavors of this function.

test/Transforms/ConstantHoisting/ARM/const-addr-no-neg-offset.ll
3 ↗(On Diff #194715)

You don't need a separate run with explicit -pgso=false. You can use the first run and use CHECK and CHECK-LABEL where you use NPGSO and NPGO-LABEL . That's what I had in mind, but if you prefer this, it's fine.

This revision is now accepted and ready to land.Thu, Apr 11, 10:55 AM
yamauchi marked 3 inline comments as done.Thu, Apr 11, 3:00 PM
yamauchi added inline comments.
test/Transforms/ConstantHoisting/ARM/const-addr-no-neg-offset.ll
3 ↗(On Diff #194715)

Sure. I think separate runs are probably better in that it doesn't matter what the default value of -pgso is.

yamauchi updated this revision to Diff 194772.Thu, Apr 11, 3:01 PM
yamauchi marked an inline comment as done.

Addressed comments.

This revision was automatically updated to reflect the committed changes.