This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Control Height Reduction
ClosedPublic

Authored by hjyamauchi on Aug 10 2018, 4:51 PM.

Details

Summary

Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.

if (hot_cond1) { // Likely true.

do_stg_hot1();

}
if (hot_cond2) { // Likely true.

do_stg_hot2();

}

->

if (hot_cond1 && hot_cond2) { // Hot path.

do_stg_hot1();
do_stg_hot2();

} else { // Cold path.

if (hot_cond1) {
  do_stg_hot1();
}
if (hot_cond2) {
  do_stg_hot2();
}

}

This speeds up some internal benchmarks up to ~30%.

Diff Detail

Repository
rL LLVM

Event Timeline

hjyamauchi created this revision.Aug 10 2018, 4:51 PM

The request for test case description also applies to other new test cases.

test/Transforms/PGOProfile/chr.ll
8 ↗(On Diff #160210)

Also add a simple test case to make sure cold functions are skipped.

14 ↗(On Diff #160210)

Can you add check of the meta data itself (for testing of profile meta data merging)?

60 ↗(On Diff #160210)

Can you add more detailed description of test case using C?

121 ↗(On Diff #160210)

Similarly add test case description.

davidxl added inline comments.Aug 14 2018, 12:24 PM
lib/Passes/PassBuilder.cpp
198 ↗(On Diff #160210)

what is npm?

199 ↗(On Diff #160210)

Enable CHR --> Enable control height reduction optimization (CHR)

494 ↗(On Diff #160210)

How about SamplePGO ?

lib/Transforms/Instrumentation/ControlHeightReduction.cpp
388 ↗(On Diff #160210)

CHRModules should be checked before function name demangling and check.

393 ↗(On Diff #160210)

For stress testing purpose, perhaps adding a force-chr option so that it is applied on cold functions as well?

440 ↗(On Diff #160210)

need documentation comment for the function.

534 ↗(On Diff #160210)

Make it an internal option.

537 ↗(On Diff #160210)

the comment does not match the implementation: it returns true as long as MD is well formed.

595 ↗(On Diff #160210)

Can some refactoriing be done to share some of the implementation with CheckBiasedBranch?

657 ↗(On Diff #160210)

This method looks pretty big. Can it be broken up?

davidxl added inline comments.Aug 14 2018, 12:24 PM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1378 ↗(On Diff #160210)

this method is also gigantic. Suggest breaking it up (the debug check code can also be put into separate helpers).

hjyamauchi marked 4 inline comments as done.

Updated the test file.

hjyamauchi added inline comments.Aug 15 2018, 2:04 PM
test/Transforms/PGOProfile/chr.ll
14 ↗(On Diff #160210)

There's already a check for !15 (at the end of this line). I added a few more.

davidxl added inline comments.Aug 16 2018, 10:25 AM
test/Transforms/PGOProfile/chr.ll
94 ↗(On Diff #160916)

should be == 0

268 ↗(On Diff #160916)

If this can be hoisted above, will the optimization combine (t1 & 3) | (t2 &12) ?

532 ↗(On Diff #160916)

&& should be bitwise &

634 ↗(On Diff #160916)

&& -> &

1084 ↗(On Diff #160916)

Not sure if I understand the comments about bad hoisting here. You want to test the bad hoisting does not happen? What does the bad hoisting look like?

1242 ↗(On Diff #160916)

likely true?

1243 ↗(On Diff #160916)

likely true ?

1254 ↗(On Diff #160916)

is z != 1 redundant?

1256 ↗(On Diff #160916)

why was it inserted?

1439 ↗(On Diff #160916)

should it be:

; t0 = *i
; v40 = t0 + 44
; if ((t0 & 3) == 3) Likely true
; foo()
; v41 = t0 + 99
; foo()
; } else {
; if ((t0 & 1) != 0)
Likely true
; foo()
; if ((t0 & 2) != 0) // Likely true
; v41_nc = t0 + 99
; foo()
; }
; }
; v42 = phi (v41, v41_nc, v40)
; return v42 + v40

14 ↗(On Diff #160210)

that is not what I meant. I think what is missing is the prof meta data's content. For instance

CHECK-NEXT: br i1 ....... !prof ![[COMBINED_PROF:%.*]]

CHECK:![[COMBINED_PROF] = ...

hjyamauchi marked 10 inline comments as done.

Updated the .cpp file.

hjyamauchi added inline comments.Aug 17 2018, 12:54 PM
lib/Passes/PassBuilder.cpp
198 ↗(On Diff #160210)

New pass manager, following the options above.

hjyamauchi marked 9 inline comments as done.

Addressed more comments.

hjyamauchi added inline comments.Aug 17 2018, 4:17 PM
test/Transforms/PGOProfile/chr.ll
268 ↗(On Diff #160916)

Currently no if t1 and t2 are separate loads., and yes if t1/t2 are merged into one load.

1084 ↗(On Diff #160916)

This is a regression test against a specific bug that existed regarding moving %conv. I think the bad hoisting looked like:

entry:

...
%div = fdiv double 1.000000e+00, %conv
%conv = sitofp i32 %0 to double
%mul16 = fmul double %div, %conv

It moved %conv to the end of %entry once for %div (from %bb1) and once more for %mul16 (within %entry).

Updated the comment.

1242 ↗(On Diff #160916)

This should be false because the branch weight is 0:1.

1243 ↗(On Diff #160916)

Same.

1254 ↗(On Diff #160916)

yes, I think instcombine doesn't seem to fold due to limited reassociation. This doesn't matter for the point of the test, I think.

1256 ↗(On Diff #160916)

As part of the CHR transformation, a trivial phi is inserted at the exit of a region when CHR is applied so that we can easily know what variables/values are live at the exit and need to have the incoming values corresponding to the cloned cold path added to those phis. IOW, when we split/clone a path/region into two, we need a new phi for anything that's alive at the exit. To make this easier, we first insert trivial phis at the exit for anything alive there and later add operands corresponding to the second path to those phis. This is also a regression test for an old bug.

1439 ↗(On Diff #160916)

That's valid optimization but, I don't think CHR/instcombine don't specifically try hoisting v40.

14 ↗(On Diff #160210)

Had understood that. I should have said "at the end of the *file*". There are checks on the prof metadata content there.

davidxl added inline comments.Aug 20 2018, 10:12 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
739 ↗(On Diff #161357)

Add an comment describing the source pattern for this case.

739 ↗(On Diff #161357)

perhaps do

if (loopy)

return nullptr;
745 ↗(On Diff #161357)

Reads better if it is an early return 'return nullptr'

754 ↗(On Diff #161357)

Should it return nullptr if HasBranch == false?

764 ↗(On Diff #161357)

describe the source pattern here.

783 ↗(On Diff #161357)

The code can be simplified:

auto AddSeelctsRegion = [&](RegionInfo &RegI) {

for (auto *SI : Selects) {
    if (CheckBiasedSelect(SI, R,
                          TrueBiasedSelectsGlobal,
                          FalseBiasedSelectsGlobal,
                          SelectBiasMap))
      RI.Selects.push_back(SI);
  }

};

if (!Result) {

 RegInfo RI(R);
 AddSelectsRegion(&RI);
Result = new CHRScope(RI);
Scopes.insert(Result);

} else {

AddSelectsRegion(Result->RegionInfos[0].R);

}

814 ↗(On Diff #161357)

It is unclear from the comments what the method is check. Perhaps using a little C example to show what it checks.

830 ↗(On Diff #161357)

can depend on or 'can't' depend on. It is unclear what the comment means. Perhaps explain more?

845 ↗(On Diff #161357)

this code is confusing:

if (not hoistable) {

unhoistable.erase(...);

}

852 ↗(On Diff #161357)

Line 827 does the same. Should it be moved above line 826?

hjyamauchi marked 10 inline comments as done.

Addressed comments.

hjyamauchi added inline comments.Aug 22 2018, 8:05 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
745 ↗(On Diff #161357)

Can't do an early return here because we will fall through to look for selects below.

754 ↗(On Diff #161357)

Likewise, we don't want to return yet because we look for selects below and because we try to curb out a larger scope (keep scopes connected) even if the branch is unbiased in the middle to increase the chance/scope of the branch merging (eg. merging three branches h1, u, h2 in a row into one where h1 and h2 are biased and u isn't).

830 ↗(On Diff #161357)

It means to say a branch instruction doesn't produce a value (in terms of SSA) so no other instruction can't have a use-def edge (data dependence) to a branch instruction. Updated comment.

845 ↗(On Diff #161357)

Since we initialized Unhoistables with the selects above and we are dropping this select, we also remove it from Unhoistables. Added a comment.

852 ↗(On Diff #161357)

This is not redundant. Since we may remove a select above, the insert point may change here. Added a comment.

davidxl added inline comments.Aug 23 2018, 11:42 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
752 ↗(On Diff #161961)

is it possible to have S0 == S1?

828 ↗(On Diff #161961)

In this example, if the selects are after if (c3) {}, then the insertion point won't be before the first select, or is it handled in other way?

930 ↗(On Diff #161961)

Add a description that this methods find all nested scopes and merge them if possible.

933 ↗(On Diff #161961)

Why can' this scope be merged with subsequent ones?

935 ↗(On Diff #161961)

This early check does not seem necessary -- it does not save much compile time, but makes the code less clean.

hjyamauchi marked 5 inline comments as done.

Addressed comments.

hjyamauchi added inline comments.Aug 24 2018, 10:16 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
752 ↗(On Diff #161961)

I think it's unlikely but possible because we could have a branch instruction that has the same label for the true and the false targets.

828 ↗(On Diff #161961)

The insert point would be the branch or the first select *in the first block" (if any). This particular sentence is specifically about this example. If the selects are instead after if (c3) {}, there's no select in the first block and the insert point would be right before the branch.

933 ↗(On Diff #161961)

This will be merged with subsequent ones, if possible, after it's returned to the recursive call site in line 943. At the top level of the recursion, there's the sentinel top-level region at the root of the region tree, which doesn't have a subsequent region. So no merging opportunities missed.

davidxl added inline comments.Aug 27 2018, 9:59 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
362 ↗(On Diff #162410)

maybe rename it to 'classifyBiasedScopes' -- which means classify them into true or false biased regions?

588 ↗(On Diff #162410)

Through out the patch, 'double' is used for branch probability type. Please use existing fixed point representation BranchProbability type in Support/

1727 ↗(On Diff #162410)

Is is possible for this be 'folded' into unconditional branch given the true predicate?

1730 ↗(On Diff #162410)

Is it better to use IRBuilder interface here?

Added one more metric to the stats.

hjyamauchi marked 4 inline comments as done.

Addressed comments.

hjyamauchi added inline comments.Aug 27 2018, 4:05 PM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1727 ↗(On Diff #162410)

The true predicate is a placeholder. It will be replaced later in fixupBranchesAndSelects(). Added a comment.

1730 ↗(On Diff #162410)

I'll keep it here as is to be symmetric with the removal of the old branch instruction above. But I'll use IRBuilder in the other places (addToMergedCondition()).

davidxl added inline comments.Aug 29 2018, 8:39 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1808 ↗(On Diff #162768)

This might be too conservative that the merged branch becomes less biased as it can be. It is very likely that the contributing branches are correlated. How about taking the 'min' BranchProb as the resulting BP?

hjyamauchi marked an inline comment as done.

Updated.

lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1808 ↗(On Diff #162768)

Done.

davidxl accepted this revision.Aug 30 2018, 11:00 AM

lgtm.

lib/Passes/PassBuilder.cpp
494 ↗(On Diff #160210)

In thinLTO mode, you may want to only do this in the backend compilation, otherwise, there is a risk that that fallback path gets CHRed again which is not desirable.

This revision is now accepted and ready to land.Aug 30 2018, 11:00 AM
hjyamauchi marked an inline comment as done.Sep 4 2018, 10:01 AM
hjyamauchi added inline comments.
lib/Passes/PassBuilder.cpp
494 ↗(On Diff #160210)

Will look into this later.

hjyamauchi marked an inline comment as done.

Rebased.

This revision was automatically updated to reflect the committed changes.
rtrieu added a subscriber: rtrieu.Sep 4 2018, 9:32 PM
rtrieu added inline comments.
llvm/trunk/lib/Transforms/Instrumentation/ControlHeightReduction.cpp
610–616

This calculation is in danger of overflowing when the weights are large. I made a fix in r341444 to calculate the sum weight in 64 bits to prevent the overflow.

hjyamauchi added inline comments.Sep 5 2018, 8:42 AM
llvm/trunk/lib/Transforms/Instrumentation/ControlHeightReduction.cpp
610–616

Thanks for the fix.