This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Control Height Reduction
ClosedPublic

Authored by hjyamauchi on Aug 10 2018, 4:51 PM.

Details

Summary

Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.

if (hot_cond1) { // Likely true.

do_stg_hot1();

}
if (hot_cond2) { // Likely true.

do_stg_hot2();

}

->

if (hot_cond1 && hot_cond2) { // Hot path.

do_stg_hot1();
do_stg_hot2();

} else { // Cold path.

if (hot_cond1) {
  do_stg_hot1();
}
if (hot_cond2) {
  do_stg_hot2();
}

}

This speeds up some internal benchmarks up to ~30%.

Diff Detail

Event Timeline

hjyamauchi created this revision.Aug 10 2018, 4:51 PM

The request for test case description also applies to other new test cases.

test/Transforms/PGOProfile/chr.ll
9

Also add a simple test case to make sure cold functions are skipped.

15

Can you add check of the meta data itself (for testing of profile meta data merging)?

61

Can you add more detailed description of test case using C?

122

Similarly add test case description.

davidxl added inline comments.Aug 14 2018, 12:24 PM
lib/Passes/PassBuilder.cpp
198

what is npm?

199

Enable CHR --> Enable control height reduction optimization (CHR)

494

How about SamplePGO ?

lib/Transforms/Instrumentation/ControlHeightReduction.cpp
389

CHRModules should be checked before function name demangling and check.

394

For stress testing purpose, perhaps adding a force-chr option so that it is applied on cold functions as well?

441

need documentation comment for the function.

535

Make it an internal option.

538

the comment does not match the implementation: it returns true as long as MD is well formed.

596

Can some refactoriing be done to share some of the implementation with CheckBiasedBranch?

658

This method looks pretty big. Can it be broken up?

davidxl added inline comments.Aug 14 2018, 12:24 PM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1379

this method is also gigantic. Suggest breaking it up (the debug check code can also be put into separate helpers).

hjyamauchi marked 4 inline comments as done.

Updated the test file.

hjyamauchi added inline comments.Aug 15 2018, 2:04 PM
test/Transforms/PGOProfile/chr.ll
15

There's already a check for !15 (at the end of this line). I added a few more.

davidxl added inline comments.Aug 16 2018, 10:25 AM
test/Transforms/PGOProfile/chr.ll
15

that is not what I meant. I think what is missing is the prof meta data's content. For instance

CHECK-NEXT: br i1 ....... !prof ![[COMBINED_PROF:%.*]]

CHECK:![[COMBINED_PROF] = ...

95

should be == 0

269

If this can be hoisted above, will the optimization combine (t1 & 3) | (t2 &12) ?

533

&& should be bitwise &

635

&& -> &

1085

Not sure if I understand the comments about bad hoisting here. You want to test the bad hoisting does not happen? What does the bad hoisting look like?

1243

likely true?

1244

likely true ?

1255

is z != 1 redundant?

1257

why was it inserted?

1440

should it be:

; t0 = *i
; v40 = t0 + 44
; if ((t0 & 3) == 3) Likely true
; foo()
; v41 = t0 + 99
; foo()
; } else {
; if ((t0 & 1) != 0)
Likely true
; foo()
; if ((t0 & 2) != 0) // Likely true
; v41_nc = t0 + 99
; foo()
; }
; }
; v42 = phi (v41, v41_nc, v40)
; return v42 + v40

hjyamauchi marked 10 inline comments as done.

Updated the .cpp file.

hjyamauchi added inline comments.Aug 17 2018, 12:54 PM
lib/Passes/PassBuilder.cpp
198

New pass manager, following the options above.

hjyamauchi marked 9 inline comments as done.

Addressed more comments.

hjyamauchi added inline comments.Aug 17 2018, 4:17 PM
test/Transforms/PGOProfile/chr.ll
15

Had understood that. I should have said "at the end of the *file*". There are checks on the prof metadata content there.

269

Currently no if t1 and t2 are separate loads., and yes if t1/t2 are merged into one load.

1085

This is a regression test against a specific bug that existed regarding moving %conv. I think the bad hoisting looked like:

entry:

...
%div = fdiv double 1.000000e+00, %conv
%conv = sitofp i32 %0 to double
%mul16 = fmul double %div, %conv

It moved %conv to the end of %entry once for %div (from %bb1) and once more for %mul16 (within %entry).

Updated the comment.

1243

This should be false because the branch weight is 0:1.

1244

Same.

1255

yes, I think instcombine doesn't seem to fold due to limited reassociation. This doesn't matter for the point of the test, I think.

1257

As part of the CHR transformation, a trivial phi is inserted at the exit of a region when CHR is applied so that we can easily know what variables/values are live at the exit and need to have the incoming values corresponding to the cloned cold path added to those phis. IOW, when we split/clone a path/region into two, we need a new phi for anything that's alive at the exit. To make this easier, we first insert trivial phis at the exit for anything alive there and later add operands corresponding to the second path to those phis. This is also a regression test for an old bug.

1440

That's valid optimization but, I don't think CHR/instcombine don't specifically try hoisting v40.

davidxl added inline comments.Aug 20 2018, 10:12 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
740

Add an comment describing the source pattern for this case.

740

perhaps do

if (loopy)

return nullptr;
746

Reads better if it is an early return 'return nullptr'

755

Should it return nullptr if HasBranch == false?

765

describe the source pattern here.

784

The code can be simplified:

auto AddSeelctsRegion = [&](RegionInfo &RegI) {

for (auto *SI : Selects) {
    if (CheckBiasedSelect(SI, R,
                          TrueBiasedSelectsGlobal,
                          FalseBiasedSelectsGlobal,
                          SelectBiasMap))
      RI.Selects.push_back(SI);
  }

};

if (!Result) {

 RegInfo RI(R);
 AddSelectsRegion(&RI);
Result = new CHRScope(RI);
Scopes.insert(Result);

} else {

AddSelectsRegion(Result->RegionInfos[0].R);

}

815

It is unclear from the comments what the method is check. Perhaps using a little C example to show what it checks.

831

can depend on or 'can't' depend on. It is unclear what the comment means. Perhaps explain more?

846

this code is confusing:

if (not hoistable) {

unhoistable.erase(...);

}

853

Line 827 does the same. Should it be moved above line 826?

hjyamauchi marked 10 inline comments as done.

Addressed comments.

hjyamauchi added inline comments.Aug 22 2018, 8:05 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
746

Can't do an early return here because we will fall through to look for selects below.

755

Likewise, we don't want to return yet because we look for selects below and because we try to curb out a larger scope (keep scopes connected) even if the branch is unbiased in the middle to increase the chance/scope of the branch merging (eg. merging three branches h1, u, h2 in a row into one where h1 and h2 are biased and u isn't).

831

It means to say a branch instruction doesn't produce a value (in terms of SSA) so no other instruction can't have a use-def edge (data dependence) to a branch instruction. Updated comment.

846

Since we initialized Unhoistables with the selects above and we are dropping this select, we also remove it from Unhoistables. Added a comment.

853

This is not redundant. Since we may remove a select above, the insert point may change here. Added a comment.

davidxl added inline comments.Aug 23 2018, 11:42 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
753

is it possible to have S0 == S1?

829

In this example, if the selects are after if (c3) {}, then the insertion point won't be before the first select, or is it handled in other way?

931

Add a description that this methods find all nested scopes and merge them if possible.

934

Why can' this scope be merged with subsequent ones?

936

This early check does not seem necessary -- it does not save much compile time, but makes the code less clean.

hjyamauchi marked 5 inline comments as done.

Addressed comments.

hjyamauchi added inline comments.Aug 24 2018, 10:16 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
753

I think it's unlikely but possible because we could have a branch instruction that has the same label for the true and the false targets.

829

The insert point would be the branch or the first select *in the first block" (if any). This particular sentence is specifically about this example. If the selects are instead after if (c3) {}, there's no select in the first block and the insert point would be right before the branch.

934

This will be merged with subsequent ones, if possible, after it's returned to the recursive call site in line 943. At the top level of the recursion, there's the sentinel top-level region at the root of the region tree, which doesn't have a subsequent region. So no merging opportunities missed.

davidxl added inline comments.Aug 27 2018, 9:59 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
363

maybe rename it to 'classifyBiasedScopes' -- which means classify them into true or false biased regions?

589

Through out the patch, 'double' is used for branch probability type. Please use existing fixed point representation BranchProbability type in Support/

1728

Is is possible for this be 'folded' into unconditional branch given the true predicate?

1731

Is it better to use IRBuilder interface here?

Added one more metric to the stats.

hjyamauchi marked 4 inline comments as done.

Addressed comments.

hjyamauchi added inline comments.Aug 27 2018, 4:05 PM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1728

The true predicate is a placeholder. It will be replaced later in fixupBranchesAndSelects(). Added a comment.

1731

I'll keep it here as is to be symmetric with the removal of the old branch instruction above. But I'll use IRBuilder in the other places (addToMergedCondition()).

davidxl added inline comments.Aug 29 2018, 8:39 AM
lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1809

This might be too conservative that the merged branch becomes less biased as it can be. It is very likely that the contributing branches are correlated. How about taking the 'min' BranchProb as the resulting BP?

hjyamauchi marked an inline comment as done.

Updated.

lib/Transforms/Instrumentation/ControlHeightReduction.cpp
1809

Done.

davidxl accepted this revision.Aug 30 2018, 11:00 AM

lgtm.

lib/Passes/PassBuilder.cpp
494

In thinLTO mode, you may want to only do this in the backend compilation, otherwise, there is a risk that that fallback path gets CHRed again which is not desirable.

This revision is now accepted and ready to land.Aug 30 2018, 11:00 AM
hjyamauchi marked an inline comment as done.Sep 4 2018, 10:01 AM
hjyamauchi added inline comments.
lib/Passes/PassBuilder.cpp
494

Will look into this later.

hjyamauchi marked an inline comment as done.

Rebased.

This revision was automatically updated to reflect the committed changes.
rtrieu added a subscriber: rtrieu.Sep 4 2018, 9:32 PM
rtrieu added inline comments.
llvm/trunk/lib/Transforms/Instrumentation/ControlHeightReduction.cpp
610–616 ↗(On Diff #163860)

This calculation is in danger of overflowing when the weights are large. I made a fix in r341444 to calculate the sum weight in 64 bits to prevent the overflow.

hjyamauchi added inline comments.Sep 5 2018, 8:42 AM
llvm/trunk/lib/Transforms/Instrumentation/ControlHeightReduction.cpp
610–616 ↗(On Diff #163860)

Thanks for the fix.