# [MBP] Enhance cost based branch prob threshold computation to handle general control flowsNeeds ReviewPublic

Authored by davidxl on Jun 23 2016, 3:14 PM.

# Details

Reviewers
 danielcdh congh
Summary

See comments in the code for details.

Also added more extensive testing in various scenarios to make sure the optimal layout (with minimal branch cost or maximal fall through) is picked.

# Diff Detail

### Event Timeline

davidxl updated this revision to Diff 61730.Jun 23 2016, 3:14 PM
davidxl retitled this revision from to [MBP] Enhance cost based branch prob threshold computation to handle general control flows.
davidxl updated this object.
davidxl added reviewers: danielcdh, congh.
davidxl added a subscriber: llvm-commits.
danielcdh added inline comments.Jun 23 2016, 7:52 PM
lib/CodeGen/MachineBlockPlacement.cpp
610

You mean in (C), right?

624

This is a little confusing, I think the condition should be:

F1 > max(2*F3, F2)

So when checking Pred1, the backward threhold should be 0.66, when checking Pred2, the backward threshold should be 0.5?

But for the current implementation, I tried all testcase, and some other testcases with different triangle-diamond combination, and it always gives me optimal solution.

davidxl added inline comments.Jun 23 2016, 8:38 PM
lib/CodeGen/MachineBlockPlacement.cpp
610

yes.

624

It is F1 > F2 + F3.

Basically, we are comparing selecting BB->Succ as fall through vs selecting BB->Pred1 and Pred2->Succ as fall throughs. In order for BB->Succ to succeed, the minimal min(F1) = F2 + F3.

When computing backward probability, only BB->Succ and Pred2->Succ edges are considered, so the probability threshold

`T = min(F1) /(min(F1) + F2) = (F2 + F3 )/(2*F2 + F3).`

I will update the comments.

danielcdh edited edge metadata.Jun 23 2016, 9:51 PM
danielcdh added a subscriber: danielcdh.

For case C, if F1=50, F2=40, F3=20, it seems it's most beneficial to choose
BB->Succ as fall-through than Pred1->Succ or Pred2->Succ, or am I missing
something?

davidxl added inline comments.Jun 23 2016, 10:16 PM
lib/CodeGen/MachineBlockPlacement.cpp
624

Dehao's question:

For case C, if F1=50, F2=40, F3=20, it seems it's most beneficial to choose BB->Succ as fall-through than Pred1->Succ or Pred2->Succ, or am I missing something?

The answer is, in this case, it is better not to choose BB->Succ as the fall through.

Let's take a look at an example: suppose BB and Pred2 in example C has a predecessor 'S'. With your choice, the layout is

S, BB, Succ, Pred2, Pred1 with cost F2 + F3 + F2 + F3

The optimal layout is in fact

S, BB, P1, P2, Succ with cost F1 + F2 + F3

danielcdh added inline comments.Jun 23 2016, 10:50 PM
lib/CodeGen/MachineBlockPlacement.cpp
624

I see. Thanks for explanation.

It may worth mentioning in the comment that the threshold is F1>max(2*F3, F2+F3), because F2>F3, thus F1>F2+F3.

664

Maybe combine the two scenarios to something like:

F2 = std::max(MBFI->getBlockFreq(BB) * SuccProb.getCompl();, MaxPredEdgeFreq);
F3 = std::min(MBFI->getBlockFreq(BB) * SuccProb.getCompl();, MaxPredEdgeFreq);