With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be layouted optimally.

# Details

# Diff Detail

- Repository
- rL LLVM

### Event Timeline

The PGO related change is only half correct. In fact the trace successor probability threshold should be computed based on existing parameters misfetch cost and jump instruction cost.

In a diamond shaped CFG, the resulting succProb threshold is indeed 50% as the CFG is symmetric. For Triagular shaped CFG, the result will be very different because breaking topological order introduces more overhead compared with diamond shape case.

Given

A / | B | \ | C

If Prob(A,B) is greater than Prob(A,C), A B C should always be the right order. However, if if Prob(A,B) is lower than P(A,C), we need to compare the cost of A B C vs A C B

For A B C, the cost is

Prob(A,C) * Misfetch_cost due to the taken branch from A to C

For A C B, the cost is

Prob(A, B) *Misfetch_cost // taken branch from A to B

+ Prob(A,B) * misfetch_cost // taken branch (unconditional) from B back to C + Prob(A,B) * jump_inst_cost // one more unconditional jump instruction is introduced

If jump instruction cost is assumed to be low (0), then Prob(A,C) needs to be >66% in order for A C B to be a better layout. If jump instr cost is the same as misfetch, we get 75% threshold.

We need to introduce a helper function to compute the threshold on the fly based on the CFG and parameter.

lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|

120 ↗ | (On Diff #59634) | hot-->likely |

125 ↗ | (On Diff #59634) | hot --> likely |

This function has gone bloated recently and needs some cleanups. I will first do a couple of NFC cleans so that it is in a form that is more friendly to inject cost based analysis first, you can then adapt your patch based on that.

lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|

472 ↗ | (On Diff #60231) | It should look at the predecessors of the block C/'Succ' and find a Pred/B that is a candidate to be scheduled -- simliar to the confilct detection down below. |

479 ↗ | (On Diff #60231) | This condition does not look correct. |

481 ↗ | (On Diff #60231) | Should it adjust threshold BB instead? |

I've done refactoring/cleanup. I think the PGO logic can now be put into the new method: getLayoutSuccessorProbThreshold() -- the method needs to pass in BB and Succ. The implementation should be PGO specific.

Also please add a test case.

I'm still putting the triangle shape handling code outside as the logic is not related to PGO. Even with static profile, the triangle-shaped CFG should have a more relaxed condition to be topologically placed.

Added unittest to cover all cases.

lib/CodeGen/MachineBlockPlacement.cpp | |||||
---|---|---|---|---|---|

596 ↗ | (On Diff #60727) | There is no need to pass this parameter -- MachineFunction is now a member field in the class. | |||

621 ↗ | (On Diff #60727) | It is probably not a good idea to bump the frequency of the alternate edge. Instead we should adjust the HotProb threshold computed by: HotProb = getLayoutSuccessorProbThreshold(BB, Succ); For instance BB
SUcc When Edge BB->Succ is examined, with PGO, the thresold should be adjusted to ~66% (compared to 50% default). In order words, in this shape, in order for BB->Succ to be considered, it needs at least 66% likely. We should not change the non-PGO default (80%) now (can be done later -- but since it requires 'guessing' probbaility, a lot more perf testing is needed in order to change that) |

Please also add a case to cover topo-order with triaglar shape (with profile) as well as a diamond case where BP is ~60%.

lgtm.