This is an archive of the discontinued LLVM Phabricator instance.

[MBP] Don't move bottom block before header if it can't reduce taken branches
ClosedPublic

Authored by Carrot on Jan 22 2019, 2:40 PM.

Details

Summary

If bottom of block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except the following case:

-->OldTop<-
|    .    |
|    .    |
|    .    |
---Pred   |
     |    |
    BB-----

Move BB before OldTop can't reduce the number of taken branches, this patch detects this case and prevent the moving.

Diff Detail

Event Timeline

Carrot created this revision.Jan 22 2019, 2:40 PM
davidxl added inline comments.Jan 22 2019, 10:08 PM
lib/CodeGen/MachineBlockPlacement.cpp
1775

Is there any benefit of keeping the original layout?

Carrot marked an inline comment as done.Jan 23 2019, 10:51 AM
Carrot added inline comments.
lib/CodeGen/MachineBlockPlacement.cpp
1775

1 If there is no benefit, the original layout is much better for readability.

2 In theory on modern processor the consecutive two branches should be slower than two branches with other instructions between them.

case 1

...
jne 
jmp

case 2

jne
alu_1
...
alu_n
jmp

When prediction for jne is correct, there is no difference.
When prediction for jne is false and branch is not taken, following instructions is fetched and executed, in case 1 only one instruction is fetched and executed, in case 2 multiple instruction is fetched and executed at the same time. Usually branch instructions cause bubble in pipeline. In case 1 when jmp is executed, the pipeline is completely empty. In case 2 when jmp is executed, the pipeline can still execute previous instructions.

Carrot marked an inline comment as done.Jan 23 2019, 11:00 AM
Carrot added inline comments.
lib/CodeGen/MachineBlockPlacement.cpp
1775

I also noticed findBestLoopExit is only called when it can't find a better new Header. So if we move bottom BB as new Header, it will prevent finding a better loop exit edge.

davidxl added inline comments.Jan 23 2019, 11:24 AM
lib/CodeGen/MachineBlockPlacement.cpp
1775

Right. Suppose oldTop has another predecessor from outside the loop, when bottom BB is rotated before the oldTop, there will be a missing fall through from that predecessor to oldTop.

davidxl accepted this revision.Jan 24 2019, 9:57 AM

lgtm.

Do you have any benchmarking numbers?

This revision is now accepted and ready to land.Jan 24 2019, 9:57 AM

lgtm.

Do you have any benchmarking numbers?

No. I just observed it many multi times in unit tests, found it not optimal, and think it should be improved.

This revision was automatically updated to reflect the committed changes.