This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Merging duplicated basic blocks
AbandonedPublic

Authored by inouehrs on Mar 8 2017, 11:05 PM.

Details

Summary

This patch enables basic block deduplication, which finds duplicated basic blocks and merges them into one block, in SimplifyCFG pass.
Such basic block duplication is common and so Control Flow Optimizer (BranchFolding) does the similar optimization for Machine Function at last of the code generation. However, eliminating such duplication in an early phase increases optimization opportunities in other optimizers, such as shrink wrapping.
In SimplifyCFG, tail-sinking does similar optimization. But the tail-sinking can optimize basic blocks if all the incoming unconditional branches into one block have the duplicated instructions before the branch. The basic block deduplication can merge two (or more) BBs ending with unconditional branch into the same block regardless of the other incoming branches.
As long as I tested, this patch does not visibly increase the compilation time while compiling the LLVM source tree.

The original motivation of this patch is to optimize the hot method of tcmalloc. In which GCC can apply shrink wrapping, but LLVM cannot. By applying this patch in addition to https://reviews.llvm.org/D30900 , LLVM can do shrink wrapping for this hot method.

Diff Detail

Event Timeline

inouehrs created this revision.Mar 8 2017, 11:05 PM
inouehrs edited the summary of this revision. (Show Details)Apr 28 2017, 12:10 AM
sfertile added inline comments.May 23 2017, 12:45 PM
lib/Transforms/Utils/SimplifyCFG.cpp
1685

Variable name should start with an uppercase letter.

1694

Function name should start with a lower case letter.

1719

instrcutions --> instructions

1744

Missing period at the end of the comment, (most of the comments past here need a period added at the end).

1769

incomming --> incoming

1801

incomming --> incoming

1815

incomming --> incoming

inouehrs abandoned this revision.Jul 21 2017, 6:16 AM

This patch is intended to help another patch https://reviews.llvm.org/D30900. Since D30900 is abandoned, I abandon this patch too.

markus added a subscriber: markus.Nov 10 2021, 1:56 AM

We have some cases where a modified version of this patch (modified to also allow optimizing non-branch CFG edges) makes a huge difference in compile time and code size.
The code contains large switch cases where lots of the case BB are equal.

I have a question if this wouldn´t be a good optimization to have in the framework after all?

Originally, this patch is for helping another optimization (https://reviews.llvm.org/D30900). But potentially, removing duplication may affect other optimizations in general.
If you have a strong example to push this optimization, I think it is reasonable to submit a new patch with the description on how it improve the code with a large switch.

ljmf00 added a subscriber: ljmf00.Nov 14 2021, 9:27 AM

Originally, this patch is for helping another optimization (https://reviews.llvm.org/D30900). But potentially, removing duplication may affect other optimizations in general.
If you have a strong example to push this optimization, I think it is reasonable to submit a new patch with the description on how it improve the code with a large switch.

I think this can be beneficial too. With the same rationale of the switch lookup tables optimization, we can find a threshold where optimizing it is generally beneficial in terms of performance, plus, as mentioned, this can reduce significantly binary sizes and can perfectly fall into size optimization level.