This is an extension of basic block sections which allows emitting several hot basic blocks in the same section, in a given order.
Currently the -basicblock-sections=list option allows specifying unique sections for some basic blocks. For example:
!foo
!!1
!!2
!!4
instructs the compiler to emit the entry basic block and each of basic blocks #1, #2, and #4 into a separate unique section, while all the non-specified basic blocks are coalesced into a "cold" section.
With this patch, we can use the same option to specify clusters of basic blocks to be emitted into the same section. For instance:
!foo
!!0 1 2
!!4
means emitting one section containing the entry block, BB#1, and BB#2, in this order, while emitting BB#4 in a unique section of its own. As before, all unlisted basic blocks go into the cold section.
One difference is that with the new approach, we no longer always create a unique separate section for the entry block. It must be listed explicitly; otherwise it is coalesced into the cold section.
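To make the list format concrete, here is a minimal sketch of how such a file could be read into per-function clusters. It is not the parser in this patch: the names Cluster, ClusterMap, parseClusters, sectionOf, and checkExample are hypothetical, and the sketch assumes one "!function" line followed by "!!" lines of space-separated block ids.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the LLVM implementation.
using Cluster = std::vector<unsigned>; // block ids of one section, in layout order
using ClusterMap = std::map<std::string, std::vector<Cluster>>;

// Parses lines of the form "!func" and "!!id id ..." into per-function clusters.
ClusterMap parseClusters(const std::string &Text) {
  ClusterMap Result;
  std::istringstream In(Text);
  std::string Line, Func;
  while (std::getline(In, Line)) {
    if (Line.rfind("!!", 0) == 0) {
      // A cluster line; skip it if no function line has been seen yet.
      if (Func.empty())
        continue;
      std::istringstream Ids(Line.substr(2));
      Cluster C;
      unsigned Id;
      while (Ids >> Id)
        C.push_back(Id);
      if (!C.empty())
        Result[Func].push_back(C);
    } else if (Line.rfind("!", 0) == 0) {
      // A function line; later cluster lines belong to this function.
      Func = Line.substr(1);
      Result[Func]; // register the function even if it lists no clusters
    }
  }
  return Result;
}

// Returns the index of the cluster that lists BB, or -1 if BB is unlisted
// (that is, it would go into the cold section).
int sectionOf(const std::vector<Cluster> &Clusters, unsigned BB) {
  for (size_t I = 0; I < Clusters.size(); ++I)
    for (unsigned Id : Clusters[I])
      if (Id == BB)
        return static_cast<int>(I);
  return -1;
}

// Usage sketch on the cluster example from this description.
inline void checkExample() {
  ClusterMap M = parseClusters("!foo\n!!0 1 2\n!!4\n");
  assert(sectionOf(M["foo"], 0) == 0);  // entry block leads the first cluster
  assert(sectionOf(M["foo"], 4) == 1);  // BB#4 has a section of its own
  assert(sectionOf(M["foo"], 3) == -1); // unlisted, so cold
}
```

In this sketch the order of ids within a cluster is preserved, since that order is what determines the layout inside the section.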
Another difference is with respect to the special exception section. We only create an exception section if the BB-cluster specification scatters EH pads into more than one section.
A third difference concerns the size directives. We now emit a size directive only for the BBs which start a section. That size is the size of the section marked by that BB. Emitting sizes for every internal basic block as well would require combining -basicblock-sections=labels with -basicblock-sections=list, which is not currently possible.
Finally, with BB-clusters, we emit persistent non-temporary labels only for basic blocks which begin sections. Other basic blocks will use temp labels in the form of .LBBN_M as before.
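The rule for size directives and persistent labels can be sketched as follows. This is an illustration under assumptions, not code from the patch: Cluster and sectionBeginners are hypothetical names, and the sketch covers only the explicitly listed clusters, not the block that happens to begin the cold section.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical representation: one inner vector per cluster (one section),
// holding block ids in layout order.
using Cluster = std::vector<unsigned>;

// Returns the ids of the blocks that begin a listed section. Under the rule
// described above, these are the blocks that would get a size directive and
// a persistent (non-temporary) label.
std::set<unsigned> sectionBeginners(const std::vector<Cluster> &Clusters) {
  std::set<unsigned> Begins;
  for (const Cluster &C : Clusters) {
    assert(!C.empty() && "a cluster must list at least one block");
    Begins.insert(C.front());
  }
  return Begins;
}
```

For the clusters "0 1 2" and "4", this yields blocks 0 and 4; blocks 1 and 2 would keep temporary labels.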
The benefit of the BB-cluster approach is that it reduces the number of basic block sections created, which in turn reduces the CFI and DebugInfo overhead, and also reduces the burden on the linker.
Another benefit is being able to reuse the assembler's JCC mitigation strategy as discussed in http://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html.
We note that the BB-cluster approach is only beneficial if the optimal block order can be computed prior to compilation, using profiles and the binary generated with -basicblock-sections=labels. We have shown that this is possible and that the performance improvements match our previous results (please refer to the link above).
Move this to MachineFunction.h as a member?