Page MenuHomePhabricator

Propeller: LLVM support for basic block sections
Needs ReviewPublic

Authored by tmsriram on Sep 25 2019, 5:02 PM.



This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here:!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here:

This is one in the series of patches for Propeller.

This patch adds support for Basic Block Sections in LLVM.

We introduce a new compiler option, -fbasicblock-sections, which places every basic block in a unique ELF text section in the object file along with a symbol labelling the basic block. The linker can then order the basic block sections in any arbitrary sequence which when done correctly can encapsulate block layout, function layout and function splitting optimizations. However, there are a couple of challenges to be addressed for this to be feasible:

  1. The compiler must not allow any implicit fall-through between any two adjacent basic blocks as they could be reordered at link time to be non-adjacent. In other words, the compiler must make a fall-through between adjacent basic blocks explicit by retaining the direct jump instruction that jumps to the next basic block. These branches can only be removed later in the linking phase after the final ordering is performed as determined by Propeller
  1. Each additional section added to an object file bloats its size by tens of bytes. The number of basic blocks can be potentially very large compared to the size of functions and can bloat object sizes significantly. For instance, the clang binary contains 1.5M basic blocks from approximately 700K functions.
  1. All inter-basic block branch targets would now need to be resolved by the linker as they cannot be calculated during compile time. This is done using static relocations which bloats the size of the object files. Further, the compiler tries to use short branch instructions on some ISAs for branch offsets that can be accommodated in one byte. This is not possible with basic block sections as the offset is not determined at compile time, and long branch instructions have to be used everywhere.
  1. Debug Information (DebugInfo) and Call Frame Information (CFI) emission needs special handling with basic block sections. DebugInfo needs to be emitted with more relocations as basic block sections can break a function into potentially several disjoint pieces, and CFI needs to be emitted per basic block. This also bloats the object file and binary sizes significantly.

Updating DebugInfo and CFI:

Generating correct debug information (DebugInfo) and Call Frame Information (CFI) with basic block sections is challenging. Since basic blocks coming from different functions can be arbitrarily reordered and mixed together, we must appropriately update the DebugInfo and CFI.

DebugInfo is easier compared to CFI as we can leverage the DW_AT_ranges tag which allows description of a possibly non-contiguous range of addresses occupied by an entity. Thus, every basic block section forces a separate entry in DW_AT_ranges, plus two relocations pointing to symbols at the start and end of the basic block, respectively. DebugInfo will bloat object file sizes further with basic block sections.

On the other hand, CFI doesn’t provide any easy way to specify non-contiguous range of addresses occupied by a function – the DWARF standard explicitly requires emitting separate CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus, the CFI information for all callee-saved registers (possibly including the frame pointer, if necessary) have to be emitted along with redefining the Call Frame Address (CFA), viz. where the current frame starts.

This causes a significant bloat of the .eh_frame sections, which is partially mitigated by de-duplicating common CFI instructions to the CFI Common Information Entry. We only de-duplicate CFI instructions with offset 0 from the beginning of the CFI frame, i.e. those that describe the CFI state before entering the frame.

Having support for non-contiguous ranges in CFI would significantly minimize the size overheads and complexity of supporting basic block sections.

To allow easy basic block rewriting in the linker (e.g. removing unnecessary fall-through jumps), we force relocations against symbols and not sections. Moreover, in cases where the range is represented in DWARF as start and length, we defer the length calculation to the link stage, by emitting an appropriate SIZE relocation instead of hardcoding the length directly in the object file by the compiler.

Labeling Basic Blocks :

Every basic block is labelled with a unique symbol as this allows easy mapping of virtual addresses from PMU profiles back to the corresponding basic blocks. Since the number of basic blocks is large, the labeling bloats the symbol table sizes and the string table sizes significantly. While the binary size does increase this does not affect performance as the symbol table is not loaded in memory during run-timea, the string table size bloat is kept very minimal using a unary naming scheme that uses string suffix compression. The basic blocks for function foo are named "", "", . . . This turns out to be very good for string table sizes and the bloat in the string table size for a very large binary is only 8 %.

Diff Detail

Event Timeline

tmsriram created this revision.Sep 25 2019, 5:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2019, 5:02 PM

A few comments after a quick high level scan of patch. Also - please mark all these related patches as parent and child versions in Phabricator.




please run clang-format on this patch


What happens if unique module id is not actually unique?


I don't see any calls using this new parameter in this patch - should this change be included with a different patch in the set?


Should this be uncommented, or removed?


Only a whitespace change in this file, remove from patch.


add comment summarizing when dedup is needed


Add parameter name constants to constant parameters here and in other calls to this function.


change to early return. Also a comment would be good


Confusing comment, as the first 2 sentences seem contradictory (module id not guaranteed to be unique, so we use it).

MaskRay added inline comments.

std::string(getNumber(), 'a')


R_*_SIZE is x86 specific. If you want to use it on other architectures, it will require an ABI change.

aprantl added inline comments.Sep 27 2019, 10:52 AM

Please doxygenify all comments in this patch according to


Please try to remove all string attributes that aren't strictly necessary, they are a maintenance burden.

tmsriram updated this revision to Diff 222268.Sep 27 2019, 4:54 PM
tmsriram marked 4 inline comments as done.

clang-format, comments, minor fixes suggested.

tmsriram marked 9 inline comments as done and an inline comment as not done.Sep 27 2019, 4:57 PM
tmsriram added inline comments.

It does not make the situation any worse and we would could end up with two or more internal linkage functions having the same name. Profile attribution would not be accurate and could cause sub-optimal performance decisions particularly if this function is hot. Maybe we could warn about this and let the user pick another name for this.


It is used in clang/lib/CodeGen/CodeGenModule.cpp where the function name is formed. Is this ok?


Great catch!


Currently guarded with a -mrelocate-with-symbols option. Anything better we could do here?

tejohnson added inline comments.Oct 4 2019, 11:38 AM

I think just clarify what the consequences of a collision are in the comments (would be a bigger problem if there was a possible correctness issue).


Can you connect these patches by parent/child relationships in phabricator?