This patch adds a CLANG_BOLT_INSTRUMENT option that applies BOLT instrumentation
to Clang, performs a bootstrap build with the resulting Clang, merges resulting
fdata files into a single profile file, and uses it to perform BOLT optimization
on the original Clang binary.
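
For reference, the four stages above can be approximated by hand roughly as follows. This is only a sketch under assumptions: the paths, the `clang++-instrumented` symlink and the `run` helper are made up for illustration, and the optimization flags are common BOLT choices rather than necessarily the ones this patch passes. By default it only prints the commands.

```shell
#!/bin/sh
# Hand-written approximation of the four stages. All paths are hypothetical,
# and the optimization flags are an assumption, not taken from the patch.
CLANG_BIN="${CLANG_BIN:-$PWD/bin/clang-16}"  # binary to optimize (assumed path)
WORK="${WORK:-$PWD/bolt-work}"               # scratch directory (assumed path)
DRY_RUN="${DRY_RUN:-1}"                      # 1: print commands, 0: run them

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# 1. Instrument Clang. Appending the PID gives every compiler process its
#    own profile file, which matters for a parallel build.
instrument() {
  run llvm-bolt "$CLANG_BIN" -o "$WORK/clang-instrumented" \
    -instrument --instrumentation-file-append-pid \
    --instrumentation-file="$WORK/prof.fdata"
}

# 2. Collect profiles by building a workload with the instrumented compiler.
#    Assumes the current directory is an llvm-project checkout and that
#    clang++-instrumented is a symlink to clang-instrumented.
collect() {
  run cmake -S llvm -B "$WORK/profile-build" -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER="$WORK/clang-instrumented" \
    -DCMAKE_CXX_COMPILER="$WORK/clang++-instrumented"
  run ninja -C "$WORK/profile-build" count
}

# 3. Merge the per-process profiles into a single file.
merge_profiles() {
  run merge-fdata "$WORK"/prof.fdata.* -o "$WORK/prof.fdata"
}

# 4. Optimize the original (uninstrumented) binary with the merged profile.
optimize() {
  run llvm-bolt "$CLANG_BIN" -o "$WORK/clang-bolt" -data="$WORK/prof.fdata" \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions \
    -dyno-stats
}

run mkdir -p "$WORK" && instrument && collect && merge_profiles && optimize
```

Setting `DRY_RUN=0` would execute the commands instead of printing them; that requires `llvm-bolt` and `merge-fdata` on `PATH`.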
The projects and targets used for bootstrap/profile collection are configurable via
CLANG_BOLT_INSTRUMENT_PROJECTS and CLANG_BOLT_INSTRUMENT_TARGETS.
The defaults are "llvm" and "count" respectively, which results in a profile with
~5.3B dynamically executed instructions.
The intended use of the functionality is through the BOLT CMake cache file, similar
to the PGO 2-stage build:

    cmake <llvm-project>/llvm -C <llvm-project>/clang/cmake/caches/BOLT.cmake
    ninja clang++-bolt # pulls clang-bolt
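
To profile a different workload, the two options mentioned earlier could be set on the same configure line. A minimal sketch, assuming the cache file does not force these values; the `bolt_configure` helper, the checkout location and the `"llvm;clang"` / `"clang"` values are illustrative only:

```shell
LLVM_SRC="${LLVM_SRC:-$HOME/llvm-project}"  # assumed checkout location
DRY_RUN="${DRY_RUN:-1}"                     # 1: print the command, 0: run it

# $1: value for CLANG_BOLT_INSTRUMENT_PROJECTS
# $2: value for CLANG_BOLT_INSTRUMENT_TARGETS
bolt_configure() {
  set -- cmake "$LLVM_SRC/llvm" -C "$LLVM_SRC/clang/cmake/caches/BOLT.cmake" \
    "-DCLANG_BOLT_INSTRUMENT_PROJECTS=$1" "-DCLANG_BOLT_INSTRUMENT_TARGETS=$2"
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical workload: build clang itself instead of the default target.
bolt_configure "llvm;clang" "clang"
```

The printed line is for inspection only: the `;` in the project list needs quoting if the command is pasted back into a shell.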
Stats with a recent checkout (clang-16), pre-built BOLT and Clang, 72vCPU/224G:

| Step                                           | Time                      |
| ---                                            | ---                       |
| CMake configure with host Clang + BOLT.cmake   | 1m6.592s                  |
| Instrumenting Clang with BOLT                  | 2m50.508s                 |
| CMake configure "llvm" with instrumented Clang | 5m46.364s (~5x slowdown)  |
| CMake build "not" with instrumented Clang      | 0m6.456s                  |
| Merging fdata files                            | 0m9.439s                  |
| Optimizing Clang with BOLT                     | 0m39.201s                 |

Building Clang:

    cmake ../llvm-project/llvm -DCMAKE_C_COMPILER=... -DCMAKE_CXX_COMPILER=... -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang -DLLVM_TARGETS_TO_BUILD=Native -GNinja

| Command     | Release   | BOLT-optimized |
| ---         | ---       | ---            |
| cmake       | 0m24.016s | 0m22.333s      |
| ninja clang | 5m55.692s | 4m35.122s      |

I know this isn't rigorous, but it gives a ballpark figure.
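
To put the timings in relative terms, the small script below converts the `time`-style values to seconds and computes the savings. The helper names are made up for this sketch, and the last line assumes the "~5x slowdown" compares the two configure rows, which the numbers are consistent with but the table does not state.

```shell
# Convert a duration such as 5m55.692s to seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Percentage of wall time saved going from $1 (baseline) to $2.
percent_saved() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# How many times longer $2 takes than $1.
ratio() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", b / a }'
}

percent_saved 5m55.692s 4m35.122s   # ninja clang: prints 22.7
percent_saved 0m24.016s 0m22.333s   # cmake:       prints 7.0
ratio 1m6.592s 5m46.364s            # configure, host vs instrumented: prints 5.2
```

So the BOLT-optimized Clang builds Clang in about 22.7% less wall time (roughly a 1.29x speedup) in this single run.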
We could consider moving this block to a separate file, which would then be included here, since this file is already getting pretty large and the logic in this block is self-contained. That could be done in a follow-up change, though.