Use the perf-training flow to collect the BOLT profile, which makes BOLT
optimization reproducible. This removes the use of a bootstrapped build for
profile collection.
Test Plan:
- Regular (single-stage) build:

    $ cmake ... -C .../clang/cmake/caches/BOLT.cmake
    $ ninja clang-bolt
    ...
    [21/24] Instrumenting clang binary with BOLT
    [21/24] Generating BOLT profile for Clang
    [23/24] Merging BOLT fdata
    Profile from 2 files merged.
    [24/24] Optimizing Clang with BOLT
    ...
    1291202496 : executed instructions (-1.1%)
    27005133 : taken branches (-71.5%)
    ...

- Two-stage build (ThinLTO+InstPGO):

    $ cmake ... -C .../clang/cmake/caches/BOLT.cmake -C .../clang/cmake/caches/BOLT-PGO.cmake
    $ ninja clang-bolt
    $ ninja stage2-clang-bolt
    ...
    [2756/2759] Instrumenting clang binary with BOLT
    [2756/2759] Generating BOLT profile for Clang
    [2758/2759] Merging BOLT fdata
    [2759/2759] Optimizing Clang with BOLT
    ...
    BOLT-INFO: 7092 out of 184104 functions in the binary (3.9%) have non-empty execution profile
    756531927 : executed instructions (-0.5%)
    15399400 : taken branches (-40.3%)
    ...
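
The dyno-stats lines in the test-plan logs report a post-BOLT count followed by a percentage change. To compare runs, it can help to pull those numbers out of a build log and back out the approximate pre-BOLT count. The following is a minimal sketch, not part of this change: `parse_dyno_stats` and `estimate_baseline` are hypothetical helpers, and it assumes the percentage is relative to the count before BOLT optimization. Because the printed percentages are rounded to one decimal place, the recovered baselines are only approximate.

```python
import re
from typing import Dict, Tuple

# Matches dyno-stats lines such as "27005133 : taken branches (-71.5%)".
# Groups: post-BOLT count, metric name, percent change.
DYNO_RE = re.compile(r"(\d+)\s*:\s*([a-z ]+?)\s*\(([+-]?\d+(?:\.\d+)?)%\)")


def parse_dyno_stats(log: str) -> Dict[str, Tuple[int, float]]:
    """Hypothetical helper: map each metric name to (post-BOLT count, percent change)."""
    return {
        name: (int(value), float(delta))
        for value, name, delta in DYNO_RE.findall(log)
    }


def estimate_baseline(value: int, delta_pct: float) -> float:
    """Hypothetical helper: approximate pre-BOLT count.

    Assumes delta_pct is relative to the pre-BOLT count, i.e.
    value = baseline * (1 + delta_pct / 100).
    """
    return value / (1 + delta_pct / 100)


if __name__ == "__main__":
    log = "1291202496 : executed instructions (-1.1%) 27005133 : taken branches (-71.5%)"
    for name, (value, delta) in parse_dyno_stats(log).items():
        print(f"{name}: {value} now, ~{estimate_baseline(value, delta):.0f} before")
```

The same functions apply unchanged to the two-stage log (`756531927 : executed instructions (-0.5%)`, `15399400 : taken branches (-40.3%)`). A regex is used instead of a full log parser because only these `count : name (delta%)` lines matter here.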