All of the compilers we support for building LLVM should be able to
build a constexpr set of Descriptions in
DebugInfo/DWARF/DWARFExpression.h, so remove the FIXME and implement it.
Details
- Reviewers
Orlando dblaikie - Group Reviewers
debug-info
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Sure - though could you check if there's any revision history that explains the comment further/motivates why this would be valuable? (I have some apprehension about making extra things constexpr as it can increase compile time excessively (forcing the frontend to check for evaluation - when the backend can more cheaply optimize it away later anyway))
Sure, I did just blindly assume the TODO had merit. Is there a structured way you tend to evaluate these on a case-by-case basis? Should I compare instructions retired while compiling LLVM+Clang and while using the resulting LLVM+Clang to inspect DebugInfo with and without the change?
Edit:
I compared building the lib/DebugInfo/DWARF/all target without the patch vs. with the patch, with some minimal isolation and attempts to make the result statistically significant without spending too much time:
- I ran everything out of a linux tmpfs
- I used ninja as the build system
- I'm compiling with clang version 11.0.0 (https://github.com/llvm/llvm-project 176249bd6732a8044d457092ed932768724a6f06)
- I ran the build six times, cleaning between each and throwing out the first run
<stat>: <control> <constexpr> (<constexpr>/<control>) lib/libLLVMDebugInfoDWARF.a-size: 2021132 2016068 (0.997494) instructions:u: 4271564407084 4271164683246 (0.999906) Maximum resident set size: 694690 694689 (0.999999) branches: 495120306752 495185537556 (1.000132) branch-misses: 30440381987 30515429111 (1.002465) User time: 2321.876000 2331.440000 (1.004119)
If https://www.npopov.com/2020/05/10/Make-LLVM-fast-again.html is to be believed the user instructions retired and max-rss metrics are the best proxies available, but I included some of the "noisy" metrics too. None of them seem too outrageously out-of-line, but I can rerun to see if e.g. the 0.4% increase user-time is a fluke.
I stole the cmake, time, and perf command-lines from the repo for http://llvm-compile-time-tracker.com/ referenced from the same blog post above. I haven't done much in this area so I don't know if I made any mistakes; my script is:
#!/bin/bash build() { for f in control constexpr; do rm -rf build-$f mkdir -p build-$f rm -rf out-$f mkdir -p out-$f cmake \ -GNinja \ -S./source-$f/llvm \ -B./build-$f \ -DLLVM_ENABLE_PROJECTS="clang" \ -DLLVM_TARGETS_TO_BUILD="X86" \ -DLLVM_BUILD_TOOLS=false \ -DLLVM_APPEND_VC_REV=false \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_CCACHE_BUILD=false \ -DLLVM_USE_LINKER=gold \ -DLLVM_BINUTILS_INCDIR=/usr/include \ -DCLANG_ENABLE_ARCMT=false \ -DCLANG_ENABLE_STATIC_ANALYZER=false /usr/bin/ninja -C build-$f lib/DebugInfo/DWARF/all for i in $(seq 5); do LC_ALL=C ninja -C build-$f clean LC_ALL=C \ /usr/bin/time -v -o out-$f/time.$i \ perf stat -x \; -o out-$f/perf.$i \ -e instructions \ -e instructions:u \ -e cycles \ -e task-clock \ -e branches \ -e branch-misses \ ninja -C build-$f lib/DebugInfo/DWARF/all done done } print_stat_header() { printf "<stat>: <control> <constexpr> (<constexpr>/<control>)\n" } print_stat() { printf "%s: %s %s (%s)\n" "$1" "$2" "$3" "$(awk -- "BEGIN { printf \"%f\\n\", $3 / $2 }")" } size() { stat --printf='%s\n' "$@" } compare_size() { local file="$1" local control="$(size build-control/$file)" local constexpr="$(size build-constexpr/$file)" print_stat "$file-size" "$control" "$constexpr" } compare() { local dataset="$1"; shift local rowname="$1"; shift local fmt="$1"; shift local sep=: local idx=2 if [[ $dataset == perf ]]; then sep=\; idx=1 fi local awkprogram="/$rowname/ { sum += \$$idx; n++; } END { printf \"$fmt\\n\", sum / n; }" local control=$(awk -F "$sep" -- "$awkprogram" out-control/$dataset.*) local constexpr=$(awk -F "$sep" -- "$awkprogram" out-constexpr/$dataset.*) local relative=$(awk -- "BEGIN { printf \"%f\\n\", $constexpr / $control }") print_stat "$rowname" "$control" "$constexpr" } print() { print_stat_header compare_size lib/libLLVMDebugInfoDWARF.a compare perf 'instructions:u' '%u' compare time 'Maximum resident set size' '%u' compare perf 'branches' '%u' compare perf 'branch-misses' '%u' compare time 'User time' '%f' } main() { build print } "$@"
Run as bench.sh main
I will expand this to building the whole of llvm, and then dogfood the result, although I may not run as many trials to avoid it taking too long.
The actual use of the DebugInfo code during a normal compilation is minimal: it is just used to "fixup" some operations whose final argument values depend on layout that occurs pretty late.
With that in mind, I figured a better test would be to build a debug build of LLVM trunk, and then run release builds of llvm-dwarfdump over those debug executables, comparing the performance metrics with and without the relevant patches applied.
I first started with dumping clang, but it takes quite a while to chew through all the debug info in a clang executable. I backed down to llvm-dwarfdump itself, and llc. The results seem to indicate the change to constexpr is essentially a wash at runtime:
Switching to constexpr, running `release/llvm-dwarfdump debug/llvm-dwarfdump`: <stat>: <control> <patched> (<patched>/<control>) instructions:u: 223641285928 223732233232 (1.000407) Maximum resident set size: 698212 698365 (1.000219) branches: 22636320251 22435431611 (0.991125) branch-misses: 730543176 342250426 (0.468488) User time: 34.436667 28.920000 (0.839803) Switching to constexpr, running `release/llvm-dwarfdump debug/llc`: <stat>: <control> <patched> (<patched>/<control>) instructions:u: 1878329132612 1879189898727 (1.000458) Maximum resident set size: 5063790 5063321 (0.999907) branches: 191863771404 191266164346 (0.996885) branch-misses: 3513814256 2647395865 (0.753425) User time: 255.713333 242.900000 (0.949892)
With that in mind, and considering D147270 would be simplified if we just dropped all the static bounds and used SmallVector everywhere, I also tried starting with trunk and only changing the DWARFExpression::Operation::Op member to be a SmallVector<Element, 2>, and it also appears to be a wash:
Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-dwarfdump`: <stat>: <control> <patched> (<patched>/<control>) instructions:u: 223622266793 223954468175 (1.001486) Maximum resident set size: 698261 697617 (0.999078) branches: 22379620071 22552883328 (1.007742) branch-misses: 247854954 525422579 (2.119879) User time: 28.326667 32.413333 (1.144269) Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-llc`: <stat>: <control> <patched> (<patched>/<control>) instructions:u: 1878037338324 1880998221933 (1.001577) Maximum resident set size: 5063801 5065104 (1.000257) branches: 191989624604 191678199871 (0.998378) branch-misses: 3458085543 2997545838 (0.866822) User time: 252.236667 250.806667 (0.994331)
Assuming I've actually measured that there are no significant differences in instructions retired or max-rss (and that the remaining code changes needed to support a dynamic number of operands don't change things significantly) it seems like the more maintainable approach of using a SmallVector for the operands and not forcing things into constexpr boxes would be preferable.
Thoughts?
Assuming I've actually measured that there are no significant differences in instructions retired or max-rss (and that the remaining code changes needed to support a dynamic number of operands don't change things significantly) it seems like the more maintainable approach of using a SmallVector for the operands and not forcing things into constexpr boxes would be preferable.
Yeah, seems likely a good way to go. Thanks for looking into it!