This is an archive of the discontinued LLVM Phabricator instance.

[NFC][DebugInfo] Make Descriptions constexpr
AbandonedPublic

Authored by scott.linder on Mar 30 2023, 1:48 PM.

Details

Reviewers
Orlando
dblaikie
Group Reviewers
debug-info
Summary

All of the compilers we support for building LLVM should be able to
build a constexpr set of Descriptions in
DebugInfo/DWARF/DWARFExpression.h, so remove the FIXME and implement it.

Diff Detail

Event Timeline

scott.linder created this revision.Mar 30 2023, 1:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 1:48 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
scott.linder requested review of this revision.Mar 30 2023, 1:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 1:48 PM
Orlando accepted this revision.Mar 31 2023, 1:25 AM
Orlando added a subscriber: Orlando.

Seems reasonable, LGTM

This revision is now accepted and ready to land.Mar 31 2023, 1:25 AM
dblaikie accepted this revision.Apr 3 2023, 12:02 PM
dblaikie added a subscriber: dblaikie.

Sure - though could you check if there's any revision history that explains the comment further/motivates why this would be valuable? (I have some apprehension about making extra things constexpr as it can increase compile time excessively (forcing the frontend to check for evaluation - when the backend can more cheaply optimize it away later anyway))

scott.linder added a comment.EditedApr 5 2023, 11:06 AM

Sure - though could you check if there's any revision history that explains the comment further/motivates why this would be valuable? (I have some apprehension about making extra things constexpr as it can increase compile time excessively (forcing the frontend to check for evaluation - when the backend can more cheaply optimize it away later anyway))

Sure, I did just blindly assume the TODO had merit. Is there a structured way you tend to evaluate these on a case-by-case basis? Should I compare instructions retired while compiling LLVM+Clang and while using the resulting LLVM+Clang to inspect DebugInfo with and without the change?

Edit:

I compared building the lib/DebugInfo/DWARF/all target without the patch vs. with the patch, with some minimal isolation and attempts to make the result statistically significant without spending too much time:

<stat>: <control> <constexpr> (<constexpr>/<control>)
lib/libLLVMDebugInfoDWARF.a-size: 2021132 2016068 (0.997494)
instructions:u: 4271564407084 4271164683246 (0.999906)
Maximum resident set size: 694690 694689 (0.999999)
branches: 495120306752 495185537556 (1.000132)
branch-misses: 30440381987 30515429111 (1.002465)
User time: 2321.876000 2331.440000 (1.004119)

If https://www.npopov.com/2020/05/10/Make-LLVM-fast-again.html is to be believed the user instructions retired and max-rss metrics are the best proxies available, but I included some of the "noisy" metrics too. None of them seem too outrageously out-of-line, but I can rerun to see if e.g. the 0.4% increase user-time is a fluke.

I stole the cmake, time, and perf command-lines from the repo for http://llvm-compile-time-tracker.com/ referenced from the same blog post above. I haven't done much in this area so I don't know if I made any mistakes; my script is:

#!/bin/bash

build() {
  for f in control constexpr; do
    rm -rf build-$f
    mkdir -p build-$f
    rm -rf out-$f
    mkdir -p out-$f
    cmake \
      -GNinja \
      -S./source-$f/llvm \
      -B./build-$f \
      -DLLVM_ENABLE_PROJECTS="clang" \
      -DLLVM_TARGETS_TO_BUILD="X86" \
      -DLLVM_BUILD_TOOLS=false \
      -DLLVM_APPEND_VC_REV=false \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_CCACHE_BUILD=false \
      -DLLVM_USE_LINKER=gold \
      -DLLVM_BINUTILS_INCDIR=/usr/include \
      -DCLANG_ENABLE_ARCMT=false \
      -DCLANG_ENABLE_STATIC_ANALYZER=false
    /usr/bin/ninja -C build-$f lib/DebugInfo/DWARF/all
    for i in $(seq 5); do
      LC_ALL=C ninja -C build-$f clean
      LC_ALL=C \
        /usr/bin/time -v -o out-$f/time.$i \
        perf stat -x \; -o out-$f/perf.$i \
          -e instructions \
          -e instructions:u \
          -e cycles \
          -e task-clock \
          -e branches \
          -e branch-misses \
          ninja -C build-$f lib/DebugInfo/DWARF/all
    done
  done
}
print_stat_header() {
  printf "<stat>: <control> <constexpr> (<constexpr>/<control>)\n"
}
print_stat() {
  printf "%s: %s %s (%s)\n" "$1" "$2" "$3" "$(awk -- "BEGIN { printf \"%f\\n\", $3 / $2 }")"
}
size() {
  stat --printf='%s\n' "$@"
}
compare_size() {
  local file="$1"
  local control="$(size build-control/$file)"
  local constexpr="$(size build-constexpr/$file)"
  print_stat "$file-size" "$control" "$constexpr"
}
compare() {
  local dataset="$1"; shift
  local rowname="$1"; shift
  local fmt="$1"; shift
  local sep=:
  local idx=2
  if [[ $dataset == perf ]]; then
    sep=\;
    idx=1
  fi
  local awkprogram="/$rowname/ { sum += \$$idx; n++; } END { printf \"$fmt\\n\", sum / n; }"
  local control=$(awk -F "$sep" -- "$awkprogram" out-control/$dataset.*)
  local constexpr=$(awk -F "$sep" -- "$awkprogram" out-constexpr/$dataset.*)
  local relative=$(awk -- "BEGIN { printf \"%f\\n\", $constexpr / $control }")
  print_stat "$rowname" "$control" "$constexpr"
}
print() {
  print_stat_header
  compare_size lib/libLLVMDebugInfoDWARF.a
  compare perf 'instructions:u' '%u'
  compare time 'Maximum resident set size' '%u'
  compare perf 'branches' '%u'
  compare perf 'branch-misses' '%u'
  compare time 'User time' '%f'
}
main() {
  build
  print
}
"$@"

Run as bench.sh main

I will expand this to building the whole of llvm, and then dogfood the result, although I may not run as many trials to avoid it taking too long.

Sure - though could you check if there's any revision history that explains the comment further/motivates why this would be valuable? (I have some apprehension about making extra things constexpr as it can increase compile time excessively (forcing the frontend to check for evaluation - when the backend can more cheaply optimize it away later anyway))

Sure, I did just blindly assume the TODO had merit. Is there a structured way you tend to evaluate these on a case-by-case basis? Should I compare instructions retired while compiling LLVM+Clang and while using the resulting LLVM+Clang to inspect DebugInfo with and without the change?

Edit:

I compared building the lib/DebugInfo/DWARF/all target without the patch vs. with the patch, with some minimal isolation and attempts to make the result statistically significant without spending too much time:

<stat>: <control> <constexpr> (<constexpr>/<control>)
lib/libLLVMDebugInfoDWARF.a-size: 2021132 2016068 (0.997494)
instructions:u: 4271564407084 4271164683246 (0.999906)
Maximum resident set size: 694690 694689 (0.999999)
branches: 495120306752 495185537556 (1.000132)
branch-misses: 30440381987 30515429111 (1.002465)
User time: 2321.876000 2331.440000 (1.004119)

If https://www.npopov.com/2020/05/10/Make-LLVM-fast-again.html is to be believed the user instructions retired and max-rss metrics are the best proxies available, but I included some of the "noisy" metrics too. None of them seem too outrageously out-of-line, but I can rerun to see if e.g. the 0.4% increase user-time is a fluke.

I stole the cmake, time, and perf command-lines from the repo for http://llvm-compile-time-tracker.com/ referenced from the same blog post above. I haven't done much in this area so I don't know if I made any mistakes; my script is:

#!/bin/bash

build() {
  for f in control constexpr; do
    rm -rf build-$f
    mkdir -p build-$f
    rm -rf out-$f
    mkdir -p out-$f
    cmake \
      -GNinja \
      -S./source-$f/llvm \
      -B./build-$f \
      -DLLVM_ENABLE_PROJECTS="clang" \
      -DLLVM_TARGETS_TO_BUILD="X86" \
      -DLLVM_BUILD_TOOLS=false \
      -DLLVM_APPEND_VC_REV=false \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_CCACHE_BUILD=false \
      -DLLVM_USE_LINKER=gold \
      -DLLVM_BINUTILS_INCDIR=/usr/include \
      -DCLANG_ENABLE_ARCMT=false \
      -DCLANG_ENABLE_STATIC_ANALYZER=false
    /usr/bin/ninja -C build-$f lib/DebugInfo/DWARF/all
    for i in $(seq 5); do
      LC_ALL=C ninja -C build-$f clean
      LC_ALL=C \
        /usr/bin/time -v -o out-$f/time.$i \
        perf stat -x \; -o out-$f/perf.$i \
          -e instructions \
          -e instructions:u \
          -e cycles \
          -e task-clock \
          -e branches \
          -e branch-misses \
          ninja -C build-$f lib/DebugInfo/DWARF/all
    done
  done
}
print_stat_header() {
  printf "<stat>: <control> <constexpr> (<constexpr>/<control>)\n"
}
print_stat() {
  printf "%s: %s %s (%s)\n" "$1" "$2" "$3" "$(awk -- "BEGIN { printf \"%f\\n\", $3 / $2 }")"
}
size() {
  stat --printf='%s\n' "$@"
}
compare_size() {
  local file="$1"
  local control="$(size build-control/$file)"
  local constexpr="$(size build-constexpr/$file)"
  print_stat "$file-size" "$control" "$constexpr"
}
compare() {
  local dataset="$1"; shift
  local rowname="$1"; shift
  local fmt="$1"; shift
  local sep=:
  local idx=2
  if [[ $dataset == perf ]]; then
    sep=\;
    idx=1
  fi
  local awkprogram="/$rowname/ { sum += \$$idx; n++; } END { printf \"$fmt\\n\", sum / n; }"
  local control=$(awk -F "$sep" -- "$awkprogram" out-control/$dataset.*)
  local constexpr=$(awk -F "$sep" -- "$awkprogram" out-constexpr/$dataset.*)
  local relative=$(awk -- "BEGIN { printf \"%f\\n\", $constexpr / $control }")
  print_stat "$rowname" "$control" "$constexpr"
}
print() {
  print_stat_header
  compare_size lib/libLLVMDebugInfoDWARF.a
  compare perf 'instructions:u' '%u'
  compare time 'Maximum resident set size' '%u'
  compare perf 'branches' '%u'
  compare perf 'branch-misses' '%u'
  compare time 'User time' '%f'
}
main() {
  build
  print
}
"$@"

Run as bench.sh main

I will expand this to building the whole of llvm, and then dogfood the result, although I may not run as many trials to avoid it taking too long.

The actual use of the DebugInfo code during a normal compilation is minimal: it is just used to "fixup" some operations whose final argument values depend on layout that occurs pretty late.

With that in mind, I figured a better test would be to build a debug build of LLVM trunk, and then run release builds of llvm-dwarfdump over those debug executables, comparing the performance metrics with and without the relevant patches applied.

I first started with dumping clang, but it takes quite a while to chew through all the debug info in a clang executable. I backed down to llvm-dwarfdump itself, and llc. The results seem to indicate the change to constexpr is essentially a wash at runtime:

Switching to constexpr, running `release/llvm-dwarfdump debug/llvm-dwarfdump`:

<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 223641285928 223732233232 (1.000407)
Maximum resident set size: 698212 698365 (1.000219)
branches: 22636320251 22435431611 (0.991125)
branch-misses: 730543176 342250426 (0.468488)
User time: 34.436667 28.920000 (0.839803)

Switching to constexpr, running `release/llvm-dwarfdump debug/llc`:

<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 1878329132612 1879189898727 (1.000458)
Maximum resident set size: 5063790 5063321 (0.999907)
branches: 191863771404 191266164346 (0.996885)
branch-misses: 3513814256 2647395865 (0.753425)
User time: 255.713333 242.900000 (0.949892)

With that in mind, and considering D147270 would be simplified if we just dropped all the static bounds and used SmallVector everywhere, I also tried starting with trunk and only changing the DWARFExpression::Operation::Op member to be a SmallVector<Element, 2>, and it also appears to be a wash:

Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-dwarfdump`:

<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 223622266793 223954468175 (1.001486)
Maximum resident set size: 698261 697617 (0.999078)
branches: 22379620071 22552883328 (1.007742)
branch-misses: 247854954 525422579 (2.119879)
User time: 28.326667 32.413333 (1.144269)

Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-llc`:

<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 1878037338324 1880998221933 (1.001577)
Maximum resident set size: 5063801 5065104 (1.000257)
branches: 191989624604 191678199871 (0.998378)
branch-misses: 3458085543 2997545838 (0.866822)
User time: 252.236667 250.806667 (0.994331)

Assuming I've actually measured that there are no significant differences in instructions retired or max-rss (and that the remaining code changes needed to support a dynamic number of operands don't change things significantly) it seems like the more maintainable approach of using a SmallVector for the operands and not forcing things into constexpr boxes would be preferable.

Thoughts?

Assuming I've actually measured that there are no significant differences in instructions retired or max-rss (and that the remaining code changes needed to support a dynamic number of operands don't change things significantly) it seems like the more maintainable approach of using a SmallVector for the operands and not forcing things into constexpr boxes would be preferable.

Yeah, seems likely a good way to go. Thanks for looking into it!

scott.linder abandoned this revision.Apr 19 2023, 1:52 PM

Determined constexpr was not useful here