[DebugInfo] Add support for multiple value references in debug values, and enable salvaging
Needs Review · Public

Authored by StephenTozer on Jan 13 2021, 1:07 PM.

Details

Reviewers
deadalnix
Summary

This patch contains the complete implementation of DIArgList, DBG_VALUE_LIST, and salvaging of Binary Operator and GEP instructions with non-constant operands. This is not intended to be an actual review (the stack of patches culminating in D91722 contains the review patches), but an easy-to-apply patch for anyone seeking to test, poke around in, or add to the new feature.

Note that currently one of the tests added by this patch fails; the test is being rewritten. All the other tests (added by this patch or otherwise) should pass. The patch is currently up-to-date with revision 993c488ed.

Event Timeline

StephenTozer created this revision. (Jan 13 2021, 1:07 PM)
StephenTozer requested review of this revision. (Jan 13 2021, 1:07 PM)
Herald added a project: Restricted Project. (Jan 13 2021, 1:07 PM)
StephenTozer edited the summary of this revision. (Jan 13 2021, 1:09 PM)

Rebased to 30b8f553.

Included some fixes, and also D95463, which is neither part of the original patch stack nor merged into master yet, but is necessary to prevent dire IR-size inflation in builds of clang3.4.

vsk added a subscriber: vsk. (Wed, Jan 27, 4:41 PM)

Just chiming in with compile time numbers on the -O3-optimized bitcode file from D74986.

Starting with a baseline of rG30b8f55378cc:

% ./bin/llc -O3 glyphbench-metarenamed.bc -o /dev/null -filetype=obj -time-passes
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 393.2370 seconds (393.2617 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  191.6503 ( 49.1%)   0.2315 (  9.1%)  191.8819 ( 48.8%)  191.8954 ( 48.8%)  Machine Common Subexpression Elimination
  68.9803 ( 17.7%)   0.4233 ( 16.6%)  69.4036 ( 17.6%)  69.4081 ( 17.6%)  Greedy Register Allocator
  43.6310 ( 11.2%)   0.3162 ( 12.4%)  43.9471 ( 11.2%)  43.9499 ( 11.2%)  X86 DAG->DAG Instruction Selection
  43.3176 ( 11.1%)   0.0725 (  2.8%)  43.3901 ( 11.0%)  43.3930 ( 11.0%)  Merge disjoint stack slots
  26.0380 (  6.7%)   0.0566 (  2.2%)  26.0946 (  6.6%)  26.0950 (  6.6%)  Machine Instruction Scheduler
   6.2238 (  1.6%)   1.3036 ( 51.1%)   7.5274 (  1.9%)   7.5277 (  1.9%)  X86 Assembly Printer
   5.2834 (  1.4%)   0.0269 (  1.1%)   5.3103 (  1.4%)   5.3105 (  1.4%)  Debug Variable Analysis
   1.1216 (  0.3%)   0.0002 (  0.0%)   1.1218 (  0.3%)   1.1217 (  0.3%)  X86 Execution Dependency Fix
   0.7940 (  0.2%)   0.0126 (  0.5%)   0.8066 (  0.2%)   0.8066 (  0.2%)  Live DEBUG_VALUE analysis
   0.3580 (  0.1%)   0.0018 (  0.1%)   0.3599 (  0.1%)   0.3598 (  0.1%)  Virtual Register Rewriter
   0.2394 (  0.1%)   0.0104 (  0.4%)   0.2498 (  0.1%)   0.2498 (  0.1%)  Module Verifier
   0.2408 (  0.1%)   0.0049 (  0.2%)   0.2458 (  0.1%)   0.2458 (  0.1%)  Module Verifier #2
   0.2159 (  0.1%)   0.0033 (  0.1%)   0.2192 (  0.1%)   0.2192 (  0.1%)  CodeGen Prepare
   0.2120 (  0.1%)   0.0040 (  0.2%)   0.2160 (  0.1%)   0.2160 (  0.1%)  Simple Register Coalescing
   0.2091 (  0.1%)   0.0024 (  0.1%)   0.2115 (  0.1%)   0.2115 (  0.1%)  Live Variable Analysis

With D94631 applied on rG30b8f55378cc:

% ./bin/llc -O3 glyphbench-metarenamed.bc -o /dev/null -filetype=obj -time-passes
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 381.6156 seconds (382.0158 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  180.1541 ( 47.5%)   0.0808 (  3.3%)  180.2349 ( 47.2%)  180.2320 ( 47.2%)  Machine Common Subexpression Elimination
  73.8616 ( 19.5%)   0.4514 ( 18.6%)  74.3130 ( 19.5%)  74.6003 ( 19.5%)  Greedy Register Allocator
  42.2676 ( 11.1%)   0.2528 ( 10.4%)  42.5204 ( 11.1%)  42.5199 ( 11.1%)  X86 DAG->DAG Instruction Selection
  39.5782 ( 10.4%)   0.0186 (  0.8%)  39.5968 ( 10.4%)  39.5958 ( 10.4%)  Merge disjoint stack slots
  24.3342 (  6.4%)   0.0234 (  1.0%)  24.3576 (  6.4%)  24.3622 (  6.4%)  Machine Instruction Scheduler
   6.8635 (  1.8%)   1.3182 ( 54.4%)   8.1817 (  2.1%)   8.2164 (  2.2%)  X86 Assembly Printer
   5.6064 (  1.5%)   0.0370 (  1.5%)   5.6434 (  1.5%)   5.6454 (  1.5%)  Debug Variable Analysis
   1.3488 (  0.4%)   0.1053 (  4.3%)   1.4541 (  0.4%)   1.4764 (  0.4%)  Live DEBUG_VALUE analysis
   1.3050 (  0.3%)   0.0024 (  0.1%)   1.3075 (  0.3%)   1.3122 (  0.3%)  X86 Execution Dependency Fix
   0.5689 (  0.2%)   0.0052 (  0.2%)   0.5741 (  0.2%)   0.5992 (  0.2%)  Virtual Register Rewriter
   0.2367 (  0.1%)   0.0060 (  0.2%)   0.2428 (  0.1%)   0.2429 (  0.1%)  Module Verifier #2
   0.2330 (  0.1%)   0.0086 (  0.4%)   0.2416 (  0.1%)   0.2416 (  0.1%)  Module Verifier
   0.2246 (  0.1%)   0.0005 (  0.0%)   0.2251 (  0.1%)   0.2286 (  0.1%)  X86 Byte/Word Instruction Fixup
   0.2099 (  0.1%)   0.0095 (  0.4%)   0.2193 (  0.1%)   0.2193 (  0.1%)  Live Variable Analysis
   0.2111 (  0.1%)   0.0006 (  0.0%)   0.2117 (  0.1%)   0.2120 (  0.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.2070 (  0.1%)   0.0038 (  0.2%)   0.2107 (  0.1%)   0.2107 (  0.1%)  CodeGen Prepare
   0.1945 (  0.1%)   0.0006 (  0.0%)   0.1951 (  0.1%)   0.2022 (  0.1%)  Machine Copy Propagation Pass
   0.1932 (  0.1%)   0.0016 (  0.1%)   0.1949 (  0.1%)   0.1949 (  0.1%)  Simple Register Coalescing
   0.1908 (  0.1%)   0.0008 (  0.0%)   0.1916 (  0.1%)   0.1923 (  0.1%)  Machine Copy Propagation Pass #2
   0.1308 (  0.0%)   0.0057 (  0.2%)   0.1365 (  0.0%)   0.1365 (  0.0%)  Live Interval Analysis

I'll need to report back with the end-to-end timings, as this takes ages to complete.

In D94631#2526890, @vsk wrote:

Just chiming in with compile time numbers on the -O3-optimized bitcode file from D74986.

Good to hear - these numbers look promising. The apparent slowdown on LiveDebugValues is unfortunate and might be worth fixing if possible, but it's nothing close to the "old" performance. The question now is whether the presence of DBG_VALUE_LISTs causes a large performance hit; I don't think there's anything in there that should cause any dire issues, but I appreciate any certainty that can be given.
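As a rough check on the slowdown mentioned above, here is a minimal sketch that computes the relative change in wall time for the total run and for the two debug-info passes. The numbers are copied from the two -time-passes reports in this thread; the helper name `percent_change` and the dictionary layout are purely illustrative and not part of any LLVM tooling. A single run per configuration says little about noise, so treat the output as indicative only.

```python
# Wall-clock seconds copied from the two -time-passes reports above.
# "total" is the wall-clock figure from the "Total Execution Time" line.
baseline = {
    "total": 393.2617,
    "Live DEBUG_VALUE analysis": 0.8066,
    "Debug Variable Analysis": 5.3105,
}
patched = {
    "total": 382.0158,
    "Live DEBUG_VALUE analysis": 1.4764,
    "Debug Variable Analysis": 5.6454,
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Positive values mean the patched build was slower for that entry.
changes = {name: percent_change(baseline[name], patched[name]) for name in baseline}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

The Live DEBUG_VALUE analysis pass grows noticeably in relative terms, but in absolute terms it stays well under two seconds of a run that takes over six minutes.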

vsk added a comment. (Thu, Jan 28, 12:38 PM)

Here are some numbers from end-to-end testing with a Release clang binary.

Starting with a baseline of rG30b8f55378cc:

% for I in $(seq 10); do time ./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null ; done
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  189.68s user 0.71s system 99% cpu 3:10.53 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  190.62s user 0.68s system 99% cpu 3:11.37 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  188.33s user 0.72s system 99% cpu 3:09.10 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  190.87s user 0.73s system 99% cpu 3:11.74 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  200.22s user 0.83s system 99% cpu 3:21.21 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  183.18s user 0.69s system 99% cpu 3:03.89 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  183.99s user 0.64s system 99% cpu 3:04.65 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  184.07s user 0.68s system 99% cpu 3:04.77 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  184.06s user 0.70s system 99% cpu 3:04.78 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  182.41s user 0.67s system 99% cpu 3:03.09 total

With D94631 applied on rG30b8f55378cc:

% for I in $(seq 10); do time ./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null ; done
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  190.92s user 0.89s system 99% cpu 3:12.37 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  193.07s user 0.93s system 99% cpu 3:14.22 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  188.49s user 0.76s system 99% cpu 3:09.41 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  193.46s user 0.99s system 99% cpu 3:14.54 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  188.78s user 0.78s system 99% cpu 3:09.59 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  189.35s user 0.80s system 99% cpu 3:10.20 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  190.92s user 0.84s system 99% cpu 3:12.02 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  181.41s user 0.65s system 99% cpu 3:02.07 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  183.23s user 0.68s system 99% cpu 3:04.02 total
./bin/clang -w -O3 glyphbench-O0-with-disable-optnone.bc -o /dev/null  180.78s user 0.68s system 99% cpu 3:01.47 total

In D94631#2528964, @vsk wrote:

Here are some numbers from end-to-end testing with a Release clang binary.

Thanks very much, these numbers are very encouraging! It looks as though we have no discernible performance cost, which seems to agree with the other tests I've done. This should mean that we can safely merge the patches once reviews are complete, unless any correctness issues turn up.
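To put a number on that reading of the end-to-end runs, the sketch below averages the ten "user" times from each configuration and compares the gap between the means with the run-to-run spread. The values are copied from the `time` output above; the variable and function names are illustrative only, and this is a back-of-the-envelope comparison rather than a proper significance test.

```python
from statistics import mean, stdev

# "user" seconds from the ten baseline runs (rG30b8f55378cc).
baseline_user = [189.68, 190.62, 188.33, 190.87, 200.22,
                 183.18, 183.99, 184.07, 184.06, 182.41]

# "user" seconds from the ten runs with D94631 applied.
patched_user = [190.92, 193.07, 188.49, 193.46, 188.78,
                189.35, 190.92, 181.41, 183.23, 180.78]


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings."""
    return mean(samples), stdev(samples)


base_mean, base_sd = summarize(baseline_user)
patch_mean, patch_sd = summarize(patched_user)
delta = patch_mean - base_mean

print(f"baseline: mean {base_mean:.2f}s, stdev {base_sd:.2f}s")
print(f"patched:  mean {patch_mean:.2f}s, stdev {patch_sd:.2f}s")
print(f"difference of means: {delta:+.2f}s ({delta / base_mean * 100:+.2f}%)")
```

The difference between the two means is a fraction of a second, which is much smaller than the variation between individual runs of the same binary.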