This is an archive of the discontinued LLVM Phabricator instance.

[2/2][RemoveRedundantDebugValues] Introduce a MIR pass that removes redundant DBG_VALUEs
ClosedPublic

Authored by djtodoro on Jul 1 2021, 6:51 AM.

Details

Summary

This patch adds the forward scan for finding redundant DBG_VALUEs.

This analysis aims to remove redundant DBG_VALUEs by going forward
in the basic block by considering the first DBG_VALUE as a valid
until its first (location) operand is not clobbered/modified.
For example:

(1) DBG_VALUE $edi, !"var1", ...
(2) <block of code that does affect $edi>
(3) DBG_VALUE $edi, !"var1", ...
 ...

in this case, we can remove (3).

Diff Detail

Event Timeline

djtodoro created this revision.Jul 1 2021, 6:51 AM
djtodoro requested review of this revision.Jul 1 2021, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2021, 6:51 AM

Hi @djtodoro, thank you for implementing this! I've left some inline comments.

I wonder if it would it be worth getting an llvm-locstats comparison on a codebase built with and without this patch-set to get confidence that it is a NFC?

llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
96–97

I think you need to pass None as the second argument to the DebugVariable ctor (i.e. using the overload that takes an Optional<FragmentInfo>).

Otherwise this function may remove a DBG_VALUE for a variable fragment that overlaps with the fragment of a preceding DBG_VALUE because their FragmentInfo doesn't match. This would result in incorrect debugging information.

In the example below, reduceDbgValsForwardScan will currently (wrongly) remove DBG_VALUE 3 because 1 and 3 have the same variable location, and DBG_VALUE 2 has a different fragment so it is not found in the VariableMap lookup. This causes an inaccuracy, because DBG_VALUE 2 implicitly describes a new variable location for the fragment {0, 32} which means that DBG_VALUE 3 is not actually redundant.

1  DBG_VALUE $rax, "x", DIExpression(DW_OP_fragment, 0, 32)
   ...
2  DBG_VALUE $rdi, "x", DIExpression()
   ...
3  DBG_VALUE $rax, "x", DIExpression(DW_OP_fragment, 0, 32)

Using None for the Optional<FragmentInfo> means that the lookup will succeed for a {variable, inlined-at} pair regardless of the fragment. The downside is that this approach is slightly over-conservative; a DBG_VALUE may end up "blocking" the valid removal of a subsequent DBG_VALUE for the same variable with a non-overlapping fragment. But this is better than introducing inaccurate debug info IMO. Making this change would then match how
removeRedundantDbgInstrsUsingForwardScan in D71478 is implemented. Note that, as in D71478, you only need to do this in the forward scan.

115–117

nit/suggestion: I think this comment could be more concise, maybe something like: "Stop tracking any locations that are clobbered by this instruction."

Hi @djtodoro, thank you for implementing this! I've left some inline comments.

I wonder if it would it be worth getting an llvm-locstats comparison on a codebase built with and without this patch-set to get confidence that it is a NFC?

Thanks for your comments!
Actually, the llvm-locstats numbers will improve, since this fixes MIR, and LiveDebugValues will generate more entry-values for example, since the analysis generates entry-values for unmodified parameters, but with these duplicates, we have had a lot of false positives with the respect to param's changing (the pass considered a new/redundant DBG_VALUE as an assignment). I'll share the locstats numbers anyway.

llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
96–97

Nice catch! Thanks!

Actually, the llvm-locstats numbers will improve, since this fixes MIR, and LiveDebugValues will generate more entry-values for example, since the analysis generates entry-values for unmodified parameters, but with these duplicates, we have had a lot of false positives with the respect to param's changing (the pass considered a new/redundant DBG_VALUE as an assignment). I'll share the locstats numbers anyway.

Ah okay that makes sense. Thanks, I'd be interested to see those numbers.

Is the coverage boost expected to come from only entry-values then? If that's the case you could try a before/after comparison with entry-values disabled to check the NFC-ness (but it's not a requirement - up to you).

djtodoro updated this revision to Diff 356134.Jul 2 2021, 2:30 AM

-addressing comments

llvm-locstats numbers (on the gdb-7.11 binary compiled with -g -O2)

$ llvm-locstats ./gdb-before-the-patch
 =================================================
            Debug Location Statistics
 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
    0%                 2517                2%
   (0%,10%)           5849                5%
   [10%,20%)          4992                4%
   [20%,30%)          4886                4%
   [30%,40%)          4468                4%
   [40%,50%)          4127                4%
   [50%,60%)          4535                4%
   [60%,70%)          4670                4%
   [70%,80%)          5529                5%
   [80%,90%)          6257                6%
   [90%,100%)        10197                9%
   100%              44979               43%
 =================================================
 -the number of debug variables processed: 103006
 -PC ranges covered: 57%
 -------------------------------------------------
 -total availability: 86%
 =================================================


bash-4.2$ ../../build/bin/llvm-locstats --ignore-debug-entry-values ./gdb-before-the-patch
 =================================================
            Debug Location Statistics
 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
   0%                 2517                2%
   (0%,10%)           6898                6%
   [10%,20%)          6008                5%
   [20%,30%)          5870                5%
   [30%,40%)          5347                5%
   [40%,50%)          4971                4%
   [50%,60%)          5316                5%
   [60%,70%)          5241                5%
   [70%,80%)          6147                5%
   [80%,90%)          6826                6%
   [90%,100%)        12995               12%
   100%              34870               33%
 =================================================
 -the number of debug variables processed: 103006
 -PC ranges covered: 54%
 -------------------------------------------------
 -total availability: 86%
 =================================================

$ llvm-locstats ./gdb-after-the-patch
 =================================================
            Debug Location Statistics
 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
   0%                 2517                2%
   (0%,10%)           5847                5%
   [10%,20%)          4991                4%
   [20%,30%)          4887                4%
   [30%,40%)          4470                4%
   [40%,50%)          4126                4%
   [50%,60%)          4535                4%
   [60%,70%)          4667                4%
   [70%,80%)          5528                5%
   [80%,90%)          6258                6%
   [90%,100%)        10176                9%
   100%              45004               43%
 =================================================
 -the number of debug variables processed: 103006
 -PC ranges covered: 57%
 -------------------------------------------------
 -total availability: 86%
 =================================================

$ llvm-locstats --ignore-debug-entry-values ./gdb-after-the-patch
 =================================================
            Debug Location Statistics
 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
   0%                 2517                2%
   (0%,10%)           6897                6%
   [10%,20%)          6009                5%
   [20%,30%)          5869                5%
   [30%,40%)          5350                5%
   [40%,50%)          4969                4%
   [50%,60%)          5317                5%
   [60%,70%)          5239                5%
   [70%,80%)          6147                5%
   [80%,90%)          6827                6%
   [90%,100%)        12995               12%
   100%              34870               33%
 =================================================
 -the number of debug variables processed: 103006
 -PC ranges covered: 54%
 -------------------------------------------------
 -total availability: 86%
 =================================================

Thank you for sharing the stats, they look great (both for NFC-ness and with entry-values)! There is an absolutely miniscule shift in numbers in the middle buckets in the non-entry-value case (NFC), but this is to be expected and acceptable imo.

LGTM - Same as for D105279 I'll give others a chance to comment before hitting Accept.

llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
96–97

Please can you add a test for this too? Sorry - I should've asked for this in my first comment.

djtodoro updated this revision to Diff 356176.Jul 2 2021, 8:15 AM
  • addressing comments
StephenTozer added inline comments.
llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
91

In order to make this compatible with DBG_VALUE_LIST, it would probably be best to add a check: if MI.isListDebugValue(), then terminate any existing VariableMap entry for Var. This should ensure that we don't incorrectly attempt to handle DBG_VALUE_LISTs by considering only their first debug operand, or ignore them entirely and potentially incorrectly remove identical DBG_VALUEs that have a DBG_VALUE_LIST for the same variable between them.

djtodoro updated this revision to Diff 356946.Jul 7 2021, 7:12 AM
  • adding TODO marker for DBG_VALUE_LIST for the forward scan only
StephenTozer added inline comments.Jul 7 2021, 7:32 AM
llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
116

I may be missing something; is there a need for this to be lambda, since it looks like it's only invoked here?

djtodoro updated this revision to Diff 357180.Jul 8 2021, 3:28 AM
  • remove the lambda since it is not needed
StephenTozer added inline comments.Jul 8 2021, 4:02 AM
llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
112

This LGTM with this edit, unless I've made a mistake in believing it necessary. I believe this will be necessary to prevent incorrect debug info from being caused in the following case:

1  DBG_VALUE $rax, "x", DIExpression()
   ...
2  DBG_VALUE_LIST "x", DIExpression(...), $rax, $rbx
   ...
3  DBG_VALUE $rax, "x", DIExpression()

Currently we would ignore the DBG_VALUE_LIST entirely, and so 3 would falsely appear to be redundant due to being a repeat of 1. Killing any existing mapping for the variable of a DBG_VALUE_LIST is the simplest way to ensure this doesn't happen, without needing a full implementation for list debug values.

Orlando added inline comments.Jul 8 2021, 4:27 AM
llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
112

FWIW I agree with @StephenTozer and just wanted to add: please can you add a test for this too when you update the code?

djtodoro added inline comments.Jul 8 2021, 5:21 AM
llvm/lib/CodeGen/RemoveRedundantDebugValues.cpp
112

Sure! Thanks! Good catch!

In addition, we should stop tracking a variable in this case as well (if VMI != VariableMap.end()):

if (!Loc.isReg())
  continue

I'll make tests for this before updating the patch.

djtodoro updated this revision to Diff 357208.Jul 8 2021, 6:32 AM
  • handle DBG_VALUE_LIST + test
StephenTozer accepted this revision.Jul 8 2021, 7:20 AM

Latest revision LGTM.

This revision is now accepted and ready to land.Jul 8 2021, 7:20 AM
djtodoro updated this revision to Diff 357227.Jul 8 2021, 8:03 AM

@StephenTozer Thanks!

  • in this revision, I am just fixing a comment in the test
djtodoro updated this revision to Diff 357230.Jul 8 2021, 8:08 AM
  • posting a right patch
This revision was landed with ongoing or failed builds.Jul 15 2021, 12:09 AM
This revision was automatically updated to reflect the committed changes.