Page MenuHomePhabricator

[POC][DebugInfo] Use entry values within IR
Needs ReviewPublic

Authored by djtodoro on Sep 7 2020, 6:45 AM.

Details

Summary

Using entry values on IR will give us better debugging user experience when debugging "optimized" code (swift uses llvm.dbg.values even in non-optimized code, so this refers to that code as well). This patch implements a utility (within Transforms/Utils/Local.cpp) that finds an entry Value for a given Variable.

Consider this (simple) test case:

void f1(int);
void f2(int i) {
  f1(1);
  i = i + 5;
  f1(3);
}

IR ($ clang -g -O2 -S) for the func looks:

define dso_local void @f2(i32 %i) local_unnamed_addr !dbg !7 {
entry:
  call void @llvm.dbg.value(metadata i32 %i, metadata !12, metadata !DIExpression()), !dbg !13
  tail call void @f1(i32 1), !dbg !14
  call void @llvm.dbg.value(metadata i32 %i, metadata !12, metadata !DIExpression(DW_OP_plus_uconst, 5, DW_OP_stack_value)), !dbg !13
  tail call void @f1(i32 3), !dbg !15
  ret void, !dbg !16
}

SelectionDAG for the second dbg.value generates:

DBG_VALUE $noreg, $noreg, !"i", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_stack_value), debug-location !13; test.c:0 line no:2

After this set of patches, we will salvage the value by using the entry values, so the DBG_VALUE looks as following:

DBG_VALUE $edi, $noreg, !12, !DIExpression(DW_OP_LLVM_entry_value, 1, DW_OP_plus_uconst, 5, DW_OP_stack_value), debug-location !13

Diff Detail

Event Timeline

djtodoro created this revision.Sep 7 2020, 6:45 AM

This is actually ISel phase, but we still consider SSA, so this should be the latest point we are able to use the findEntryValue() in the pipeline.

djtodoro added a comment.EditedSep 7 2020, 6:59 AM

(test.c was listed within llvm/test/DebugInfo/X86/entry-values-for-isel-invalidated-nodes.ll)
Looks like LLDB has a problem with parsing "complex" expression with DW_OP_entry_value:

$ cat main.c
#include <stdio.h>

void f1(int x) {
  printf("%d\n", x);
}

int main()
{
  f2(4);
  return 0;
}

GCC generated debug_loc (gcc-test.o):

    00000000 0000000000000000 0000000000000009 (DW_OP_reg5 (rdi))
    00000013 0000000000000009 000000000000000e (DW_OP_GNU_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value)
    00000029 000000000000000e 000000000000001c (DW_OP_GNU_entry_value: (DW_OP_reg5 (rdi)); DW_OP_plus_uconst: 5; DW_OP_stack_value)
$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc main.c gcc-test.o -o main-gcc
$ lldb ./main-gcc
(lldb) target create "./main-gcc"
Current executable set to 'main-gcc' (x86_64).
(lldb) b test.c:5
Breakpoint 1: where = main-gcc`f2 + 14 at test.c:5, address = 0x000000000040119e
(lldb) r
Process 16815 launched: 'main-gcc' (x86_64)
1
Process 16815 stopped
* thread #1, name = 'main-gcc', stop reason = breakpoint 1.1
    frame #0: 0x000000000040119e main-gcc`f2(i=5) at test.c:5
   2    void f2(int i) {
   3      f1(1);
   4      i = i + 5;
-> 5      f1(3);
   6    }
   7
(lldb) p i
(int) $0 = 5

The same situation is with the test case compiled with clang (patched by using this change).

t-tye added a subscriber: t-tye.Sep 7 2020, 6:37 PM
dstenb added a comment.Sep 8 2020, 5:55 AM

Interesting!

I will go through this more deeply, but just an initial comment is that I think it may be worth considering splitting it into two patches, in order to make it easier to review:

  1. One which loosens the DIExpression limitations, and adds support to DwarfExpression. With test cases using handcrafted DIExpressions.
  2. One which adds the SelectionDAG support.
dstenb added inline comments.Sep 8 2020, 6:09 AM
llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
608–609

This means that the entry value will cover the whole expression, regardless of the number of operands that are specified in DW_OP_LLVM_entry_value? I think we want to do this conditionally in the ExprCursor loop?

Doing this after addStackValue() makes it so that the entry value contains an implicit location description, but as mentioned in DWARF5 2.5.1.7 the second operand in a DW_OP_entry_value is "a block containing a DWARF expression or a register location description", and as was discussed in D75270 and [0], location descriptions does not seem to be allowed to be used where "DWARF expressions" are specified.

[0] http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2020-March/004610.html

dstenb added inline comments.Sep 8 2020, 6:24 AM
llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
456–458

Is there test case where we land in this situation? As I understand it we still have the limitation inside DIExpression that it can only cover one operand (the register operand), and if I am not overlooking something we should have finalized the entry value on line 299:

if (isEntryValue())
  finalizeEntryValue();

@dstenb Thanks a lot for your comments! I will split this into two patches.

llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
456–458

I should have removed this from the patch, since this is unused indeed (it was part of some testing during the development).

608–609

The same as above.

djtodoro updated this revision to Diff 290680.Sep 9 2020, 2:51 AM

-Split into 2 parts
-Remove unused parts

dstenb added inline comments.Sep 10 2020, 12:03 AM
llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
513

This might be outside of the scope of this patch, but I wonder if should change the location kind from Register to Unknown/Implicit after finalizing the entry value? The entry value operation's operand is a register location description, but the operation itself just pushes a value on the stack.

llvm/lib/IR/DebugInfoMetadata.cpp
1032

Ah, I it seems that we did not have a small Verifier case for the last condition. Perhaps we now should add a small permissive one. Something like this:

--- a/llvm/test/Verifier/diexpression-valid-entry-value.ll
+++ b/llvm/test/Verifier/diexpression-valid-entry-value.ll
@@ -1,5 +1,6 @@
 ; RUN: opt -S < %s 2>&1 | FileCheck %s
 
-!named = !{!0}
+!named = !{!0, !1}
 ; CHECK-NOT: invalid expression
 !0 = !DIExpression(DW_OP_LLVM_entry_value, 1)
+!1 = !DIExpression(DW_OP_LLVM_entry_value, 1, DW_OP_plus_uconst, 3, DW_OP_stack_value)
djtodoro added inline comments.Sep 10 2020, 4:18 AM
llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
513

Hm... yes, I guess. It might be closer to Implicit location? I think that should be a separate change.

llvm/lib/IR/DebugInfoMetadata.cpp
1032

Yes, thanks!

djtodoro updated this revision to Diff 290937.Sep 10 2020, 4:19 AM
  • Test new expression
djtodoro edited the summary of this revision. (Show Details)Sep 10 2020, 4:22 AM
djtodoro edited the summary of this revision. (Show Details)

@dstenb Please let me know if this looks OK to you now.

Any other comments?

Orlando added inline comments.
llvm/lib/Transforms/Utils/Local.cpp
1601

Hi @djtodoro. IIUC this function assumes that the returned dbg.value - the first one we find - for parameter variable Var uses the corresponding argument value (as opposed to some other value that a mutable parameter could be assigned). If that is correct, why is that assumption safe?

djtodoro added inline comments.Sep 21 2020, 5:57 AM
llvm/lib/Transforms/Utils/Local.cpp
1601

(Hmm.... I've forgotten to take another look into this, in spite of the fact that I've had an item for it within my TODO list...)

Hi @Orlando.
I had a crazy premise in my mind, that every parameter should have a dbg_value on the beginning of the bb.entry. In some situations, the Value could be undef (due to some optimizations), but it will indicate just that the value has been optimized out (but it should be still there in the code). Is that correct? I've prepared this POC in rush, and have forgotten to check this once again. If the users() aren't sorted out, this should be reimplemented.

Orlando added a subscriber: jmorse.Sep 21 2020, 7:20 AM
Orlando added inline comments.
llvm/lib/Transforms/Utils/Local.cpp
1601

Every parameter should have a dbg_value on the beginning of the bb.entry

My understanding is that there should be _at least_ one dbg.value for each param at the top of the function. E.g. in the following source, a parameter test has been split into two argument values and there's a fragment expression for each.

$ cat test.c
struct S {
  long long one;
  long long two;
};
void foo(struct S test) {}

$ clang -O2 -g -S -emit-llvm -o - test.c
...
define dso_local void @foo(i64 %test.coerce0, i64 %test.coerce1) local_unnamed_addr #0 !dbg !7 {
entry:
  call void @llvm.dbg.value(metadata i64 %test.coerce0, metadata !16, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !17
  call void @llvm.dbg.value(metadata i64 %test.coerce1, metadata !16, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !17
  ret void, !dbg !18
}
...

If the users() aren't sorted out, this should be reimplemented

I am not sure if users() is sorted. I had a hunch that it isn't and I can't see any comments indicating that it is. @jmorse - or anyone else - do you know for sure?

Supposing that users() isn't sorted and we take the first assumption as truth (that every parameter should have a dbg.value at the top of the function, ignoring fragmented parameters), then I don't think it should be too hard to go forward from here. I think we could just order the dbg.values on Instruction::comesBefore(const Instruction *).

Additionally, shouldn't we be filtering out dbg.values from inlined uses of Var?

dblaikie added inline comments.Sep 21 2020, 10:47 AM
llvm/lib/Transforms/Utils/Local.cpp
1601

I am not sure if users() is sorted. I had a hunch that it isn't and I can't see any comments indicating that it is. @jmorse - or anyone else - do you know for sure?

I'm about 98% sure that use list order is not stable. (that's why there's this: https://llvm.org/doxygen/UseListOrder_8h_source.html )

djtodoro added inline comments.Sep 22 2020, 6:49 AM
llvm/lib/Transforms/Utils/Local.cpp
1601

Thanks!

We can simply iterate throughout the entry-bb and find the first dbg.value for a given parameter.

Additionally, shouldn't we be filtering out dbg.values from inlined uses of Var?

Yes. We should avoid fragmented variables as well, since we do not support such entry values at the moment.

djtodoro updated this revision to Diff 293444.Sep 22 2020, 6:52 AM
  • Update the algorithm for finding the entry values
  • Improve the tests
Orlando added inline comments.Sep 22 2020, 9:42 AM
llvm/lib/Transforms/Utils/Local.cpp
1601

This function (and the unit test) LGTM now, thanks.

Sorry for a piecemeal review from my part!

llvm/lib/Transforms/Utils/Local.cpp
1617

Perhaps this is too verbose?

1618

Maybe make this an early continue?

DbgValueInst *DVI = dyn_cast<DbgValueInst>(&I);
if (!DVI)
  continue;
1627–1628

I might be misunderstanding something here, but shouldn't we want to still search for the debug value even though we have encountered an inline variant of the same variable? I guess this is something that is not really common but still.

If so, this needs to be moved above the getNumElements() check.

Sorry for a piecemeal review from my part!

No problem, the tempo works for me. :) Thanks!

llvm/lib/Transforms/Utils/Local.cpp
1617

No problem, I'll remove this.

1618

Sure.

1627–1628

Nice catch, thanks!

djtodoro updated this revision to Diff 293676.Sep 23 2020, 2:26 AM
  • addressing comments
StephenTozer accepted this revision.EditedSep 24 2020, 8:58 AM
StephenTozer added a subscriber: StephenTozer.

LGTM. I think the Implicit flag idea might be able to be folded into this patch without much issue, but it's not urgent and can definitely be shifted to another patch (or even just left as a TODO). Should probably be looked at by others with experience in the area, but I think most of the comments are probably going to be for the later patches in this stack.

llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
303

With regards to the comment below about using an Implicit location, this would be a good place to do it I think. If you set LocationKind = isIndirect() ? Memory : Implicit, then DwarfExpression::addExpression will add the required stack value automatically at the end (in addition to removing the need to add || isEntryValue() to the clauses below).

This revision is now accepted and ready to land.Sep 24 2020, 8:58 AM
dstenb accepted this revision.Sep 24 2020, 3:16 PM

I'm equally fine with doing the Implicit flag change here, or for someone to do it in a separate patch.

djtodoro planned changes to this revision.Sep 25 2020, 12:36 AM

@StephenTozer @dstenb Thanks for your comments! I'll try to change the approach (by putting some extra context), so we might end up changing this one (a bit).

I'll let you know once I am ready. :)

I am about to post new set of patches addressing this problem. We are back to the initial idea that refers to extending the llvm.dbg.value() to hold info about the entry Value. We need to do that, since the current approach (by relying on the method that uses the DILocalVariable*) couldn't detect variable modifications (in an efficient way). Another advantage we are getting with this new approach is that we can handle local variables as well, if it can be expressed in terms of an entry value such as:

void fn (int param) {
 ...
 int local = param + 2;
 ...

The idea is to extend the debug intrinsic with two extra operands:

  1. 4th operand points to the Value * that holds the entry value the variable depends on
  2. 5th the expression that applies to the entry Value * expressing a modification in terms of the entry value

In this initial stage, I am not handling the AutoUpgrade and bitcode backward compatibility. I will add that as well in following days, as well as some unit tests for the new functionality.

djtodoro updated this revision to Diff 296160.Mon, Oct 5, 5:31 AM
  • Extend the llvm.dbg.value() with 2 operands by describing Entry Value

TODOs:

  1. Handle auto-upgrade
  2. Handle bitcode backward compatibility
  3. Add tests (unit tests)
  4. Add documentation

For now, the usage of this can be seen at: D87357

This revision is now accepted and ready to land.Mon, Oct 5, 5:31 AM
djtodoro requested review of this revision.Mon, Oct 5, 5:32 AM
djtodoro updated this revision to Diff 296447.Tue, Oct 6, 6:49 AM
  • Avoid null metadata within the new llvm.dbg.value()
dstenb added a comment.Tue, Oct 6, 7:07 AM

Some minor code style comments while I look into this.

llvm/lib/Transforms/Utils/Local.cpp
1600–1658

Perhaps make the return type Optional<EntryValue>?

1634–1638

I think this should be able to be simplified into:

return DIExpression::prepend(DIExpr, DIExpression::StackValue, Offset);

(And perhaps the lambda is not necessary then.)

djtodoro added inline comments.Tue, Oct 6, 7:10 AM
llvm/lib/Transforms/Utils/Local.cpp
1600–1658

Makes sense, I'll refactor it :)

1634–1638

I've used "copy-paste" technique to make the tests working. This piece of code was copied from salvageDebugInfoImpl(), and I think this can be refactored into a static function or so.

djtodoro updated this revision to Diff 296453.Tue, Oct 6, 7:21 AM
  • Use VMContext instead of InsertBB->getParent()->getContext()
  • Additional TODOs:
  • handle findDbgUsers() and dropDebugUsers()
  • refactoring

TODO: Handle Function Inlining

dstenb added a comment.Wed, Oct 7, 2:01 AM

If I understand this correctly, the new {EntryValue, EntryExpr} operands do, if EntryExpr is not undef, specify a location that is identical to the dbg.value's current {Value, Expr} operands, but with DW_OP_LLVM_entry_value implicitly being applied to EntryValue before EntryExpr. Is that correct?

If so, just throwing out some questions (sorry in case I have overlooked the answers for this in the implementation):

  • I assume that we are only interested in EntryValue operands that are Argument values? Should the verifier catch that? Or is there some case where it could anything else than an Argument?
  • What should happen in cases where the dbg.values Value operand already is an Argument? Do we need to specify that value in the EntryValue operand also to ensure that entry values can be emitted?
llvm/lib/Transforms/Utils/Local.cpp
1620–1623

Nit: Can use BI here instead.

If I understand this correctly, the new {EntryValue, EntryExpr} operands do, if EntryExpr is not undef, specify a location that is identical to the dbg.value's current {Value, Expr} operands, but with DW_OP_LLVM_entry_value implicitly being applied to EntryValue before EntryExpr. Is that correct?

Yes, that is correct.

If so, just throwing out some questions (sorry in case I have overlooked the answers for this in the implementation):

  • I assume that we are only interested in EntryValue operands that are Argument values? Should the verifier catch that? Or is there some case where it could anything else than an Argument?

I think we should extend the Verifier with some kind of support for this. Not sure at the moment how we are going to handle this:

fn(%param) {
...
}

isa<Argument>(%param) ==>true
isa<Argument>(%param.something) ==> false (something could be `addr`)

Actually, I extended the Verifier locally, and I am catching some cases during the replaceAllUsesWith() where the entry value needs to be handled differently, e.g.:

llvm.dbg.value(%p1, !var_p1, !DIExpr(), %p1, !DIExpr()) becomes llvm.dbg.value(i32 0, !var_p1, !DIExpr(), i32 0, !DIExpr())

but we want the entry value to be an undef in this situation, so I am wondering whether we need a new representation of the entry-value, something like MetadataAsEntryValue, so we can invalidate such nodes in an efficent way.

  • What should happen in cases where the dbg.values Value operand already is an Argument? Do we need to specify that value in the EntryValue operand also to ensure that entry values can be emitted?

We should, since the Argument (from some point) can start depending on different entry value, please consider this case:

void fn (int p1, int p2) {
   ...
   p1 = p2;
}

And the intrinsics would be:

llvm.dbg.value(%p1, !var_p1, !DIExpr(), %p1, !DIExpr())
llvm.dbg.value(%p2, !var_p1, !DIExpr(), %p2, !DIExpr())
....
llvm.dbg.value(%p1, !var_p1, !DIExpr(), %p2, !DIExpr()) <=== p1 is an argument, but it depends on different entry value

If I understand this correctly, the new {EntryValue, EntryExpr} operands do, if EntryExpr is not undef, specify a location that is identical to the dbg.value's current {Value, Expr} operands, but with DW_OP_LLVM_entry_value implicitly being applied to EntryValue before EntryExpr. Is that correct?

Yes, that is correct.

If so, just throwing out some questions (sorry in case I have overlooked the answers for this in the implementation):

  • I assume that we are only interested in EntryValue operands that are Argument values? Should the verifier catch that? Or is there some case where it could anything else than an Argument?

I think we should extend the Verifier with some kind of support for this. Not sure at the moment how we are going to handle this:

fn(%param) {
...
}

isa<Argument>(%param) ==>true
isa<Argument>(%param.something) ==> false (something could be `addr`)

Actually, I extended the Verifier locally, and I am catching some cases during the replaceAllUsesWith() where the entry value needs to be handled differently, e.g.:

llvm.dbg.value(%p1, !var_p1, !DIExpr(), %p1, !DIExpr()) becomes llvm.dbg.value(i32 0, !var_p1, !DIExpr(), i32 0, !DIExpr())

but we want the entry value to be an undef in this situation, so I am wondering whether we need a new representation of the entry-value, something like MetadataAsEntryValue, so we can invalidate such nodes in an efficent way.

  • What should happen in cases where the dbg.values Value operand already is an Argument? Do we need to specify that value in the EntryValue operand also to ensure that entry values can be emitted?

We should, since the Argument (from some point) can start depending on different entry value, please consider this case:

void fn (int p1, int p2) {
   ...
   p1 = p2;
}

And the intrinsics would be:

llvm.dbg.value(%p1, !var_p1, !DIExpr(), %p1, !DIExpr())
llvm.dbg.value(%p2, !var_p1, !DIExpr(), %p2, !DIExpr())

!var_p2, sorry for my mistake

....
llvm.dbg.value(%p1, !var_p1, !DIExpr(), %p2, !DIExpr()) <=== p1 is an argument, but it depends on different entry value
StephenTozer added inline comments.Mon, Oct 19, 7:16 AM
llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
290–292

Should be fixed up before merging (relatively simple logic reassociation).

llvm/lib/Transforms/Utils/Local.cpp
1630

Nit: I believe you don't need to end the final statement in LLVM_DEBUG with a semicolon.

1632–1633

Would it not be possible to just use salvageDebugInfoImpl directly here? As far as I understand, there's no reason we shouldn't be able to support all of the salvage options for an entry value. Also, it should be possible to perform this salvage repeatedly; i.e. if we have:

void foo(int param) {
  int i = param + 3;
  int j = i + 6;
}

When creating an entry value for j, we could attempt to salvage it, find that the first operand %i is not an argument but can itself be salvaged, then attempt to salvage %i and find that its first operand is an argument, resulting in an entry value %param, DIExpression(DW_OP_plus_uconst, 3, DW_OP_plus_uconst, 6).