This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Support for DW_OP_implicit_pointer (CodeGen phase)
Needs ReviewPublic

Authored by alok on Nov 6 2019, 3:36 AM.

Details

Summary

This patch (5/N) stems from D69787 as suggested by reviewers.

The original patch is being divided as described below.

CodeGen / DwarfInfo-emission phase of the implementation works on IR with DW_OP_implicit_pointer operation to emit debug info. this will have two patch (PATCH 1-2). PATCH-1 has changes for IR Verify and Bitcode changes. The current patch (PATCH-2) will complete the modification for Codegen. It also contains test cases with IR generated with future patches.

IR transformation phase of the implementation deals with preserving debug info while instruction is being erased (optimized out) in IR transformation phase. In this phase all the opportunities to generate DW_OP_implicit_pointer operation in IR, are found and handled appropriately. For this phase multiple patches will be submitted.

PATCH 3) Enhance salvaging implementation to generate DW_OP_implicit_pointer operation
PATCH 4-N) All the opportunities to preserve debug info are dealt with (to support various optimizations)

Current PATCH is PATCH-2. Once this gets accepted, PATCH-3 will be submitted.
After PATCH-3 gets accepted, PATCH 4-N can be submitted independently.

 Summary:

  This patch completes the back-end support for DW_OP_implicit_pointer operation. When it receives IR with DW_OP_LLVM_implicit_pointer operator, it processes that to generate proper debug-info. It generates location lists having DW_OP_implicit_pointer operations.

Testing:
 
   - Added unit tests (with IR generated from future patches) for validation thru llvm-dwarfdump
   - check-llvm, and an end-to-end test using gnu GDB to debug an optimized program (LLDB need to be enhanced to support).
   - check-debuginfo (the debug info integration tests)

Diff Detail

Event Timeline

alok created this revision.Nov 6 2019, 3:36 AM
jmorse added a comment.Nov 6 2019, 9:00 AM

Hi Alok,

Thanks for splitting up the patch and tests; I'm not very familiar with implicit pointers and I'm still working it through my mind, so I have a couple of questions if you'll bear with me. At the moment though, it looks like implicit-pointer dbg.value insts refer directly to DILocalVariables -- this will become a problem if the same function is inlined twice, as there's nothing to distinguish which inlining instance the dbg.value refers to. You can probably make a metadata pair of {DILocalVariable, DILocation} where the DILocation is the inlining location, and use that as the operand of the dbg.value instead. There should also be a test for multiple inlining instances too.

Looking at the tests, am I right in thinking that the "target" of the implicit pointer can change throughout the function? For example

int a = 0, b = 1, c = 2;
int *foo = &a;
call();
foo = &b;
call();
foo = &c;

Is it (theoretically) possible to describe promoted non-local variables with implicit pointers? LLVM will occasionally promote an in-memory value into a register, after which there's no way for current debuginfo to describe where it is.

The code looks like a good start, although I don't know anything about the DWARF backend.

llvm/test/DebugInfo/dwarfdump-implicit_pointer_instcomb.ll
2

I don't believe running clang in an internal LLVM test is supported, this should probably be running llc -O2 -filetype=obj instead? Same with the other tests.

47

Assuming these attributes are un-necessary, they should be dropped as they can be an extra maintenance burden.

alok added a comment.Nov 6 2019, 10:39 AM

Hi Alok,

Thanks for splitting up the patch and tests; I'm not very familiar with implicit pointers and I'm still working it through my mind, so I have a couple of questions if you'll bear with me. At the moment though, it looks like implicit-pointer dbg.value insts refer directly to DILocalVariables -- this will become a problem if the same function is inlined twice, as there's nothing to distinguish which inlining instance the dbg.value refers to. You can probably make a metadata pair of {DILocalVariable, DILocation} where the DILocation is the inlining location, and use that as the operand of the dbg.value instead. There should also be a test for multiple inlining instances too.

Thanks a lot for taking your time to review this, let me try to answer this, actually the functionality of DW_OP_implicit_pointer is to preserve the dereferenced values of optimized out variables, it doesnt preserve the location information. Our interest is to emit first operand which is offset of target variable within debuginfo dwarf section. We preserve DILocalVariable (temporarily) until debug info section is formed and variable's offset within it is known. Please let me know in case I misunderstood the question.

Sure, I shall add a test case.

Looking at the tests, am I right in thinking that the "target" of the implicit pointer can change throughout the function? For example

int a = 0, b = 1, c = 2;
int *foo = &a;
call();
foo = &b;
call();
foo = &c;

Yes, it is possible.

Is it (theoretically) possible to describe promoted non-local variables with implicit pointers? LLVM will occasionally promote an in-memory value into a register, after which there's no way for current debuginfo to describe where it is.

If a variable is a pointer and is eligible to have DW_AT_location attribute (in dwarf), then theoretically it should be possible to describe this attribute using DW_OP_implicit_pointer. Current patch covers for local variables only, we can investigate for non local variables later

The code looks like a good start, although I don't know anything about the DWARF backend.

Thanks again for your comments.

llvm/test/DebugInfo/dwarfdump-implicit_pointer_instcomb.ll
2

Thanks for pointing it out. I shall incorporate this comment.

47

Thanks. I shall incorporate this comment.

alok updated this revision to Diff 228105.Nov 6 2019, 10:56 AM

This is to incorporate review comments.

  • To use llc in place of clang in tests
  • To delete unnecessary attributes from tests
vsk added a subscriber: vsk.Nov 6 2019, 1:35 PM

Hey @alok, thanks for working on this.

Currently, the operands of dbg.value constitute a {value, variable, expression} tuple. This patch seems to change that to {value | variable-to-be-deref'd, variable, expression | expression-with-implicit-pointer}. What alternatives have you considered? While reading your tests, I found it hard to reason about how implicit pointer values are referenced & defined. This made me concerned about the amount of hidden state that the current approach would introduce into IR. Further, I'm not really convinced this does the right thing in the presence of inlining (at least, not yet). And I'm keen to avoid extending SelectionDAG to represent implicit pointer debug values, as that presumably creates work on the GlobalISel side as well.

One possible alternative (haven't fully thought this through yet, but here goes!) might be to have dbg.value(value, variable, expr) return a token value, instead of void. That token could then be referenced later, like: dbg.value(%implicit_pointer_token, <variable>, DIExpression(DW_OP_implicit_pointer(<offset>))). The main advantage of this is that it makes the link between a description of the implicit pointer and its dereference explicit. I think this will be a lot easier to reason about. But there are more concrete benefits. E.g., if a pass were to move the DW_OP_implicit_pointer dbg.value before the def of its implicit pointer token, we'd detect the use-before-def instead of silently emitting the wrong value.

p.s: Could you number your patches (e.g. [1/3], [2/3], etc) to make them easier to find? Could you also make the tests for each patch independent so that they can be landed incrementally?

llvm/test/DebugInfo/dwarfdump-implicit_pointer_instcomb.ll
2

Please include the original C source when applicable, so that future authors have steps to regenerate the test if needed.

llvm/test/DebugInfo/dwarfdump-implicit_pointer_mem2reg.ll
8

Please only capitalize words where necessary.

10

"arr1" is metadata !18, right? Where does lit0 come from?

12

Please structure your checks as:

; <description of test 1>
; CHECK-LABEL: DW_AT_name  (<variable name>)
; CHECK-NEXT: ...

; <description of test 2>
...
llvm/test/DebugInfo/dwarfdump-implicit_pointer_sroa_inline.ll
10

Please avoid hardcoding constants in check lines. If the specific constant isn't important, don't match it. Otherwise, you can capture it with a filecheck variable, e.g.:

DW_OP_implicit_pointer [[DIE:0x.+]] +0
DW_OP_implicit_pointer [[DIE]] +4

Apologies for not having enough time to review this in more detail at the moment, so this feedback is shorting from the hip slightly based on reading the existing conversations and patch description

Broadly:

I'd expect more than 3 patches and in this order:

  1. Backend/DWARF emission support for new debug info IR/metadata/intrinsic usage/DW_OP operators. This should include tests with hand-crafted IR (or generated from future patches - but I think this should be narrow enough that a small example could be given mostly with IR generated from clang/LLVM, and only a minor edit to use the new feature)

2-N) Individual patches, one for each optimization that generates this new representation where possible/useful - each one of these patches should be independent and include tests for the change (tests that are IR->IR tests, testing only the specific optimization)

(the 2-N patches can't come before (1) or the trunk would be broken by passes generating IR that the DWARF backend could not handle)

@probinson @aprantl - do you folks know why implicit_pointer needs to point to another DIE, rather than describing bytes directly? I guess it's intended to handle void*, for instance? (where the type information for the pointee isn't present in the pointer type) But for something like int* if the int isn't in memory (so can't be pointed to) - say, it's in a register, I'd expect to describe the value of the int without any extra DIE references, etc? (who knows if the DIE is in a variable/described by any other DIE). I'm pretty confused by the way this DWARF feature is spec'd & wondering if we should avoid implementing it/instead implement a better extension? But quite possible there's things about the design that I don't understand.

I'd expect more than 3 patches and in this order:

+1.

@probinson @aprantl - do you folks know why implicit_pointer needs to point to another DIE, rather than describing bytes directly? I guess it's intended to handle void*, for instance? (where the type information for the pointee isn't present in the pointer type) But for something like int* if the int isn't in memory (so can't be pointed to) - say, it's in a register, I'd expect to describe the value of the int without any extra DIE references, etc? (who knows if the DIE is in a variable/described by any other DIE). I'm pretty confused by the way this DWARF feature is spec'd & wondering if we should avoid implementing it/instead implement a better extension? But quite possible there's things about the design that I don't understand.

I think the main use-case was for when a pointed-to object is optimized into registers (which don't have addresses), but the pointed-to values are easily found. For example if you have a function that takes a pointer-to-object, and is then inlined, and the optimizer decides it doesn't need the object to be in memory after all. If values from the "pointed-to" object are still findable, you probably know what object that was originally, and so implicit_pointer can refer to that object's DIE. Among other things, this indirection has an advantage if the "pointed-to" object moves around; the pointed-to object's locations are tracked in the pointed-to object's DIE, and the pointer DIE just says "ask that DIE where it is right now."

It might possibly be helpful to read the proposal (http://dwarfstd.org/ShowIssue.php?issue=100831.1) where the first example is a bit more compelling than the examples that made it into the spec, unfortunately.

Note that I have not read this patch, or the others.

davide added a subscriber: davide.Nov 6 2019, 5:21 PM
alok added a comment.Nov 6 2019, 10:15 PM

Apologies for not having enough time to review this in more detail at the moment, so this feedback is shorting from the hip slightly based on reading the existing conversations and patch description

Broadly:

I'd expect more than 3 patches and in this order:

  1. Backend/DWARF emission support for new debug info IR/metadata/intrinsic usage/DW_OP operators. This should include tests with hand-crafted IR (or generated from future patches - but I think this should be narrow enough that a small example could be given mostly with IR generated from clang/LLVM, and only a minor edit to use the new feature)

Agree. Fortunately the current patch (submitted first) contains support for DWARF emission. The current patch also has the testcases with IR generated from future patches.

2-N) Individual patches, one for each optimization that generates this new representation where possible/useful - each one of these patches should be independent and include tests for the change (tests that are IR->IR tests, testing only the specific optimization)

Agree. I am working on splitting this way. I plan to submit second patch (after first patch gets place in trunk) which will include enhanced salvaging debug info utilities to be used (called) in different optimizations. once this second patch goes thru. the remaining individual patches for each optimization will be posted for review.
Please let me know if it sounds good. Thanks for your suggestions @dblaikie.

(the 2-N patches can't come before (1) or the trunk would be broken by passes generating IR that the DWARF backend could not handle)

Agree. I will hold other patches till (1) gets approved.

@probinson @aprantl - do you folks know why implicit_pointer needs to point to another DIE, rather than describing bytes directly? I guess it's intended to handle void*, for instance? (where the type information for the pointee isn't present in the pointer type) But for something like int* if the int isn't in memory (so can't be pointed to) - say, it's in a register, I'd expect to describe the value of the int without any extra DIE references, etc? (who knows if the DIE is in a variable/described by any other DIE). I'm pretty confused by the way this DWARF feature is spec'd & wondering if we should avoid implementing it/instead implement a better extension? But quite possible there's things about the design that I don't understand.

alok marked 4 inline comments as done.Nov 6 2019, 10:48 PM
alok added inline comments.
llvm/test/DebugInfo/dwarfdump-implicit_pointer_mem2reg.ll
8

Sure. I shall incorporate this in next version.

10

In the source program -

int *ptr2 = 0;

In IR it is represented as -

call void @llvm.dbg.value(metadata i32* null, metadata !24, metadata !DIExpression()), !dbg !30
!24 = !DILocalVariable(name: "ptr2", scope: !12, file: !3, line: 51, type: !23)

Without current patch DIE for ptr2 would look like -

0x00000068: DW_TAG_variable

DW_AT_const_value     (0)
DW_AT_name    ("ptr2")
DW_AT_decl_file       ("/home/alok/mem2reg.c")
DW_AT_decl_line       (8)
DW_AT_type    (0x000000ab "int*")
  • Please note that ptr2 is initialized to null (DW_AT_const_value (0))

With current patch -

0x00000068: DW_TAG_variable

DW_AT_location        (0x00000019
   [0x0000000000400480, 0x0000000000400487): DW_OP_lit0, DW_OP_stack_value
   [0x0000000000400487, 0x0000000000400494): DW_OP_implicit_pointer 0x53 +0)
DW_AT_name    ("ptr2")
DW_AT_decl_file       ("Inputs/dwarfdump-implicit_pointer_mem2reg.c")
DW_AT_decl_line       (8)
DW_AT_type    (0x000000c1 "int*")
  • Please note that, now DIE has two values (in different code ranges), so in place of initializing it uses Location list.
12

Thanks for the input, I will incorporate this in next version.

llvm/test/DebugInfo/dwarfdump-implicit_pointer_sroa_inline.ll
10

Thanks for the input. I shall incorporate this in next version.

alok updated this revision to Diff 228184.Nov 7 2019, 1:01 AM
alok retitled this revision from [DebugInfo] Support for DW_OP_implicit_pointer (Post IR transformation phase) to [DebugInfo] Support for DW_OP_implicit_pointer (CodeGen phase).
alok edited the summary of this revision. (Show Details)

This update includes incorporation of review comments from Vedant Kumar (@vsk)

alok marked 2 inline comments as done and 2 inline comments as done.Nov 7 2019, 1:08 AM
alok added inline comments.
llvm/test/DebugInfo/dwarfdump-implicit_pointer_instcomb.ll
2

Incorporated this comment.

llvm/test/DebugInfo/dwarfdump-implicit_pointer_mem2reg.ll
8

Incorporated this comment.

alok marked 9 inline comments as done.Nov 7 2019, 1:13 AM
alok marked an inline comment as done.Nov 7 2019, 1:30 AM
alok marked an inline comment as done.

@alok Thanks for doing this. I just put advices on how to make your tests shorter, so please address them in the other tests as well.

llvm/test/DebugInfo/dwarfdump-implicit_pointer_instcomb.ll
39

TBAA metadata are usually unnecessary as well, so I think you can delete it.

45

Since you dropped the attributes, we can delete those '#2' (#N) from the testcase.

53

producer: "clang version 10.0.0" is enough.

54

We usually make tests as short as possible, so my advice is to avoid your exact path from local PC here, you can just put "/dir".

62

producer: "clang version 10.0.0" is enough.

alok marked 5 inline comments as done.Nov 7 2019, 2:25 AM
alok added inline comments.
llvm/test/DebugInfo/dwarfdump-implicit_pointer_instcomb.ll
39

Thanks for your tricks. I will incorporate this in next version.

45

I shall incorporate this in next version.

53

I shall incorporate this in next version.

54

I shall incorporate this in next version.

62

I shall incorporate this in next version.

alok added a comment.Nov 7 2019, 2:28 AM

@alok Thanks for doing this. I just put advices on how to make your tests shorter, so please address them in the other tests as well.

Thanks a lot for your time and tricks as well. I shall be apply those in next version.

alok updated this revision to Diff 228198.Nov 7 2019, 2:38 AM

Updated for comments from @djtodoro (Djordje Todorovic) to apply tricks to shorten the tests

alok marked 5 inline comments as done.Nov 7 2019, 2:40 AM
dstenb added a subscriber: dstenb.Nov 7 2019, 4:09 AM

Thanks for adding the source reproducers; however, they are typically not stored as separate files (with one exception being tests operating on binary files). Feel free to inline the C code as comments in the IR files though!

Could you please split off a patch just adds the IR, Bitcode and verifier support from this review into a separate one? That one should be very straightforward to review & land and it will make the rest easier.

alok updated this revision to Diff 228357.Nov 7 2019, 8:35 PM

This version is updated for comment of David Stenberg (@dstenb)
(removed original reproducer c programs and added those in test cases itself as comments)

alok added a comment.Nov 7 2019, 8:44 PM

Thanks for adding the source reproducers; however, they are typically not stored as separate files (with one exception being tests operating on binary files). Feel free to inline the C code as comments in the IR files though!

Thanks for your comment. I have incorporated your comment.

alok added a comment.Nov 7 2019, 9:09 PM
In D69886#1736182, @vsk wrote:

Hey @alok, thanks for working on this.

Thanks Vedant for your time in reviewing this.

Currently, the operands of dbg.value constitute a {value, variable, expression} tuple. This patch seems to change that to {value | variable-to-be-deref'd, variable, expression | expression-with-implicit-pointer}. What alternatives have you considered? While reading your tests, I found it hard to reason about how implicit pointer values are referenced & defined. This made me concerned about the amount of hidden state that the current approach would introduce into IR. Further,

While implementing this, my priority was to avoid changing the interface of dbg.value which is

declare void @llvm.dbg.value(metadata, metadata, metadata)

Which permits to have another variable metadata at first argument. Since DW_OP_implicit_pointer suggests that check the value of (optimized out) variable at another variable. Last argument is expression, which can be used to identify that first argument is redirection variable (using isImplicitPointer()). For example

call void @llvm.dbg.value(metadata !16, metadata !25, metadata !DIExpression(DW_OP_implicit_pointer, 0)), !dbg !31
!16 = !DILocalVariable(name: "arr", scope: !12, file: !3, line: 24, type: !17)
!25 = !DILocalVariable(name: "ptr", arg: 1, scope: !26, file: !3, line: 15, type: !29)

which suggests, "ptr" is optimized out but its value is same as "arr", check "arr"

I'm not really convinced this does the right thing in the presence of inlining (at least, not yet).

Currently I have two test cases which test the implementation for inlining. Handling of inilining is done in future patch (IR transformation phase). I can explain it later once I submit that patch. Please do share any scenario which you think should be considered.

And I'm keen to avoid extending SelectionDAG to represent implicit pointer debug values, as that presumably creates work on the GlobalISel side as well.

One possible alternative (haven't fully thought this through yet, but here goes!) might be to have dbg.value(value, variable, expr) return a token value, instead of void. That token could then be referenced later, like: dbg.value(%implicit_pointer_token, <variable>, DIExpression(DW_OP_implicit_pointer(<offset>))). The main advantage of this is that it makes the link between a description of the implicit pointer and its dereference explicit. I think this will be a lot easier to reason about. But there are more concrete benefits. E.g., if a pass were to move the DW_OP_implicit_pointer dbg.value before the def of its implicit pointer token, we'd detect the use-before-def instead of silently emitting the wrong value.

Alternatively, what approach i have taken is to identify the kind of first argument based on third argument, and that solves the same purpose (using function isImplicitPointer()). Code movement part is handled in IR transformation phase (future patch). Please let me know if you have any scenario that should be considered. please let me know in case i did not understand anything.

p.s: Could you number your patches (e.g. [1/3], [2/3], etc) to make them easier to find? Could you also make the tests for each patch independent so that they can be landed incrementally?

Sure I shall keep this in mind.

alok added a comment.Nov 8 2019, 2:54 AM

Could you please split off a patch just adds the IR, Bitcode and verifier support from this review into a separate one? That one should be very straightforward to review & land and it will make the rest easier.

Thank you @aprantl for you suggestion. I have raised D69999 for this.

alok updated this revision to Diff 228631.Nov 10 2019, 10:09 PM
alok edited the summary of this revision. (Show Details)

This patch is updated post split. part of it went to D69999, which is also the dependency for current patch. (for current patch to work, D69999 is needed)

alok updated this revision to Diff 230808.Nov 24 2019, 10:18 AM
alok edited the summary of this revision. (Show Details)

updated to re-base this due to changes in trunk and parent patches.

alok updated this revision to Diff 234505.Dec 18 2019, 4:23 AM

Updated to re-base and included some changes separated from D70644 to keep similar changes together as suggested by @jmorse.