Page MenuHomePhabricator

[Debuginfo][SROA] Need to handle dbg.value in SROA pass.
Needs ReviewPublic

Authored by avl on Thu, Jul 11, 2:00 PM.

Details

Reviewers
aprantl
dblaikie
chandlerc
Group Reviewers
debug-info
Summary

SROA pass processes debug info incorrecly if applied twice.
Specifically, after SROA works first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining creates new opportunities for SROA,
so it is called again. This time it does not handle correctly previously
inserted dbg.value intrinsics.

That is IR coming to SROA pass before applied second time :

%result = alloca %struct.S1, align 4
%0 = bitcast %struct.S1* %result to i8*, !dbg !21
call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #3, !dbg !21
%call = call i32 @_Z3foov(), !dbg !22
%coerce.dive = getelementptr inbounds %struct.S1, %struct.S1* %result, i64 0, i32 0, !dbg !22
store i32 %call, i32* %coerce.dive, align 4, !dbg !22
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !23
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !24, metadata !DIExpression()), !dbg !28
%cmp.i = icmp eq i32 %call, 0, !dbg !31
br i1 %cmp.i, label %if.then, label %if.end, !dbg !32

note it has llvm.dbg.value.
That is the output of SROA pass after applied second time:

%call = call i32 @_Z3foov(), !dbg !21
call void @llvm.dbg.value(metadata %struct.S1* undef, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !22
call void @llvm.dbg.value(metadata %struct.S1* undef, metadata !23, metadata !DIExpression()), !dbg !27
%cmp.i = icmp eq i32 %call, 0, !dbg !30
br i1 %cmp.i, label %if.then, label %if.end, !dbg !31

That is the correct output:

%call = call i32 @_Z3foov(), !dbg !21
call void @llvm.dbg.value(metadata i32 %call, metadata !12, metadata !DIExpression()), !dbg !22
%cmp.i = icmp eq i32 %call, 0, !dbg !23
br i1 %cmp.i, label %if.then, label %if.end, !dbg !30

The fix is to remove llvm.dbg.value and insert dbg.declare for new fragments.

Diff Detail

Event Timeline

avl created this revision.Thu, Jul 11, 2:00 PM
vsk added a subscriber: vsk.

Thanks for the patch!

I think the deleteDeadInstructions change looks good, but have some reservations about the way new dbg.declares are inserted. More inline --

llvm/lib/Transforms/Scalar/SROA.cpp
4250

Why not filter the result from llvm::findDbgUsers?

4375

Typically, container sizes are written in decimal, not hex.

4378

Iterating over a DenseSet will give non-deterministic results. You need a vector-like container (or perhaps a SetVector) here.

4379

Please avoid inserting vertical whitespace after open curly braces, and/or whitespace changes in code that is otherwise not being modified.

4393

I'm not sure this fragment expr is valid for, say, dbg.values with OP_deref attached. I think this would result in a dbg.declare with an OP_deref, which doesn't seem to make sense. Currently you're uniquing debug intrinsics by variable name, so you might not hit the issue right away, but eventually it'll pop up.

Perhaps it'd help to start with simple expressions, i.e. empty expressions, and then incrementally build on that + add tests.

avl marked 3 inline comments as done.Fri, Jul 12, 6:13 AM
avl added inline comments.
llvm/lib/Transforms/Scalar/SROA.cpp
4250

In that case size of resulting container would be bigger. If that is not a problem - I would change to llvm::findDbgUsers.

4378

It looks like vector-like container already used here - SmallVector.

4393

Thanks, I would change to empty expression then.

Few nits inline

llvm/lib/Transforms/Scalar/SROA.cpp
4245

s/specifing/specifying/

4246

... are living ...

4248

Spurious space

4249

Spurious newline. I guess doxygen is still smart enough to connect it to the next function, but still better to remove it.

avl updated this revision to Diff 209791.Mon, Jul 15, 4:04 AM

addressed style issues. created empty expression for dbg.value case.

jmorse added a subscriber: jmorse.Mon, Jul 15, 9:15 AM

This is a great bug find -- I'd no idea SROA ran twice! Applying this patch gives a ~3% increase in variable locations covered on a clang-3.4 build.

However, IMHO converting dbg.values to dbg.declares is going to prove problematic. There's no (AFAIK) good way to confirm that a dbg.value was originally _supposed_ to be a dbg.declare, and so dbg.values that accidentally refer to an alloca might be converted. There are all kinds of optimisations and salvaging that could make dbg.values point back at an alloca.

For example, with your current patch applied to r365447, if I add a pointer "ptr" that points at the alloca, and another field to the struct:

struct S1 {
    int p1;
    long int p2; // jmorse
    
    bool IsNull (  ) {
        return p1 == 0;
    }
};
    
S1 foo ( void );
    
int bar (  ) {
    
    S1 result = foo();
    S1 * ptr = &result; //jmorse
    
    if ( result.IsNull() )
        return 0;
    
    return result.p1 + 1;
}

And run "./bin/clang++ -O2 -g test.c -o test.o -c", then objdump and llvm-dwarfdump, the alloca gets promoted:

0000000000000000 <_Z3barv>:
   0:   50                      push   %rax
   1:   e8 00 00 00 00          callq  6 <_Z3barv+0x6>
                        2: R_X86_64_PLT32       _Z3foov-0x4
   6:   8d 48 01                lea    0x1(%rax),%ecx
   9:   85 c0                   test   %eax,%eax
   b:   0f 45 c1                cmovne %ecx,%eax
   e:   59                      pop    %rcx
   f:   c3                      retq

But ptr isn't optimised out as it should be. According to dwarfdump:

0x000000b1: DW_TAG_variable
              DW_AT_location    (0x00000000
                 [0x0000000000000006,  0x000000000000000e): DW_OP_reg0 RAX, DW_OP_piece 0x4)
              DW_AT_name        ("ptr")
              DW_AT_decl_file   ("/faster/fs/release/test.c")
              DW_AT_decl_line   (15)
              DW_AT_type        (0x0000008f "S1*")

The location is given as $eax, which actually contains one of the struct fields, and would be misleading. This is down to its dbg.value becoming a dbg.declare, when really it should have been dropped. (This doesn't happen if you don't add the extra "p2" field due to ConvertDebugDeclareToDebugValue being confused at types that don't make sense).

One potential workaround might be to examine the DIExpression to see whether or not it actually dereferences the alloca address and has an aggregate type. However that means digging through the DIExpression operands, which I believe can become complicated if a lot of optimization has happened. There might also be non-C/C++ frontends that expect to be able to temporarily bind variable names to memory, then move them later. Converting their dbg.values to dbg.declares would change the duration that the variable referred to a particular piece of memory.

~

Instead, maybe we could just not convert dbg.declare for structs to dbg.values during instcombine in the first place? While digging into this I noticed that the comment at [0] says only scalars are supposed to be dbg.declare-converted during instcombine, but the code allows structs to slip through. I don't know the history of LowerDbgDeclare though, do other debug-info people know why structs get converted there?

~

For the avoidance of doubt, this is a great find with a big coverage win, but I think we need a safer way of finding the defective dbg.values (or; not converting them in the first place).

[0] https://github.com/llvm/llvm-project/blob/db101864bdc938deb1d63fe4f7da761bd38e5cae/llvm/lib/Transforms/Utils/Local.cpp#L1419

avl added a comment.Tue, Jul 16, 2:29 PM

@jmorse Thank you for this case and explanations. I will evaluate it more and adapt fix accordingly.

However, IMHO converting dbg.values to dbg.declares is going to prove problematic.
There's no (AFAIK) good way to confirm that a dbg.value was originally _supposed_ to be a dbg.declare,
and so dbg.values that accidentally refer to an alloca might be converted.
There are all kinds of optimisations and salvaging that could make dbg.values point back at an alloca.

Probably, it is OK if dbg.value was not originally _supposed_ to be a dbg.declare but was converted in this place.
if SROA is able to do optimization then such dbg.declare would be converted into dbg.value later in PromoteMemToReg.
i.e. If after all optimizations dbg.value relates to valid slice of alloca - then it is OK to match it with that slice inserting dbg.declare,
which would be replaced with dbg.value.

This patch uses incorrect dbg.value for your example and finally debug information for "ptr" put in the result.
That("ptr") actually should be removed. If it would take another dbg.value as the basis then the result would be correct.

Let`s check IR :

before first SROA

  %retval = alloca i32, align 4
%result = alloca %struct.S1, align 8
%ptr = alloca %struct.S1*, align 8
%cleanup.dest.slot = alloca i32, align 4
%0 = bitcast %struct.S1* %result to i8*, !dbg !25
call void @llvm.lifetime.start.p0i8(i64 16, i8* %0) #5, !dbg !25
call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !26
%call = call { i32, i64 } @_Z3foov(), !dbg !27
%1 = bitcast %struct.S1* %result to { i32, i64 }*, !dbg !27
%2 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 0, !dbg !27
%3 = extractvalue { i32, i64 } %call, 0, !dbg !27
store i32 %3, i32* %2, align 8, !dbg !27
%4 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 1, !dbg !27
%5 = extractvalue { i32, i64 } %call, 1, !dbg !27
store i64 %5, i64* %4, align 8, !dbg !27
%6 = bitcast %struct.S1** %ptr to i8*, !dbg !28
call void @llvm.lifetime.start.p0i8(i64 8, i8* %6) #5, !dbg !28
call void @llvm.dbg.declare(metadata %struct.S1** %ptr, metadata !23, metadata !DIExpression()), !dbg !29
store %struct.S1* %result, %struct.S1** %ptr, align 8, !dbg !29, !tbaa !30
%call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* %result), !dbg !34
br i1 %call1, label %if.then, label %if.end, !dbg !3

after first SROA

%result = alloca %struct.S1, align 8
%0 = bitcast %struct.S1* %result to i8*, !dbg !25
call void @llvm.lifetime.start.p0i8(i64 16, i8* %0) #5, !dbg !25
call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !26
%call = call { i32, i64 } @_Z3foov(), !dbg !27
%1 = bitcast %struct.S1* %result to { i32, i64 }*, !dbg !27
%2 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 0, !dbg !27
%3 = extractvalue { i32, i64 } %call, 0, !dbg !27
store i32 %3, i32* %2, align 8, !dbg !27
%4 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 1, !dbg !27
%5 = extractvalue { i32, i64 } %call, 1, !dbg !27
store i64 %5, i64* %4, align 8, !dbg !27
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !23, metadata !DIExpression()), !dbg !28
%call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* %result), !dbg !29
br i1 %call1, label %if.then, label %if.end, !dbg !31

before second SROA

%result = alloca %struct.S1, align 8
%0 = bitcast %struct.S1* %result to i8*, !dbg !25
call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %0) #4, !dbg !25
%call = call { i32, i64 } @_Z3foov(), !dbg !26
%1 = getelementptr inbounds %struct.S1, %struct.S1* %result, i64 0, i32 0, !dbg !26
%2 = extractvalue { i32, i64 } %call, 0, !dbg !26
store i32 %2, i32* %1, align 8, !dbg !26
%3 = getelementptr inbounds %struct.S1, %struct.S1* %result, i64 0, i32 1, !dbg !26
%4 = extractvalue { i32, i64 } %call, 1, !dbg !26
store i64 %4, i64* %3, align 8, !dbg !26
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !23, metadata !DIExpression()), !dbg !27
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !27
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !28, metadata !DIExpression()), !dbg !31
%cmp.i = icmp eq i32 %2, 0, !dbg !34
br i1 %cmp.i, label %if.then, label %if.end, !dbg !35

after second SROA

call void @llvm.dbg.declare(metadata !2, metadata !23, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25
%call = call { i32, i64 } @_Z3foov(), !dbg !26
%0 = extractvalue { i32, i64 } %call, 0, !dbg !26
call void @llvm.dbg.value(metadata i32 %0, metadata !23, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25
%1 = extractvalue { i32, i64 } %call, 1, !dbg !26
call void @llvm.dbg.value(metadata i64 %1, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !25
%cmp.i = icmp eq i32 %0, 0, !dbg !27
br i1 %cmp.i, label %if.then, label %if.end, !dbg !33

before second SROA it has three dbg.values pointing to the same alloca :

"ptr:bar" call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !23, metadata !DIExpression()), !dbg !27
"result:bar": call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !27
"this:foo" call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !28, metadata !DIExpression()), !dbg !31

"ptr:bar" and "this:foo" should be removed.
"result:bar" should be used as the alloca instruction for fragments. thus there should be inserted corresponding dbg.declare for fragments which would be promoted into dbg.values in PromoteMemToReg.

function which would select proper dbg.value should probably be based on variable, expression and fragment.
Probably it could refuse those dbg.values which do not dereference the alloca address and do not have an aggregate type as you suggested.

One potential workaround might be to examine the DIExpression to see whether or not it actually dereferences the alloca address and has an aggregate type. However that means digging through the DIExpression operands, which I believe can become complicated if a lot of optimization has happened. There might also be non-C/C++ frontends that expect to be able to temporarily bind variable names to memory, then move them later. Converting their dbg.values to dbg.declares would change the duration that the variable referred to a particular piece of memory.

I would give it a try. i.e. I will write a function which would examine DIExpression to decide whether dbg.value should be deleted or not.
If I understood correctly - converting dbg.gvalues into dbg.declare should not create a problem since all these dbg.declare should be changed in dbg.values in PromoteMemToReg.

Instead, maybe we could just not convert dbg.declare for structs to dbg.values during instcombine in the first place? While digging into this I noticed that the comment at [0] says only scalars are supposed to be dbg.declare-converted during instcombine, but the code allows structs to slip through. I don't know the history of LowerDbgDeclare though, do other debug-info people know why structs get converted there?

I think it would be good to not to limit conversions. There could probably be another optimizations(not only instcombine) which would produce dbg.values. SROA itself produces dbg.values.
Thus it should be able to work with them if apllied twice.

Probably, it is OK if dbg.value was not originally _supposed_ to be a dbg.declare but was converted in this place.

Ah, this is core of the matter I think. Converting a dbg.value into a dbg.declare implicitly extends the duration ("lifetime" in the docs [0]) of the variable location from "until the next dbg.value" to "the entire scope". Consider what happens if we make the modifications here [1] to your test case. Imagine an IR producer that wants to temporarily bind some variable to a field of 'result' (the dbg.value added with DW_OP_deref), and later assigns '1' to the variable with the second dbg.value, to cover the rest of the function.

(Note that I've also disabled lines 4394->4396 of SROA.cpp, which deletes newly created dbg.declares if there are any more to create).

With this patch (and mod), the first added dbg.value intrinsic becomes a dbg.declare, which extends the lifetime of the location to overlap the subsequent dbg.value(1,...). Mem2reg will later add dbg.value intrinsics where there were stores, giving us this:

define dso_local i32 @_Z3barv() #0 !dbg !7 {
entry:
  %call = call i32 @_Z3foov(), !dbg !23
  call void @llvm.dbg.value(metadata i64 1, metadata !21, metadata !DIExpression()), !dbg !24
  call void @llvm.dbg.value(metadata i32 %call, metadata !12, metadata !DIExpression()), !dbg !25
  call void @llvm.dbg.value(metadata i32 %call, metadata !21, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25
  call void @llvm.dbg.value(metadata i32 %call, metadata !26, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !30
  %cmp.i = icmp eq i32 %call, 0, !dbg !33
  br i1 %cmp.i, label %if.then, label %if.end, !dbg !34

(!21 = the variable I added, !12 = "result", !26 = "this").

In this circumstance, the variable I added now points somewhere else (which might be fixable). However the duration of the variable location has changed too: the variable was supposed to take on the value "1" for the rest of the function, but instead it gets some other location. I really doubt this is fixable -- both dbg.declare and dbg.value are working as intented, it's converting one to another that changes the lifetimes.

Note that clang probably wouldn't produce that input code ever; however LLVM has to support arbitary IR producers. I would also expect the first dbg.value I added to be dropped by LLVM at some point anyway: baking lots of logic into a DIExpression makes it harder for subsequent optimisations to preserve dbg.values.

[0] https://llvm.org/docs/SourceLevelDebugging.html#debugger-intrinsic-functions
[1] https://reviews.llvm.org/P8157