Page MenuHomePhabricator

[Debuginfo][SROA] Need to handle dbg.value in SROA pass.
ClosedPublic

Authored by avl on Jul 11 2019, 2:00 PM.

Details

Summary

SROA pass processes debug info incorrecly if applied twice.
Specifically, after SROA works first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining creates new opportunities for SROA,
so it is called again. This time it does not handle correctly previously
inserted dbg.value intrinsics.

That is IR coming to SROA pass before applied second time :

%result = alloca %struct.S1, align 4
%0 = bitcast %struct.S1* %result to i8*, !dbg !21
call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #3, !dbg !21
%call = call i32 @_Z3foov(), !dbg !22
%coerce.dive = getelementptr inbounds %struct.S1, %struct.S1* %result, i64 0, i32 0, !dbg !22
store i32 %call, i32* %coerce.dive, align 4, !dbg !22
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !23
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !24, metadata !DIExpression()), !dbg !28
%cmp.i = icmp eq i32 %call, 0, !dbg !31
br i1 %cmp.i, label %if.then, label %if.end, !dbg !32

note it has llvm.dbg.value.
That is the output of SROA pass after applied second time:

%call = call i32 @_Z3foov(), !dbg !21
call void @llvm.dbg.value(metadata %struct.S1* undef, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !22
call void @llvm.dbg.value(metadata %struct.S1* undef, metadata !23, metadata !DIExpression()), !dbg !27
%cmp.i = icmp eq i32 %call, 0, !dbg !30
br i1 %cmp.i, label %if.then, label %if.end, !dbg !31

That is the correct output:

%call = call i32 @_Z3foov(), !dbg !21
call void @llvm.dbg.value(metadata i32 %call, metadata !12, metadata !DIExpression()), !dbg !22
%cmp.i = icmp eq i32 %call, 0, !dbg !23
br i1 %cmp.i, label %if.then, label %if.end, !dbg !30

The fix is to remove llvm.dbg.value and insert dbg.declare for new fragments.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vsk added a subscriber: vsk.

Thanks for the patch!

I think the deleteDeadInstructions change looks good, but have some reservations about the way new dbg.declares are inserted. More inline --

llvm/lib/Transforms/Scalar/SROA.cpp
4250 ↗(On Diff #209322)

Why not filter the result from llvm::findDbgUsers?

4376 ↗(On Diff #209322)

Typically, container sizes are written in decimal, not hex.

4379 ↗(On Diff #209322)

Iterating over a DenseSet will give non-deterministic results. You need a vector-like container (or perhaps a SetVector) here.

4380 ↗(On Diff #209322)

Please avoid inserting vertical whitespace after open curly braces, and/or whitespace changes in code that is otherwise not being modified.

4392 ↗(On Diff #209322)

I'm not sure this fragment expr is valid for, say, dbg.values with OP_deref attached. I think this would result in a dbg.declare with an OP_deref, which doesn't seem to make sense. Currently you're uniquing debug intrinsics by variable name, so you might not hit the issue right away, but eventually it'll pop up.

Perhaps it'd help to start with simple expressions, i.e. empty expressions, and then incrementally build on that + add tests.

avl marked 3 inline comments as done.Jul 12 2019, 6:13 AM
avl added inline comments.
llvm/lib/Transforms/Scalar/SROA.cpp
4250 ↗(On Diff #209322)

In that case size of resulting container would be bigger. If that is not a problem - I would change to llvm::findDbgUsers.

4379 ↗(On Diff #209322)

It looks like vector-like container already used here - SmallVector.

4392 ↗(On Diff #209322)

Thanks, I would change to empty expression then.

Few nits inline

llvm/lib/Transforms/Scalar/SROA.cpp
4245 ↗(On Diff #209322)

s/specifing/specifying/

4246 ↗(On Diff #209322)

... are living ...

4248 ↗(On Diff #209322)

Spurious space

4249 ↗(On Diff #209322)

Spurious newline. I guess doxygen is still smart enough to connect it to the next function, but still better to remove it.

avl updated this revision to Diff 209791.Jul 15 2019, 4:04 AM

addressed style issues. created empty expression for dbg.value case.

jmorse added a subscriber: jmorse.Jul 15 2019, 9:15 AM

This is a great bug find -- I'd no idea SROA ran twice! Applying this patch gives a ~3% increase in variable locations covered on a clang-3.4 build.

However, IMHO converting dbg.values to dbg.declares is going to prove problematic. There's no (AFAIK) good way to confirm that a dbg.value was originally _supposed_ to be a dbg.declare, and so dbg.values that accidentally refer to an alloca might be converted. There are all kinds of optimisations and salvaging that could make dbg.values point back at an alloca.

For example, with your current patch applied to r365447, if I add a pointer "ptr" that points at the alloca, and another field to the struct:

struct S1 {
    int p1;
    long int p2; // jmorse
    
    bool IsNull (  ) {
        return p1 == 0;
    }
};
    
S1 foo ( void );
    
int bar (  ) {
    
    S1 result = foo();
    S1 * ptr = &result; //jmorse
    
    if ( result.IsNull() )
        return 0;
    
    return result.p1 + 1;
}

And run "./bin/clang++ -O2 -g test.c -o test.o -c", then objdump and llvm-dwarfdump, the alloca gets promoted:

0000000000000000 <_Z3barv>:
   0:   50                      push   %rax
   1:   e8 00 00 00 00          callq  6 <_Z3barv+0x6>
                        2: R_X86_64_PLT32       _Z3foov-0x4
   6:   8d 48 01                lea    0x1(%rax),%ecx
   9:   85 c0                   test   %eax,%eax
   b:   0f 45 c1                cmovne %ecx,%eax
   e:   59                      pop    %rcx
   f:   c3                      retq

But ptr isn't optimised out as it should be. According to dwarfdump:

0x000000b1: DW_TAG_variable
              DW_AT_location    (0x00000000
                 [0x0000000000000006,  0x000000000000000e): DW_OP_reg0 RAX, DW_OP_piece 0x4)
              DW_AT_name        ("ptr")
              DW_AT_decl_file   ("/faster/fs/release/test.c")
              DW_AT_decl_line   (15)
              DW_AT_type        (0x0000008f "S1*")

The location is given as $eax, which actually contains one of the struct fields, and would be misleading. This is down to its dbg.value becoming a dbg.declare, when really it should have been dropped. (This doesn't happen if you don't add the extra "p2" field due to ConvertDebugDeclareToDebugValue being confused at types that don't make sense).

One potential workaround might be to examine the DIExpression to see whether or not it actually dereferences the alloca address and has an aggregate type. However that means digging through the DIExpression operands, which I believe can become complicated if a lot of optimization has happened. There might also be non-C/C++ frontends that expect to be able to temporarily bind variable names to memory, then move them later. Converting their dbg.values to dbg.declares would change the duration that the variable referred to a particular piece of memory.

~

Instead, maybe we could just not convert dbg.declare for structs to dbg.values during instcombine in the first place? While digging into this I noticed that the comment at [0] says only scalars are supposed to be dbg.declare-converted during instcombine, but the code allows structs to slip through. I don't know the history of LowerDbgDeclare though, do other debug-info people know why structs get converted there?

~

For the avoidance of doubt, this is a great find with a big coverage win, but I think we need a safer way of finding the defective dbg.values (or; not converting them in the first place).

[0] https://github.com/llvm/llvm-project/blob/db101864bdc938deb1d63fe4f7da761bd38e5cae/llvm/lib/Transforms/Utils/Local.cpp#L1419

avl added a comment.Jul 16 2019, 2:29 PM

@jmorse Thank you for this case and explanations. I will evaluate it more and adapt fix accordingly.

However, IMHO converting dbg.values to dbg.declares is going to prove problematic.
There's no (AFAIK) good way to confirm that a dbg.value was originally _supposed_ to be a dbg.declare,
and so dbg.values that accidentally refer to an alloca might be converted.
There are all kinds of optimisations and salvaging that could make dbg.values point back at an alloca.

Probably, it is OK if dbg.value was not originally _supposed_ to be a dbg.declare but was converted in this place.
if SROA is able to do optimization then such dbg.declare would be converted into dbg.value later in PromoteMemToReg.
i.e. If after all optimizations dbg.value relates to valid slice of alloca - then it is OK to match it with that slice inserting dbg.declare,
which would be replaced with dbg.value.

This patch uses incorrect dbg.value for your example and finally debug information for "ptr" put in the result.
That("ptr") actually should be removed. If it would take another dbg.value as the basis then the result would be correct.

Let`s check IR :

before first SROA

  %retval = alloca i32, align 4
%result = alloca %struct.S1, align 8
%ptr = alloca %struct.S1*, align 8
%cleanup.dest.slot = alloca i32, align 4
%0 = bitcast %struct.S1* %result to i8*, !dbg !25
call void @llvm.lifetime.start.p0i8(i64 16, i8* %0) #5, !dbg !25
call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !26
%call = call { i32, i64 } @_Z3foov(), !dbg !27
%1 = bitcast %struct.S1* %result to { i32, i64 }*, !dbg !27
%2 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 0, !dbg !27
%3 = extractvalue { i32, i64 } %call, 0, !dbg !27
store i32 %3, i32* %2, align 8, !dbg !27
%4 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 1, !dbg !27
%5 = extractvalue { i32, i64 } %call, 1, !dbg !27
store i64 %5, i64* %4, align 8, !dbg !27
%6 = bitcast %struct.S1** %ptr to i8*, !dbg !28
call void @llvm.lifetime.start.p0i8(i64 8, i8* %6) #5, !dbg !28
call void @llvm.dbg.declare(metadata %struct.S1** %ptr, metadata !23, metadata !DIExpression()), !dbg !29
store %struct.S1* %result, %struct.S1** %ptr, align 8, !dbg !29, !tbaa !30
%call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* %result), !dbg !34
br i1 %call1, label %if.then, label %if.end, !dbg !3

after first SROA

%result = alloca %struct.S1, align 8
%0 = bitcast %struct.S1* %result to i8*, !dbg !25
call void @llvm.lifetime.start.p0i8(i64 16, i8* %0) #5, !dbg !25
call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !26
%call = call { i32, i64 } @_Z3foov(), !dbg !27
%1 = bitcast %struct.S1* %result to { i32, i64 }*, !dbg !27
%2 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 0, !dbg !27
%3 = extractvalue { i32, i64 } %call, 0, !dbg !27
store i32 %3, i32* %2, align 8, !dbg !27
%4 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %1, i32 0, i32 1, !dbg !27
%5 = extractvalue { i32, i64 } %call, 1, !dbg !27
store i64 %5, i64* %4, align 8, !dbg !27
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !23, metadata !DIExpression()), !dbg !28
%call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* %result), !dbg !29
br i1 %call1, label %if.then, label %if.end, !dbg !31

before second SROA

%result = alloca %struct.S1, align 8
%0 = bitcast %struct.S1* %result to i8*, !dbg !25
call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %0) #4, !dbg !25
%call = call { i32, i64 } @_Z3foov(), !dbg !26
%1 = getelementptr inbounds %struct.S1, %struct.S1* %result, i64 0, i32 0, !dbg !26
%2 = extractvalue { i32, i64 } %call, 0, !dbg !26
store i32 %2, i32* %1, align 8, !dbg !26
%3 = getelementptr inbounds %struct.S1, %struct.S1* %result, i64 0, i32 1, !dbg !26
%4 = extractvalue { i32, i64 } %call, 1, !dbg !26
store i64 %4, i64* %3, align 8, !dbg !26
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !23, metadata !DIExpression()), !dbg !27
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !27
call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !28, metadata !DIExpression()), !dbg !31
%cmp.i = icmp eq i32 %2, 0, !dbg !34
br i1 %cmp.i, label %if.then, label %if.end, !dbg !35

after second SROA

call void @llvm.dbg.declare(metadata !2, metadata !23, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25
%call = call { i32, i64 } @_Z3foov(), !dbg !26
%0 = extractvalue { i32, i64 } %call, 0, !dbg !26
call void @llvm.dbg.value(metadata i32 %0, metadata !23, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25
%1 = extractvalue { i32, i64 } %call, 1, !dbg !26
call void @llvm.dbg.value(metadata i64 %1, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !25
%cmp.i = icmp eq i32 %0, 0, !dbg !27
br i1 %cmp.i, label %if.then, label %if.end, !dbg !33

before second SROA it has three dbg.values pointing to the same alloca :

"ptr:bar" call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !23, metadata !DIExpression()), !dbg !27
"result:bar": call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !27
"this:foo" call void @llvm.dbg.value(metadata %struct.S1* %result, metadata !28, metadata !DIExpression()), !dbg !31

"ptr:bar" and "this:foo" should be removed.
"result:bar" should be used as the alloca instruction for fragments. thus there should be inserted corresponding dbg.declare for fragments which would be promoted into dbg.values in PromoteMemToReg.

function which would select proper dbg.value should probably be based on variable, expression and fragment.
Probably it could refuse those dbg.values which do not dereference the alloca address and do not have an aggregate type as you suggested.

One potential workaround might be to examine the DIExpression to see whether or not it actually dereferences the alloca address and has an aggregate type. However that means digging through the DIExpression operands, which I believe can become complicated if a lot of optimization has happened. There might also be non-C/C++ frontends that expect to be able to temporarily bind variable names to memory, then move them later. Converting their dbg.values to dbg.declares would change the duration that the variable referred to a particular piece of memory.

I would give it a try. i.e. I will write a function which would examine DIExpression to decide whether dbg.value should be deleted or not.
If I understood correctly - converting dbg.gvalues into dbg.declare should not create a problem since all these dbg.declare should be changed in dbg.values in PromoteMemToReg.

Instead, maybe we could just not convert dbg.declare for structs to dbg.values during instcombine in the first place? While digging into this I noticed that the comment at [0] says only scalars are supposed to be dbg.declare-converted during instcombine, but the code allows structs to slip through. I don't know the history of LowerDbgDeclare though, do other debug-info people know why structs get converted there?

I think it would be good to not to limit conversions. There could probably be another optimizations(not only instcombine) which would produce dbg.values. SROA itself produces dbg.values.
Thus it should be able to work with them if apllied twice.

Probably, it is OK if dbg.value was not originally _supposed_ to be a dbg.declare but was converted in this place.

Ah, this is core of the matter I think. Converting a dbg.value into a dbg.declare implicitly extends the duration ("lifetime" in the docs [0]) of the variable location from "until the next dbg.value" to "the entire scope". Consider what happens if we make the modifications here [1] to your test case. Imagine an IR producer that wants to temporarily bind some variable to a field of 'result' (the dbg.value added with DW_OP_deref), and later assigns '1' to the variable with the second dbg.value, to cover the rest of the function.

(Note that I've also disabled lines 4394->4396 of SROA.cpp, which deletes newly created dbg.declares if there are any more to create).

With this patch (and mod), the first added dbg.value intrinsic becomes a dbg.declare, which extends the lifetime of the location to overlap the subsequent dbg.value(1,...). Mem2reg will later add dbg.value intrinsics where there were stores, giving us this:

define dso_local i32 @_Z3barv() #0 !dbg !7 {
entry:
  %call = call i32 @_Z3foov(), !dbg !23
  call void @llvm.dbg.value(metadata i64 1, metadata !21, metadata !DIExpression()), !dbg !24
  call void @llvm.dbg.value(metadata i32 %call, metadata !12, metadata !DIExpression()), !dbg !25
  call void @llvm.dbg.value(metadata i32 %call, metadata !21, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25
  call void @llvm.dbg.value(metadata i32 %call, metadata !26, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !30
  %cmp.i = icmp eq i32 %call, 0, !dbg !33
  br i1 %cmp.i, label %if.then, label %if.end, !dbg !34

(!21 = the variable I added, !12 = "result", !26 = "this").

In this circumstance, the variable I added now points somewhere else (which might be fixable). However the duration of the variable location has changed too: the variable was supposed to take on the value "1" for the rest of the function, but instead it gets some other location. I really doubt this is fixable -- both dbg.declare and dbg.value are working as intented, it's converting one to another that changes the lifetimes.

Note that clang probably wouldn't produce that input code ever; however LLVM has to support arbitary IR producers. I would also expect the first dbg.value I added to be dropped by LLVM at some point anyway: baking lots of logic into a DIExpression makes it harder for subsequent optimisations to preserve dbg.values.

[0] https://llvm.org/docs/SourceLevelDebugging.html#debugger-intrinsic-functions
[1] https://reviews.llvm.org/P8157

avl updated this revision to Diff 211027.Jul 22 2019, 1:27 AM

added check for aggregate dbg.value.

avl added a comment.Jul 22 2019, 1:31 AM

@jmorse

Converting a dbg.value into a dbg.declare implicitly extends the duration ("lifetime" in the docs [0]) of the variable location from "until the next dbg.value" to "the entire scope". Consider what happens if we make the modifications here [1] to your test case. Imagine an IR producer that wants to temporarily bind some variable to a field of 'result' (the dbg.value added with DW_OP_deref), and later assigns '1' to the variable with the second dbg.value, to cover the rest of the function.

Agreed. Converting a dbg.value into a dbg.declare implicitly extends the duration of the variable location from "until the next dbg.value" to "the entire scope".

Though in this case, it looks like the scope of interest match with scope of dbg.declare for alloca load/stores and does not breake other non-memory values.

I updated a patch with a routine which refuses dbg.values which are not aggregate and do not have DW_OP_deref in expression.

previous version of the patch generates incorrect dbg.values intrinsics for above example. But the source of the problem here is not dbg.value->dbg.declare->dbg.value conversion by itself. The problem is in that there were used incorrect dbg.value as a source. Dbg.value for variable "bess", which is a pointer to "result", should be deleted(since the pointer is optimized out). Instead it was used to create dbg.declare. Which later converted to incorrect dbg.value :

call void @llvm.dbg.value(metadata i32 %call, metadata !21, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25

New version of the patch implements checking expression for aggregate values and produces correct result for that test case:

entry:
  %call = call i32 @_Z3foov(), !dbg !23
  call void @llvm.dbg.value(metadata i64 1, metadata !21, metadata !DIExpression()), !dbg !24
  call void @llvm.dbg.value(metadata i32 %call, metadata !12, metadata !DIExpression()), !dbg !25
  %cmp.i = icmp eq i32 %call, 0, !dbg !26
  br i1 %cmp.i, label %if.then, label %if.end, !dbg !33

All dbg.values which relate to variable but not affecting alloca would be preserved after dbg.value->dbg.declare->dbg.value conversion(like metadata i64 1, metadata !21). They just would not be taken into account by findDeclarations function. mem2reg conversion (which is done for alloca) do not breake them. Inserting dbg.declare allows to convert memory accesses by mem2reg later. Please check additional test case - llvm/test/DebugInfo/X86/sroa-after-inlining2.ll. all constant expressions are used correctly there.

Assuming all inserted dbg.declare would be converted out in mem2reg - that is probably OK to insert dbg.declare here and location intervals looks correct.

If that is not OK, then as an alternative there could be implemented inserting dbg.addr for corresponding dbg.values.

What do you think ?

rnk added a subscriber: rnk.Jul 22 2019, 11:39 AM

There's a lot of text here, I'm sorry I haven't read it all, but I think we should try to avoid getting in this situation where dbg.value points to an alloca in the first place. In Chromium, we use -instcombine-lower-dbg-declare to prevent instcombine from conservatively demoting from the "high availability" dbg.declare to the "always accurate but often missing" dbg.value. I don't really have time to work on this or participate in the discussion, so I'll just drop my two cents in and run away... =S

NB, I haven't forgotten this (adding self as reviewer to keep track of it), but am out of office for a while. I think this hits a wider LLVM problem of values going in and out of memory (see discussion in [0] for example) and our ability to describe those circumstances.

[0] https://bugs.llvm.org/show_bug.cgi?id=34136

avl added a comment.Aug 8 2019, 7:03 AM

NB, I haven't forgotten this (adding self as reviewer to keep track of it), but am out of office for a while. I think this hits a wider LLVM problem of values going in and out of memory (see discussion in [0] for example) and our ability to describe those circumstances.

[0] https://bugs.llvm.org/show_bug.cgi?id=34136

It looks like D37768 was done to solve the problem of values going in and out of memory.
i.e. dbg.addr should point to memory locations, dbg.value for register or constant values.
AFAIU your previous recommendation to avoid converting dbg.declare in instcombiner as well as similar recommendation done by @rnk - they both based on the fact that dbg.addr solution is not fully implemented yet. For SROA case it means : there should not be dbg.value pointing into an alloca. There should be dbg.addr pointing to alloca. Code which do dbg.declare lowering in instcombiner is just temporarily solution until dbg.addr is properly implemented.

Thus, for the current moment I think to implement your original suggestion with avoiding dbg.declare lowering in instcombiner for aggregates.

Sorry for the long delay --

AFAIU your previous recommendation to avoid converting dbg.declare in instcombiner as well as similar recommendation done by @rnk - they both based on the fact that dbg.addr solution is not fully implemented yet. For SROA case it means : there should not be dbg.value pointing into an alloca. There should be dbg.addr pointing to alloca. Code which do dbg.declare lowering in instcombiner is just temporarily solution until dbg.addr is properly implemented.

That sounds about right -- using dbg.addr would avoid having to think about what's in the DIExpression, and is subject to flow control so would mix well with dbg.value's for the same variable. Alas, I don't know what kind of state dbg.addr is in :o

Thus, for the current moment I think to implement your original suggestion with avoiding dbg.declare lowering in instcombiner for aggregates.

I think it's worth exploring; at the very least, seeing what the other debug-info reviewers think of that path would be good.

avl updated this revision to Diff 215628.Aug 16 2019, 9:58 AM

I implemented the solution which avoids lowering dbg.declare for structures. Please check updated patch. Note, I marked two tests as XFAIL since they check for lowering which is not done with this patch. Though it looks to me that it would be better to make dbg.addr to work. I am thinking of changing LowerDbgDeclare in such a way that it would produce dbg.addr and dbg.value instead of dbg.value only. i.e. dbg.addr would be generated for cases when dbg.value with memory operand is generated currently...

Looking good, and this produces the ~3% increase in variable locations in clang-3.4 builds like the previous revisions did too. It looks like the two xfailed tests are testing for the behaviour you're explicitly disabling: it's probably better to just delete them, as this is a deliberate decision to change that behaviour.

Paging @aprantl and @rnk : this patch sounds great to me (avoids dropping struct locations after SROA of inlined functions, increases variable location coverage) by not lower dbg.declare of structs, which IMHO the comment in LowerDbgDeclare indicates wasn't supposed to happen anyway. However, this plays into the wider "escaped variables in memory" problem of PR34136... would either of you feel strongly that this shouldn't happen?

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

Interesting -- are there circumstances where this happens (dbg.declare of something that isn't an alloca)? I suspect it would be IR that didn't make sense, we might be better off disallowing it in the IR Verifier instead.

dstenb added a subscriber: dstenb.Aug 19 2019, 9:38 AM
dstenb added inline comments.
llvm/test/DebugInfo/X86/sroa-after-inlining.ll
6 ↗(On Diff #215628)

Could you perhaps expand the last sentence a bit so that it describes how SROA did not handle it correctly?

Looking good, and this produces the ~3% increase in variable locations in clang-3.4 builds like the previous revisions did too. It looks like the two xfailed tests are testing for the behaviour you're explicitly disabling: it's probably better to just delete them, as this is a deliberate decision to change that behaviour.

Paging @aprantl and @rnk : this patch sounds great to me (avoids dropping struct locations after SROA of inlined functions, increases variable location coverage) by not lower dbg.declare of structs, which IMHO the comment in LowerDbgDeclare indicates wasn't supposed to happen anyway. However, this plays into the wider "escaped variables in memory" problem of PR34136... would either of you feel strongly that this shouldn't happen?

Stepping back a bit: Is the result of SROA multiple smaller allocas?

If SROA is only creating new allocas, then describing them with dbg.declares should do no harm, since a later call to LowerDbgDeclare would truncate their range by inserting new dbg.values at every load. But if SROA is inserting the load, we do need to make sure that the loaded value is tracked by a dbg.value (potentially in addition to tracking the alloca with a dbg.declare). So I guess my question is, where are loads for the SROAed allocas generated?

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

I must admit I'm confused how this patch relates to the example in the review description. What is the instruction coming in here in the example?

(A bit related to aprantl's latest comment.)

Would it be possible to add (or perhaps better, adapt one of the attached test cases into) single-pass tests? I personally think that would make it a bit easier to see what's going in.

llvm/test/DebugInfo/X86/sroa-after-inlining2.ll
189–191 ↗(On Diff #215628)

Can the different column entries be merged into one entry? That would increase the SNR of the test case.

(For future reference: Clang can omit column information when passed -gno-column-info.)

rnk added a comment.Aug 19 2019, 1:42 PM
In D64595#1621077, @avl wrote:

Thus, for the current moment I think to implement your original suggestion with avoiding dbg.declare lowering in instcombiner for aggregates.

This more or less gets the behavior of -instcombine-lower-dbg-declare=0 for structs, which is really all I cared about when I added it. The common case this flag improved is cases like the one in your test, simple C++ classes (like std::vector) stored on the stack and passed to some non-inline method call (__push_back_slow). So, I'm in favor of your change as written. In fact, I'd go further and remove the cl::opt if this lands, but let me do that since then I have to go update chromium's build to not pass it. :)

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

clang emits dbg.declare of things that are not allocas from time to time. Usually it is for C++ classes that, when passed by value at the source level, are instead passed indirectly by pointer as required by the ABI. Or, for POD structs passed using byval.

Also, after inlining, these arguments may resolve to pointers to non-alloca locations, as in this->field = getSRetThing();.

llvm/lib/Transforms/Utils/Local.cpp
1381 ↗(On Diff #215628)

Perhaps this should be updated to use getAllocatedType in favor of getType()->getElementType() for consistency and to address future typeless pointers in IR.

1386 ↗(On Diff #215628)

Surely getAllocatedType can never return null?

llvm/test/Transforms/InstCombine/lower-dbg-declare.ll
2 ↗(On Diff #215628)

I think you should update this test case to check the new correct dbg.declare that should remain.

avl marked an inline comment as done.Aug 19 2019, 1:48 PM

OK, I will modify testcase to be single pass.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

That happens with this patch and llvm/test/DebugInfo/X86/sroa-after-inlining2.ll testcase if InstructionCombining.cpp:3163-3173 piece deleted.

opt sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -sroa -S -o -

before second instcombine pass there is following IR :

entry:
  %result = alloca i64, align 8
  %tmpcast = bitcast i64* %result to %struct.S1*     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

after second instcombine pass it transformed into :

entry:
  %result = alloca i64, align 8
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25


if.end:                                           ; preds = %entry
  %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

instcombine moves bitcast instruction and clones dbg.declare.

avl marked 2 inline comments as done.Aug 19 2019, 2:08 PM
avl added inline comments.
llvm/test/DebugInfo/X86/sroa-after-inlining.ll
6 ↗(On Diff #215628)

current SROA implementation while doing "Migrate debug information from the old alloca to the new alloca(s)" handles only dbg.declare intrinsic. In this case original dbg.declare was lowered by instcombine pass into dbg.value. When it comes into SROA second time, all dbg.value intrinsics, inserted by instcombine pass before second SROA, just not updated(though SROA was actually done).

llvm/test/DebugInfo/X86/sroa-after-inlining2.ll
189–191 ↗(On Diff #215628)

Ok.

avl marked an inline comment as done.Aug 19 2019, 2:15 PM
avl added inline comments.
llvm/lib/Transforms/Utils/Local.cpp
1386 ↗(On Diff #215628)

I did not get it... if AI->getAllocatedType() would be null then isStructure() would be false.

aprantl added inline comments.Aug 19 2019, 5:20 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

(Looks like the !isa<AllocaInst>(I) allows you to detect the bitcast-of-alloc case.)

I'm not quite convinced by this example, but maybe there is something I'm missing, let me know!

The semantics of dbg.declare(alloca, var) are that the alloca is the definitive storage for var. There is no need to insert duplicate dbg.declares later in the code because a dbg.declare is valid throughout the function until the lifetime of alloca ends or var goes out of scope.
If we need more fine-grained tracking of locations we have dbg.value and dbg.addr at our disposal, which are valid until the next dbg.value/addr for the same (overlapping fragment of the) variable. LowerDbgDeclare is the pass that switches between these two modes by deleting the dbg.declare and inserting a dbg.value at every load from the alloca.

I suspect that the disconnect between my mental model and what this patch is about comes from that LowerDbgDeclare is good at describing the SSA values loaded from memory, but does a terrible job describing the values stored into alloca, as it is neither inserting a dbg.addr for store instuctions describing the alloca, nor are we inserting dbg.addr when the address is passed to function that takes a pointer aliasing with the variable.

To reiterate: I think it is a bug to have more than one dbg.declare describing the same variable fragment in the same alloca (see SourceLevelDebugging.rst, llvm.dbg.declare). If we need to duplicate something pointing to an alloca, it should be a dbg.value or dbg.addr. (Similarly, I also think it is a bug to have a dbg.declare and a dbg.value describing the same variable.)

For the quoted example, a single dbg.declare would be sufficient, assuming there are no other dbg.values for this variable. If there are dbg.values present outside of the quoted region, the dbg.declares should be dbg.value(%result, !DIExpression(DW_OP_deref)) instead.

Adrian wrote:

Stepping back a bit: Is the result of SROA multiple smaller allocas?

If SROA is only creating new allocas, then describing them with dbg.declares should do no harm, since a later call to LowerDbgDeclare would truncate their range by inserting new dbg.values at every load. But if SROA is inserting the load, we do need to make sure that the loaded value is tracked by a dbg.value (potentially in addition to tracking the alloca with a dbg.declare). So I guess my question is, where are loads for the SROAed allocas generated?

My understanding of SROA is that no new loads are generated, instead existing ones using GEPs and bitcasts are rewritten to use the smaller allocas. Any dbg.declare get fragmented into the smaller allocas in the usual way, then mem2reg / LowerDbgDeclare promotes in the usual way. In some circumstances the same load might get duplicated to enable promotion, such as:

bb1:
  %a1 = alloca i32
  store i32 0, i32 *%a
  br label %join
bb2:
  br label %join
join:
  %b = phi i32 [ %a1, %bb1 ], [ @someglobal, %bb2 ]
  %c = load i32, i32 *%b

The load to %c gets duplicated into a load from %a1 in bb1, and from @someglobal in bb2, then the phi operates on the loaded values (and the alloca can be promoted). I don't think the loaded value qualifies as being part of the alloca after the phi, so IMO this wouldn't count as the location of a variable / field changing.

avl marked an inline comment as done.Aug 20 2019, 6:47 AM
avl added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

@aprantl Hi, I am sorry, quite a lot of text later. There are two problems here:

  1. Original problem with incorrect dbg.value if SROA optimization called twice.
  1. Problem with cloned dbg.declare because of additional bitcast instruction. That problem becomes visible if lowering for structures avoided.

//=============================================================================
For the point 1:

SROA and mem2reg conversion do not work with dbg.value. They work with only dbg.declare.
As described in your model: LowerDbgDeclare is the pass that switches from dbg.declare into dbg.value mode.
But SROA and mem2reg are not ready for that mode. This leads to incorrect(not-patched) dbg.value after second SROA.

IR before first instcombine for non-patched version :

opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -S -o -

entry:
  %result = alloca %struct.S1, align 4
  %0 = bitcast %struct.S1* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %0), !dbg !24
  call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !25    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %call = call i64 @_Z3foov(), !dbg !26
  %1 = bitcast %struct.S1* %result to i64*, !dbg !26
  store i64 %call, i64* %1, align 4, !dbg !26
  %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* %result), !dbg !27
  br i1 %call1, label %if.then, label %if.end, !dbg !29

IR after first instcombine :

opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -instcombine -S -o -

entry:
  %result = alloca i64, align 8
  %tmpcast = bitcast i64* %result to %struct.S1*
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  %call = call i64 @_Z3foov(), !dbg !25
  store i64 %call, i64* %result, align 8, !dbg !25
  call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26 <<<<<<<<<<<<
  %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* nonnull %tmpcast) #3, !dbg !27
  br i1 %call1, label %if.then, label %if.end, !dbg !29

Note dbg.declare for the result variable(!12) converted into dbg.value.

IR after second SROA :

opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -sroa -S -o -

define dso_local i32 @_Z3barv() #0 !dbg !7 {
entry:
  %call = call i64 @_Z3foov(), !dbg !24
  %result.sroa.0.0.extract.trunc = trunc i64 %call to i32, !dbg !24
  %result.sroa.6.0.extract.shift = lshr i64 %call, 32, !dbg !24
  %result.sroa.6.0.extract.trunc = trunc i64 %result.sroa.6.0.extract.shift to i32, !dbg !24
  call void @llvm.dbg.value(metadata i64* undef, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !25   <<<<<<<<<<
  call void @llvm.dbg.value(metadata i64* undef, metadata !26, metadata !DIExpression()), !dbg !30
  %cmp.i = icmp eq i32 %result.sroa.0.0.extract.trunc, 0, !dbg !33
  br i1 %cmp.i, label %if.then, label %if.end, !dbg !34

Note dbg.value with undef memory location for result variable. Instead of setting to undef value it should be set into SROA variable :

entry:
  %call = call i64 @_Z3foov(), !dbg !24
  %result.sroa.0.0.extract.trunc = trunc i64 %call to i32, !dbg !24
  call void @llvm.dbg.value(metadata i32 %result.sroa.0.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %result.sroa.6.0.extract.shift = lshr i64 %call, 32, !dbg !24
  %result.sroa.6.0.extract.trunc = trunc i64 %result.sroa.6.0.extract.shift to i32, !dbg !24
  call void @llvm.dbg.value(metadata i32 %result.sroa.6.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  call void @llvm.dbg.value(metadata i64* undef, metadata !26, metadata !DIExpression()), !dbg !30
  %cmp.i = icmp eq i32 %result.sroa.0.0.extract.trunc, 0, !dbg !33
  br i1 %cmp.i, label %if.then, label %if.end, !dbg !34

That problem could be fixed in following ways :

a) teach SROA and mem2reg to work with dbg.value. So that following :

call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref))

converted into :

call void @llvm.dbg.value(metadata i32 %result.sroa.0.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25   
call void @llvm.dbg.value(metadata i32 %result.sroa.6.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25

b) do not do dbg.declare lowering for structures.

Last version of the patch implements not doing dbg.declare lowering for structures which was suggested by @jmorse .

LowerDbgDeclare has that comment [0]:

// If this is an alloca for a scalar variable, insert a dbg.value
// at each load and store to the alloca and erase the dbg.declare.
// The dbg.values allow tracking a variable even if it is not
// stored on the stack, while the dbg.declare can only describe
// the stack slot (and at a lexical-scope granularity). Later
// passes will attempt to elide the stack slot.

Structures would either be converted into scalars(by SROA) and then they could be lowered and able to be allocated not on the stack(thus need dbg.value),
either they do not need to be lowered and are correctly described by dbg.declare.
So it looks OK to avoid lowering of dbg.declare for structures.

//=============================================================================
For the point 2:

If lowering of dbg.declare for structures avoided then there generated two dbg.declare`s for the same variable which is incorrect :

before second instcombine pass there is following IR :

opt sroa-after-inlining2.ll -sroa -instcombine -inline -S -o -

entry:
  %result = alloca i64, align 8
  %tmpcast = bitcast i64* %result to %struct.S1*     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

after second instcombine:

opt sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -S -o -

entry:
  %result = alloca i64, align 8
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<<<


if.end:                                           ; preds = %entry
  %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

That second dbg.declare created in instcombine when it moves instructions [1]. In this case the bitcast instruction was moved down. Since moved instruction is not an alloca then no need to copy dbg.declare to avoid duplication.
So it seems proper fix would be to not clone dbg.declare if moved instruction is not an alloca.
i.e. to achieve that state :" single dbg.declare would be sufficient" the cloning of dbg.declare should be avoided.

//=============================================================================

  1. generating dbg.addr in LowerDbgDeclare.

    yes, I think it would be good to teach LowerDbgDeclare to generate dbg.addr. But it seems to me now that it is not relevant for this concrete case.

All above in short :

  1. avoiding dbg.declare lowering for structures solves original problem of the patch. It allows to generate correct dbg.values after SROA pass.
  2. avoiding duplication of dbg.declare in instcombiner while moving instructions prevents several dbg.declare instrs generation.
  3. Improving LowerDbgDeclare to generate dbg.addr intrinsics looks like necessary thing. But not related to the problem of this patch directly.

[0] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/Utils/Local.cpp#L1419
[1] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L3162

avl added a comment.Aug 20 2019, 7:09 AM

So I guess my question is, where are loads for the SROAed allocas generated?

They are generated in splitAlloca [0] by inserting dbg.declare instructions for new alloca instructions.
Later they would be promoted out inside mem2reg[1]. So that finally only dbg.value intrinsics would be presented.

[0] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/Scalar/SROA.cpp#L4398
[1] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/Scalar/SROA.cpp#L4584

Adrian wrote:

Stepping back a bit: Is the result of SROA multiple smaller allocas?

If SROA is only creating new allocas, then describing them with dbg.declares should do no harm, since a later call to LowerDbgDeclare would truncate their range by inserting new dbg.values at every load. But if SROA is inserting the load, we do need to make sure that the loaded value is tracked by a dbg.value (potentially in addition to tracking the alloca with a dbg.declare). So I guess my question is, where are loads for the SROAed allocas generated?

My understanding of SROA is that no new loads are generated, instead existing ones using GEPs and bitcasts are rewritten to use the smaller allocas. Any dbg.declare get fragmented into the smaller allocas in the usual way, then mem2reg / LowerDbgDeclare promotes in the usual way. In some circumstances the same load might get duplicated to enable promotion, such as:

Side note: Should we make sure that when a load via a GEP/cast-chain is generated there is a dbg.value with a DW_OP_fragment emitted as well (possibly in LowerDbgDeclare)?

aprantl added inline comments.Aug 20 2019, 8:46 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)
entry:
  %result = alloca i64, align 8
  %tmpcast = bitcast i64* %result to %struct.S1*
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  %call = call i64 @_Z3foov(), !dbg !25
  store i64 %call, i64* %result, align 8, !dbg !25
  call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26 <<<<<<<<<<<<
  %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* nonnull %tmpcast) #3, !dbg !27
  br i1 %call1, label %if.then, label %if.end, !dbg !29

Who is deleting the dbg.declare and inserting the dbg.value? Does instcombine call LowerDbgDeclare, or is this implemented manually somewhere?

IR after second SROA :

and the problem ehere is that SROA only knows/cares about allocas/dbg.declares and thus doesn't update the dbg.value. So by cloning the dbg.declare in this patch, you make it work. I also image that if you ran SROA a third time, it might run into the same issue again.

Would it be feasible to teach SROA to also update dbg.values instead, or is this a loosing battle? I would imagine that when SROA is rewriting

store i64 %call, i64* %result, align 8, !dbg !25
call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26

into %result.sroa.6.0.extract (this intermediate step isn't shown in your example, but I'm sure it must exist in the detailed SROA log) we could update the dbg.value to point to %result.sroa.6.0.extract and attach a DW_OP_fragment, like we do for dbg.declares. Then salvageDebugInfo should automatically update the throughout the remaining transformations. (A reference to an alloca and a bitcast of an alloca inside a dbg.value can be treated as equivalent.)

Have you investigated this option? I think this would be a much cleaner and more general solution that should give even better results than this patch.

avl marked an inline comment as done.Aug 20 2019, 1:57 PM
avl added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

entry:
%result = alloca i64, align 8
%tmpcast = bitcast i64* %result to %struct.S1*
%0 = bitcast i64* %result to i8*, !dbg !24
call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
%call = call i64 @_Z3foov(), !dbg !25
store i64 %call, i64* %result, align 8, !dbg !25
call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26 <<<<<<<<<<<<

%call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* nonnull %tmpcast) #3, !dbg !27
br i1 %call1, label %if.then, label %if.end, !dbg !29

Who is deleting the dbg.declare and inserting the dbg.value? Does instcombine call LowerDbgDeclare, or is this implemented manually >somewhere?

yes, it does, instcombine calls LowerDbgDeclare. LowerDbgDeclare deletes dbg.declare and inserts dbg.value.

IR after second SROA :

and the problem ehere is that SROA only knows/cares about allocas/dbg.declares and thus doesn't update the dbg.value. So by cloning the >dbg.declare in this patch, you make it work. I also image that if you ran SROA a third time, it might run into the same issue again.

Not by cloning dbg.declare. This patch makes SROA(which only knows/cares about allocas/dbg.declares) to work by avoiding lowering dbg.declare. instcombine calls LowerDbgDeclare but it does nothing for structures.

No problem with calling SROA third time. Structures will always be described by dbg.declare(LowerDbgDeclare does nothing for structures in this patch). If structure would be converted by previous SROA into an scalar then this scalar would not be subject for next SROA.
Thus it is OK to call SROA several times : all input structures would be described by dbg.declare. There would not be situation similar to current moment - when structure could be described by dbg.declare and by dbg.value depending on previous passes of SROA.

Would it be feasible to teach SROA to also update dbg.values instead, or is this a loosing battle? I would imagine that when SROA is rewriting
...
Have you investigated this option? I think this would be a much cleaner and more general solution that should give even better results than this patch.

First implementation of this patch was a kind of such solution(not exactly this, but it in that direction).
Now I think that avoiding dbg.declare lowering for structures is a better option.
Here are the arguments :

  1. Implementation becomes more complex without any additional benefit. It would be necessary to implement two similar processing for different representations : mem2reg for dbg.declare and patching dbg.value. To recognize proper dbg.value it would be necessary to parse DIExpression for dbg.value(For most cases it could be just check for DW_OP_deref and structure type, but in general - there could be much more complex expressions) .
  1. No reason to lowering structures. LowerDbgDeclare handles scalars since they could be allocated on registers and then could not have stack location. If structure fits in register then it would be converted into a scalar first and then it could be lowered. Otherwise no need to lower structure.
  1. arrays already done in that way - https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/Local.cpp#L1406
  1. as far as I understand current design decision for dbg.addr: if object located on the stack then it should be described by dbg.declare or dbg.addr. Otherwise dbg.value should be used. dbg.value pointing to an alloca should be avoided.
aprantl added inline comments.Aug 20 2019, 3:12 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

Thanks for the explanation, I think I can follow your arguments for not lowering a non-scalar dbg.declare. Based on that I understand the part of this patch that concerns Local.cpp and I think it makes sense.

What I don't understand is the InstCombine part. The example output of this patch that you posted earlier

entry:
  %result = alloca i64, align 8
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25


if.end:                                           ; preds = %entry
  %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

is incorrect LLVM IR as there is more than one dbg.declare for the same variable and alloca. Calling salvageDebugInfi on a bitcast to a dbg.declare is wrong for the same reason. Unless an alloca is elided or rewritten, there is no point in touching a dbg.declare.

The InstCombine part of this patch looks like it is hack to indirectly prevent LowerDbgDeclare from operating by putting a bitcast between the dbg.declare and the alloca, so the dyn_cast<AllocaInst> in llvm::LowerDbgDeclare() doesn't trigger.

Why is this necessary and why isn't the isStructure(AI) condition good enough?

avl marked an inline comment as done.Aug 21 2019, 1:32 AM
avl added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

InstCombine part is for fixing problem with two dbg.declare, which become visible when lowering of dbg.declare for structures was avoided. Previously both dbg.declare was lowered and problem was not seen.

Following IR is for the case when InstCombine part is NOT done :

before second instcombine pass there is following IR :

opt sroa-after-inlining2.ll -sroa -instcombine -inline -S -o -

entry:
  %result = alloca i64, align 8
  %tmpcast = bitcast i64* %result to %struct.S1*     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

after second instcombine:

opt sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -S -o -

entry:
  %result = alloca i64, align 8
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<<<


if.end:                                           ; preds = %entry
  %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

That second dbg.declare created in instcombine when it moves instructions [0] in function TryToSinkInstruction().
Cloning dbg.declare looks like a bug, since there should be only one dbg.declare.
Since moved instruction is not an alloca then no need to copy dbg.declare.
So it seems proper fix would be to not clone dbg.declare if moved instruction is not an alloca.
Thus I avoided cloning if moved instruction is not an alloca.

Following IR is for the case Instcombine part IS applied :

entry:
  %result = alloca i64, align 8
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25   


if.end:                                           ; preds = %entry
  %tmpcast = bitcast i64* %result to %struct.S1*

Now there is only one dbg.declare.

Why is this necessary and why isn't the isStructure(AI) condition good enough?

these are separate issues.
isStructure(AI) condition is good enough to resolve original problem of that patch.
Additionally there is a code in instcombiner which duplicates dbg.declare.
inscombine part is to prevent dbg.declare duplication done by TryToSinkInstruction().

[0] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L3162

aprantl added inline comments.Aug 21 2019, 8:38 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

That second dbg.declare created in instcombine when it moves instructions [0] in function TryToSinkInstruction().
Cloning dbg.declare looks like a bug, since there should be only one dbg.declare.
Since moved instruction is not an alloca then no need to copy dbg.declare.
So it seems proper fix would be to not clone dbg.declare if moved instruction is not an alloca.

I fully agree. The loop in TryToSinkInstruction (InstructionCombining.cpp:3162) should be fixed to ignore dbg.declare intrinsics. Would you be interested in creating a separate patch for that? Otherwise I can do it, too.

Why is this necessary and why isn't the isStructure(AI) condition good enough?

these are separate issues.
isStructure(AI) condition is good enough to resolve original problem of that patch.
Additionally there is a code in instcombiner which duplicates dbg.declare.
inscombine part is to prevent dbg.declare duplication done by TryToSinkInstruction().

Let's do this then:

  1. Create a patch that adds an if (isa<DbgDeclareInst>(DII)) continue; to TryToSinkInstruction.
  2. Create a separate patch to add isStructure(AI) to Local.cpp
  3. See what else is necessary to fix the issue you are after.
avl marked an inline comment as done.Aug 21 2019, 2:21 PM
avl added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3163–3164 ↗(On Diff #215628)

Let's do this then:

Create a patch that adds an if (isa<DbgDeclareInst>(DII)) continue; to TryToSinkInstruction.
Create a separate patch to add isStructure(AI) to Local.cpp
See what else is necessary to fix the issue you are after.

Sure, I will do all of this.

At the moment, I do not think that "if (isa<DbgDeclareInst>(DII)) continue;" will work, since dbg.declare should be updated to not to point to moved instruction. But let`s discuss it in separate patch. Thanks!

avl updated this revision to Diff 218287.Sun, Sep 1, 11:07 PM

My apologies for the delay. I was out of the office and I couldn 't finish it sooner.

finally, addressed comments :

  1. split patch in two parts:
    • do not lower structures(this patch)
    • do not clone dbg.declare in instcombiner(would create soon).
  2. simplified test case.
  3. rewrote lower-dbg-declare.ll to not use structure.
  4. mark simplify-dbg-declare-load.ll as XFAIL. @rnk would like to remove it after chromium's build updated.
aprantl added inline comments.Tue, Sep 3, 9:01 AM
llvm/test/DebugInfo/X86/sroa-after-inlining.ll
35 ↗(On Diff #218287)

RESULT?

89 ↗(On Diff #218287)

Do we actually need this function body or could this be converted to a forward-declaration?

lgtm with oustanding comments addressed!

avl marked 2 inline comments as done.Tue, Sep 3, 10:49 AM
avl added inline comments.
llvm/test/DebugInfo/X86/sroa-after-inlining.ll
35 ↗(On Diff #218287)

yes. actual IR looks like this :

before second SROA :

call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !21
%call = call i32 @_Z3foov(), !dbg !21
%coerce.dive = getelementptr inbounds %struct.S1, %struct.S1* %result, i64 0, i32 0, !dbg !21
store i32 %call, i32* %coerce.dive, align 4, !dbg !21

after second SROA:

%call = call i32 @_Z3foov(), !dbg !21
call void @llvm.dbg.value(metadata i32 %call, metadata !12, metadata !DIExpression()), !dbg !22
89 ↗(On Diff #218287)

yes, we do. it` body used during inlining :

opt %s -sroa -instcombine -inline -instcombine -sroa

without proper inlining second SROA pass does not have a chance to do optimization.

aprantl accepted this revision.Tue, Sep 3, 10:58 AM
aprantl added inline comments.
llvm/test/DebugInfo/X86/sroa-after-inlining.ll
35 ↗(On Diff #218287)

Sorry for being unclear: I meant to say: "What about renaming NAME to RESULT, would that make the test easier to read?

This revision is now accepted and ready to land.Tue, Sep 3, 10:58 AM
avl marked an inline comment as done.Tue, Sep 3, 11:19 AM
avl added inline comments.
llvm/test/DebugInfo/X86/sroa-after-inlining.ll
35 ↗(On Diff #218287)

Ah, I see. yes, it would. Will rename it.

This revision was automatically updated to reflect the committed changes.

Enjoyable leaps in variable-coverage / bytes-coverage on the far right of these graphs:

http://lnt.llvm.org/db_default/v4/nts/graph?plot.0=1357.1607043.4&highlight_run=128792
http://lnt.llvm.org/db_default/v4/nts/graph?plot.0=1357.1607042.4&highlight_run=128792

Thanks for digging into this!

Enjoyable leaps in variable-coverage / bytes-coverage on the far right of these graphs:

http://lnt.llvm.org/db_default/v4/nts/graph?plot.0=1357.1607043.4&highlight_run=128792
http://lnt.llvm.org/db_default/v4/nts/graph?plot.0=1357.1607042.4&highlight_run=128792

Thanks for digging into this!

Whoa. Indeed.

avl added a comment.Wed, Sep 4, 1:59 PM

@jmorse , @aprantl Thanks for helping me with this!

avl added a comment.Wed, Sep 11, 7:06 AM
In D64595#1635864, @rnk wrote:

This more or less gets the behavior of -instcombine-lower-dbg-declare=0 for structs, which is really all I cared about when I added it. The common case this flag improved is cases like the one in your test, simple C++ classes (like std::vector) stored on the stack and passed to some non-inline method call (__push_back_slow). So, I'm in favor of your change as written. In fact, I'd go further and remove the cl::opt if this lands, but let me do that since then I have to go update chromium's build to not pass it. :)

Hi @rnk, the patch is landed :-)