Page MenuHomePhabricator

[InstCombine] Remove dbg.values describing contents of dead allocas
ClosedPublic

Authored by vsk on Aug 7 2020, 2:15 PM.

Details

Summary

When InstCombine removes an alloca, it erases the dbg.{addr,declare}
instructions which refer to the alloca. It would be better to instead
remove all debug intrinsics which describe the contents of the dead
alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.

This effectively undoes work performed in an InstCombine run earlier in
the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
before CallInst users of an alloca. The motivating example looks like:

define void @foo(i32 %0) {
  %a = alloca i32              ; This alloca is erased.
  store i32 %0, i32* %a
  dbg.value(i32 %0, "arg0")    ; This dbg.value survives.
  dbg.value(i32* %a, "arg0", DW_OP_deref)
  call void @trivially_inlinable_no_op(i32* %a)
  ret void
}

If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
after inlining, making "arg0" unavailable. But we already have dbg.value
descriptions of the alloca's value (from LowerDbgDeclare), so the
DW_OP_deref dbg.value cannot serve its purpose of describing an
initialization of the alloca by some callee. It invalidates other useful
dbg.values, causing large gaps in location coverage, so we should delete
it (even though doing so may cause stale dbg.values to appear, if
there's a dead store to %a in @trivially_inlinable_no_op).

OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
alloca. Note that it's possible to describe a variable that takes on
different pointer values, e.g.:

void use(int *);
void t(int a, int b) {
  int *local = &a;     // dbg.value(i32* %a.addr, "local")
  local = &b;          // dbg.value(i32* undef, "local")
  use(&a);             //           (note: %b.addr is optimized out)
  local = &a;          // dbg.value(i32* %a.addr, "local")
}

In this example, the alloca for "b" is erased, but we need to describe
the value of "local" as <unavailable> before the call to "use". This
prevents "local" from appearing to be equal to "&a" at the callsite.

rdar://66592859

Diff Detail

Event Timeline

vsk created this revision.Aug 7 2020, 2:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2020, 2:15 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
vsk requested review of this revision.Aug 7 2020, 2:15 PM
vsk added a subscriber: debug-info.Aug 7 2020, 2:16 PM
dblaikie added inline comments.Aug 8 2020, 3:03 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2662

What if there was a later store to this alloca? Wouldn't removing the dbg.value with deref then let the %0 location persist, passing through the store (without reflecting that store in the dbg.value) - changing the values described in the DWARF, possibly incorrectly?

llvm/test/Transforms/InstCombine/erase-dbg-values-at-dead-alloc-site.ll
7–10

I don't think the test case should be running instcombine twice, nor should it be running the inliner - it should only be testing the specific behavior of "given IR1, this pass should produce IR2".

The context can be helpful in comments - but tests/design should be around "this is or is not a valid/good quality optimization, no matter where the IR comes from".

vsk added inline comments.Aug 10 2020, 9:42 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2662

If there was a store to the alloca after the call, InstCombine must have inserted a dbg.value after the store during LowerDbgDeclare. Removing the dbg.value(..., DW_OP_deref) before the call would extend the range of the initial %0 location until the dbg.value for the store.

llvm/test/Transforms/InstCombine/erase-dbg-values-at-dead-alloc-site.ll
7–10

The bug this patch tries to address fundamentally involves a sequence of optimizations. I think it's useful to run a sequence of passes in this case (and this is a widespread practice in llvm), since it gives a fuller picture of what problem is being addressed. I do also think a more targeted/standalone example that just runs instcombine once is useful (this is '@t1' in the test).

vsk updated this revision to Diff 284408.Aug 10 2020, 9:43 AM

Fix two check lines to use the right check prefix.

dblaikie added inline comments.Aug 10 2020, 2:27 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2662

Rather than what InstCombine would've done - I'm interested in what the IR semantics/contract are.

I guess dbg.value only would be used to represent the load from the alloca, not the value in the alloca itself? So a future store to the alloca is unrelated to any dbg.value derived from (what must be loads) from the alloca?

llvm/test/Transforms/InstCombine/erase-dbg-values-at-dead-alloc-site.ll
7–10

IR has a defined semantic - I think we should be thinking about passes in terms of whether they preserve the semantics of the IR or not - regardless of where the input IR comes from/what form it's in.

I assume tests that test a sequence of transformations are the minority in LLVM? (in the same way that we don't do end-to-end testing from Clang through IR when we can help it - and in middle end optimizations we generally can test a single pass in isolation & mostly do it that way, don't we?)

Except for analyses, which don't have a serialization, so we have to rely on testing both the analysis & either testing its dump output, or testing how the analysis behavior influences some transformation.

vsk added inline comments.Aug 11 2020, 11:36 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2662

dbg.value can be used to indicate the value inside of an alloca starting at a specific position in the instruction stream, and continuing until the the dbg.value is superseded (e.g. by a dbg.value intrinsic describing another load/store from/to the alloca, or by the register holding the loaded value being clobbered).

When a variable backed by an alloca stops being described by a dbg.declare, and transitions to being described by a series of dbg.value intrinsics, a store to the alloca would be unrelated to any dbg.value derived from the alloca. A problem with this approach is that unless specific effort is made to insert a dbg.value after each store-like instruction which modifies the alloca, the debug info becomes less precise.

There's a FIXME describing this problem above the comment describing ShouldLowerDbgDeclare. I don't think we can get away from LowerDbgDeclare in general, but we could push towards using it only when we know an alloca is to be deleted (i.e. do it lazily, instead of eagerly like we do today). I'm not sure whether this patch would still be necessary if LowerDbgDeclare were lazy, but will look into this.

vsk added inline comments.Aug 11 2020, 5:57 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2662

I looked into deferring LowerDbgDeclare to the points where llvm actually needs to erase an alloca, but found that doing so might not be practical. The problem is that llvm can delete stores into an alloca without deleting the alloca itself. Without lowering dbg.declares eagerly, we may lose track of updates to a variable that don't get stored back into an alloca. So I believe some version of this patch is necessary.

llvm/test/Transforms/InstCombine/erase-dbg-values-at-dead-alloc-site.ll
7–10

I don't understand the concern here. Is it that the current pass behavior (setting a dbg.value operand to undef) preserves semantics, while the proposed new behavior (deleting a dbg.value) doesn't? If so, I don't follow. I'd argue that dbg.value(<alloca>, DW_OP_deref) doesn't mean the same thing as dbg.value(undef, DW_OP_deref), and that the latter is actually ill-formed. (As it happens, check-llvm passes with that criterion added to the verifier -- I'll have to test this out on a larger codebase to see if anything new turns up.)

NB: on the topic of LowerDbgDeclare, it was discussed extensively in PR34136 (CC @rnk) , and there's actually a plan for eliminating its behaviour there [0] from about two years ago, which we were eventually going to look at (CC @chrisjackson).

Everything to do with leaked pointers seems to have a bunch of compromises -- I think this patch alters some of them, and I don't have a strong opinion as to whether it's good or not. With dbg.values placed at call sites:

  1. The variable might refer to a stack location containing a stale value, if a store was DSE'd,
  2. The stack location might be optimised out (which is what this patch addresses),
  3. But if the callee stores to the stack location, the variable location accurately reflects this.

By fixing 2, we lose some of 3: In the test case you add, if you add a:

store i32 1, i32* %addr, align 4

To the 'use' function, then t2 doesn't reflect this after -instcombine -inline -instcombine:

define void @t2(i32 %x) !dbg !17 {
  call void @llvm.dbg.value(metadata i32 %x, metadata !18, metadata !DIExpression()), !dbg !19
  ret void
}

Specifically, the "old" value %x would be presented as the variable value on the 'ret 'instruction, wheras at -O0 the variable location on the stack would contain the '1' stored by 'use'. Without this patch, the variable is reported as optimized-out.

I don't really see this as being a bug: we just don't cope well with leaked pointers, and stale values can appear as a result.

[0] https://bugs.llvm.org/show_bug.cgi?id=34136#c25

dblaikie added inline comments.Sep 28 2020, 5:12 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2662

I think I follow now - though I think it'd still be beneficial to describe this in terms of the IR contract rather than with reference to other passes in the pipeline. "Given this IR, doing X would produce this erroneous result (the IR would now describe the value of a variable as having some value it never had in the input IR, for instance)".

In any case, I'm likely not the best person to review the code change here, not being as familiar with debug variable location tracking - happy to bow out/leave it up to others even if they disagree with me on this point (though hope they take it under consideration). The test point I made elsewhere I still feel pretty strongly about.

llvm/test/Transforms/InstCombine/erase-dbg-values-at-dead-alloc-site.ll
7–10

Ah, no - my discussion here isn't about the correctness of the production code, but the suitableness of the test.

Test should be in terms of the contract - which execution of the pass is the one that has done the wrong thing? That and only that one is the one that should be tested, otherwise the other optimization passes may change in ways that cause the test to no longer be applicable. I think it's relatively rare to run more than one optimization pass in a given test ("useful to run a sequence of passes in this case (and this is a widespread practice in llvm)" - that is counter to my general understanding of widespread practice of writing LLVM lit tests)/more common to test only a single optimization pass.

vsk added a comment.Sep 30 2020, 3:27 PM

Everything to do with leaked pointers seems to have a bunch of compromises -- I think this patch alters some of them, and I don't have a strong opinion as to whether it's good or not. With dbg.values placed at call sites:

  1. The variable might refer to a stack location containing a stale value, if a store was DSE'd,
  2. The stack location might be optimised out (which is what this patch addresses),
  3. But if the callee stores to the stack location, the variable location accurately reflects this.

By fixing 2, we lose some of 3: In the test case you add, if you add a:

store i32 1, i32* %addr, align 4

To the 'use' function, then t2 doesn't reflect this after -instcombine -inline -instcombine:

define void @t2(i32 %x) !dbg !17 {
  call void @llvm.dbg.value(metadata i32 %x, metadata !18, metadata !DIExpression()), !dbg !19
  ret void
}

Specifically, the "old" value %x would be presented as the variable value on the 'ret 'instruction, wheras at -O0 the variable location on the stack would contain the '1' stored by 'use'. Without this patch, the variable is reported as optimized-out.

I don't really see this as being a bug: we just don't cope well with leaked pointers, and stale values can appear as a result.

[0] https://bugs.llvm.org/show_bug.cgi?id=34136#c25

Yes, I think that's correct. Without replacing LowerDbgDeclare I think we're forced into making compromises. And to replace LowerDbgDeclare, we might need a way to model the effect of any memory write on local variables (including memcpy's and stores that happen in a callee -- things that LowerDbgDeclare doesn't handle today).

Imo, this "dbg.value(<dead alloca>)-before-no-op-call" situation pops up enough that I think we'll want to do better than falling back to <optimized out>.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2662

I don't feel so strongly about keeping the full-pipeline test. I noticed that Jeremy was able to come up with a nice example based on it, so perhaps we can leave it around as a comment? It should be fairly compact (hopefully not too distracting from the more narrow instcombine-only test) and wouldn't actually run this way.

I believe that the following could work and be a self-contained local operation that doesn't need any contract with a previous optimization pass:

%a = alloca i32              ; This alloca is erased.
store i32 %0, i32* %a
dbg.value(i32* %a, "arg0", DW_OP_deref)
call void @trivially_inlinable_no_op(i32* %a)

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is following the store. It would salvage the store by replacing the dbg.value with dbg.value(foo, var, DIExpression(expr)).

This example would become

dbg.value(i32* %0, "arg0", DIExpression())
call void @trivially_inlinable_no_op(i32* %a)

That would also apply to any store to a inside the function after inlining.

vsk added a comment.EditedOct 1 2020, 12:35 PM

I believe that the following could work and be a self-contained local operation that doesn't need any contract with a previous optimization pass:

(A brief aside: as currently written, this patch doesn't rely on a special contract with a previous optimization pass either.)

%a = alloca i32              ; This alloca is erased.
store i32 %0, i32* %a
dbg.value(i32* %a, "arg0", DW_OP_deref)
call void @trivially_inlinable_no_op(i32* %a)

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is following the store. It would salvage the store by replacing the dbg.value with dbg.value(foo, var, DIExpression(expr)).

This example would become

dbg.value(i32* %0, "arg0", DIExpression())
call void @trivially_inlinable_no_op(i32* %a)

That would also apply to any store to a inside the function after inlining.

I think this more general approach could interact poorly with dead store elimination. For example, you might have:

%a = alloca i32
store i32 1, i32* %a ; DSE kills this store.
store i32 2, i32* %a
dbg.value(i32* %a, "var", DW_OP_deref)
call void @some_callee(i32* %a)
; Assume the alloca %a is not erased.

With the proposed salvage op, after DSE, we'd get:

%a = alloca i32
store i32 2, i32* %a
dbg.value(i32 1, "var", ())
call void @some_callee(i32* %a)

This results in an incorrect description of "var" both before the call to "some_callee" and potentially after, depending on whether "some_callee" stores into "var".

I don't think this addresses the problem Jeremy identified either. In his example, there's a store i32 1, i32* %addr in the inlined callee: under your proposal, this wouldn't get salvaged because it's /after/ the dbg.value(a, var, DIExpression(DW_OP_deref, expr)).

%a = alloca i32
store i32 1, i32* %a ; DSE kills this store.
store i32 2, i32* %a
dbg.value(i32* %a, "var", DW_OP_deref)
call void @some_callee(i32* %a)
; Assume the alloca %a is not erased.

The salvage operation I had in mind (but didn't specify!) wouldn't trigger here:

I wrote

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is following the store.

but this should be clarified to say

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is immediately following the store.

or more elaborate

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is between the store and the next store or call referencing the alloca, or another dbg.value(var).

vsk added a comment.Oct 1 2020, 1:31 PM
%a = alloca i32
store i32 1, i32* %a ; DSE kills this store.
store i32 2, i32* %a
dbg.value(i32* %a, "var", DW_OP_deref)
call void @some_callee(i32* %a)
; Assume the alloca %a is not erased.

The salvage operation I had in mind (but didn't specify!) wouldn't trigger here:

I wrote

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is following the store.

but this should be clarified to say

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is immediately following the store.

or more elaborate

We could define a new type salvage operation that looks at to-be-deleted store foo, a instructions that store to an alloca a that is used in a dbg.value(a, var, DIExpression(DW_OP_deref, expr)) that is between the store and the next store or call referencing the alloca, or another dbg.value(var).

It's possible there's some more general solution to this problem that I've missed, but I don't think this is it. My concern is that these salvage options could be fragile and expensive. Slight changes to clang's codegen, or to passes that run before the salvage occurs, might disable the salvage. E.g., if there's an alloca/bitcast in between the store and the DW_OP_deref dbg.value, the salvage couldn't proceed. You could work around this by doing a scan starting from the dead store to find the next instruction that writes to memory, but past attempts to do that have been piecemeal and fragile as well. For example, LowerDbgDeclare tries to reason about which instructions write to memory (it insert dbg.values after stores), but it misses a large set of instructions that write to memory (cmpxchg/memcpy/...).

Istm that the approach taken in this patch is simpler, and probably less fragile: it's cheap to locate stale DW_OP_deref dbg.values, and deleting them has known tradeoffs that we've accepted elsewhere (it's the same stale value leak problem as LowerDbgDeclare has).

I think *I* would be comfortable with landing this patch as is with a comment that explains that we should remove this if we change the LowerDbgDeclare behavior that inserts the dbg,value in the first place. Any objections to this?

aprantl added inline comments.Oct 9 2020, 3:06 PM
llvm/test/Transforms/InstCombine/erase-dbg-values-at-dead-alloc-site.ll
7–10

I think this could be resolved by having two tests — one that tests one run of instcombine to define expected input/output, and one "end-to-end" regression test (=this one) that ensures that the intended effect on the pipeline is preserved. Both are valuable.

dblaikie added inline comments.Oct 9 2020, 6:47 PM
llvm/test/Transforms/InstCombine/erase-dbg-values-at-dead-alloc-site.ll
7–10

I'd still discourage the addition of such an end-to-end test, for the same reason we don't write clang end-to-end (down through LLVM to machine code) tests if we can help it: the exponential complexity of testing all the combinations of things. Instead testing each feature in isolation.

FWIW, ignoring the test style discussion, the change LGTM.
AFAICT, prior to D80264 these dbg.value+derefs would've been silently dropped instead of made undef, so we're really just returning to previous behaviour with this patch. I've got a patch brewing which applies this fix to mem2reg which I'll upload shortly.

vsk updated this revision to Diff 299435.Oct 20 2020, 12:01 PM
  • Clarify a comment.
  • Remove the RUN-TWICE test.
vsk edited the summary of this revision. (Show Details)Oct 20 2020, 12:02 PM
vsk added a comment.Oct 20 2020, 12:04 PM

Thanks for all the careful reviews! I've removed the RUN-TWICE test but left the example as a comment. I've also shortened/clarified the in-source comment. I think this addresses all of the outstanding feedback. I plan to land this on Friday if there aren't any more concerns voiced.

dblaikie accepted this revision.Oct 20 2020, 2:33 PM

Generally good with the test case - appreciate the discussion here - thanks! (certainly great to have the context too for "why would this IR come up" to help folks maintaining the test understand the real world repercussions of the reduced example, etc)

This revision is now accepted and ready to land.Oct 20 2020, 2:33 PM
This revision was landed with ongoing or failed builds.Oct 22 2020, 10:00 AM
This revision was automatically updated to reflect the committed changes.