Page MenuHomePhabricator

[SafeStack][DebugInfo] Insert DW_OP_deref in correct location
ClosedPublic

Authored by leonardchan on Jan 27 2020, 8:23 PM.

Details

Summary

This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585 where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt symbols when debugging.

Diff Detail

Event Timeline

leonardchan created this revision.Jan 27 2020, 8:23 PM
aprantl added inline comments.Jan 28 2020, 9:31 AM
llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
758

This is surprising to me. Can you walk me through this as a sanity check to make sure this isn't papering over some other issue?

Both the before and after state should be equivalent:
Before your patch we emit a direct register location + DW_OP_deref afterwards we emit an indirect register location.

What am I missing?

leonardchan marked an inline comment as done.Jan 28 2020, 11:07 AM
leonardchan added inline comments.
llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
758

I'll admit I'm not an expert in this area of LLVM, but my understanding is that this would affect the order of the expression stack.

Before my patch, if this value was an indirection and the expression was DW_OP_constu, 8, DW_OP_minus, then we would append DW_OP_deref to the end of the stack and we would not actually deref the value immediately.

With this patch (which is essentially a revert of https://reviews.llvm.org/D68945#change-VaZel0Y6QQdl), the indirection happens first, then the expressions on the stack.

This would essentially turn something like:

DW_OP_breg6 RBP-40, DW_OP_constu 0x20, DW_OP_minus, DW_OP_deref

into

DW_OP_breg6 RBP-40, DW_OP_deref, DW_OP_constu 0x20, DW_OP_minus
aprantl added inline comments.Jan 28 2020, 5:06 PM
llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
758

I see. So this should have been DIExpression::prepend instead of append? Could we implement it that way instead?

jmorse added inline comments.Jan 29 2020, 7:21 AM
llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
758

Looking at pre-emission DBG_VALUEs, I see without this patch:

DIExpression(DW_OP_constu, 40, DW_OP_minus, DW_OP_constu, 32, DW_OP_minus, DW_OP_deref)

And with this patch:

DIExpression(DW_OP_constu, 40, DW_OP_minus, DW_OP_deref, DW_OP_constu, 32, DW_OP_minus)

Which correctly gets the deref in the middle of the minuses. I think with prepend we would end up with:

DIExpression(DW_OP_deref, DW_OP_constu, 40, DW_OP_minus, DW_OP_constu, 32, DW_OP_minus)

Which would still be broken.

More generally: D68945 was supposed to eliminate pre-regalloc uses of IsIndirect, and D69178 repurpose the field. With both applied this fault doesn't occur, however I didn't get around to landing D69178. Seeing how D68945 is in the release-10 branch and is unexpectedly broken, it might be better to:

  • Fully revert D68945 and add the regression test
  • cherry-pick the revert into release-10
  • I'll re-land D68945 when D69178 is ready (there were some increased sizes of location lists that need looking at).

Opinions?

More generally: D68945 was supposed to eliminate pre-regalloc uses of IsIndirect, and D69178 repurpose the field. With both applied this fault doesn't occur, however I didn't get around to landing D69178.

Ah. I had thought we eliminated this and was surprised to see it here.

leonardchan marked an inline comment as done.Jan 29 2020, 9:41 AM
leonardchan added inline comments.
llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
758

Thanks for checking. For our purposes, we don't mind just landing this patch, or landing D69178 on top of D68945, as long as the regression is addressed relatively quickly upstream.

If this is urgent for other people or others need this fix in the the next release, then reverting, cherry-picking, and relanding with the regression test sounds appropriate.

*ping* Can we land this or revert D68945 in the meantime?

phosek accepted this revision.Jan 30 2020, 3:50 PM
phosek added inline comments.
llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
758

We've tried reverting D68945, but since that one landed some time ago, it doesn't revert cleanly and it seems like we would need to revert multiple changes that landed after. Given that, and the fact that D69178 is not ready yet, I'd really prefer to go ahead and land this change to unblock SafeStack users, and land D69178 when ready as a proper solution.

This revision is now accepted and ready to land.Jan 30 2020, 3:50 PM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Feb 4 2020, 9:14 AM

This broke some of our downstream tests using vla:s. Those are compiled from C and executed in gdb. What happens is that DW_OP_deref disappears and then the debugger prints the wrong value.
In those tests there is no other operations in the DIExpressions (so I can't really tell if prepend would be more correct than append, but with the patch that landed we get no DW_OP_deref at all).

Just giving a heads up that something seems weird with this patch.

One indication of that might be that if I run

llc -O0 -mtriple=x86_64-unknown-linux-gnu -o -  test/DebugInfo/X86/safestack-deref.ll

we used to get things like

.Ltmp1:
        #DEBUG_VALUE: func:value <- [DW_OP_constu 8, DW_OP_minus, DW_OP_deref] $rdx
...
        movq    %rdx, (%rsp)           # 8-byte Spill
.Ltmp2:
        #DEBUG_VALUE: func:value <- [DW_OP_constu 8, DW_OP_minus, DW_OP_deref] [$rsp+0]
        #DEBUG_VALUE: func:value <- [DW_OP_constu 8, DW_OP_minus, DW_OP_deref] [$rsp+0]

So before the spill we say that func:value can be found by doing [$rdx-8],
and after the spill [[$rsp] - 8] (where [] denotes dereference).

That seems consistent for the three DEBUG_VALUE comments.

But after this patch we get

.Ltmp1:
        #DEBUG_VALUE: func:value <- [DW_OP_constu 8, DW_OP_minus] [$rdx+0]
...
        movq    %rdx, (%rsp)           # 8-byte Spill
.Ltmp2:
        #DEBUG_VALUE: func:value <- [DW_OP_constu 8, DW_OP_minus] [$rsp+0]
        #DEBUG_VALUE: func:value <- [DW_OP_deref, DW_OP_constu 8, DW_OP_minus] [$rsp+0]

Before the spill we say that func:value is given by [$rdx]-8.
After the spill we got two different variants, either [$rsp]-8 or [[$rsp]]-8.

If I for example change the triple to i686 we get a different spill slot, so then we get

        movl    %edx, 4(%esp)           # 4-byte Spill
.Ltmp1:
        #DEBUG_VALUE: func:value <- [DW_OP_plus_uconst 4, DW_OP_constu 8, DW_OP_minus] [$esp+0]
        #DEBUG_VALUE: func:value <- [DW_OP_plus_uconst 4, DW_OP_deref, DW_OP_constu 8, DW_OP_minus] [$esp+0]

and anything not starting with a deref of $esp+4 is ofcourse not making any sense. So that indicates some kind of improvement with this patch (since we used to get the deref at the end here).
So maybe my problem is related to the extra DBG_VALUE that is lacking DW_OP_deref. I haven't figured out where that comes from yet.

Btw, now we got a difference between the if (SD->isIndirect()) handling at line 708 and 756. Is that making sense? Or is that one wrong as well?

bjope added a comment.Feb 4 2020, 10:47 AM

This broke some of our downstream tests using vla:s. Those are compiled from C and executed in gdb. What happens is that DW_OP_deref disappears and then the debugger prints the wrong value.

Oh, maybe I should have mentioned that the regression I found was related to -O1 or higher.

If I run

llc -O1 -mtriple=x86_64-unknown-linux-gnu -o - test/DebugInfo/X86/safestack-deref.ll

then I guess I see the same regression:

Before this patch we get

.Ltmp0:
        #DEBUG_VALUE: func:value <- [DW_OP_constu 8, DW_OP_minus, DW_OP_deref] $rbx

After this patch we get

.Ltmp0:
        #DEBUG_VALUE: func:value <- [DW_OP_constu 8, DW_OP_minus] $rbx

So the deref has not only moved to the other side of the subtraction, it has disappeared completely.

Maybe there is some difference between FastISel and "normal" ISel here?

Btw, I agree that doing append (like the old code) also seems wrong in case the Expr isn't empty. What was the problem with doing prepend instead?

I'll be able to take a look at this on Friday, but in the meantime if there's no fixes for this and waiting until then is too long then it's probably better to revert this. Sorry for the breakage.

This broke some of our downstream tests using vla:s. Those are compiled from C and executed in gdb. What happens is that DW_OP_deref disappears and then the debugger prints the wrong value.

Separate from my previous comment, would it be possible to share your downstream test for debugging, or at least a "sanitized" version safe for upstream?

bjope added a comment.Feb 4 2020, 1:18 PM

The problem is that ever since we got https://reviews.llvm.org/D68945 LiveDebugVariables no longer expects that the DBG_VALUE is indirect (having an immeadiate as second operand) before DBG_VALUE is removed (prior to regalloc). So that patch changed the playground a little bit, as LiveDebugVariables no longer adds DW_OP_deref when adding back the DBG_VALUE instructions after regalloc.

That is why SelectionDAG etc was changed to add DW_OP_deref explicitly. And I suspect that it incorrectly used append instead of prepend in some places when doing that change.

A little bit interesting that D68945 says "I've stripped out the "IsIndirect" field throughout LiveDebugVariables as its use case is gone, and added an assertion that LiveDebugVariables doesn't see any incoming DBG_VALUEs with IsIndirect set." and after first have landed this patch (https://reviews.llvm.org/rGfff6a1b0f1fe57b46379001db75952d2a06eab1f) it looks like it was reverted, and then it was landed again as https://reviews.llvm.org/rG2d3174c4df6b5f4131346828d0a31675d80d6e2b), but since the revision wasn't re-opened (?) there was no "update to reflect the commited change". However, that commit was changing the condition in the assert which was supposed to protect this from happening. Bah... yuck... that was evil.

So maybe we should revert this again then. Because afaict this isn't playing by the rules stipulated by D68945. Simply changing the assert condition was definitely not enough to add back all the logic that was removed from LiveDebugVariables in D68945.

bjope added a comment.Feb 4 2020, 2:05 PM

This broke some of our downstream tests using vla:s. Those are compiled from C and executed in gdb. What happens is that DW_OP_deref disappears and then the debugger prints the wrong value.

Separate from my previous comment, would it be possible to share your downstream test for debugging, or at least a "sanitized" version safe for upstream?

As I wrote earlier, this is seen when running llc -O1 -mtriple=x86_64-unknown-linux-gnu -o - test/DebugInfo/X86/safestack-deref.ll.

If also adding -print-after-all you'll find

DBG_VALUE %2:gr64, 0, !"value", !DIExpression(DW_OP_constu, 8, DW_OP_minus), debug-location !7; /tmp/test3.cpp:8:5 line no:8 indirect

before regalloc, and after you get

DBG_VALUE $rbx, $noreg, !"value", !DIExpression(DW_OP_constu, 8, DW_OP_minus), debug-location !7; /tmp/test3.cpp:8:5 line no:8

so LiveDebugVariables simply drops the "indirect" information.

Is that good enough as a test case?

uabelho added a subscriber: uabelho.Feb 5 2020, 4:13 AM

Hmn -- it looks like a / the root cause of this is that the O0 pipeline has a different mechanism for handling DBG_VALUEs of spills [0], hence changing isIndirect for both compilation modes is causing trouble. The code path in [0] probably should have been updated in D68945, unfortunately I'm not familiar with O0, at all, and wasn't really aware of it. Fascinatingly it runs LiveDebugValues but not LiveDebugVariables...

Given the risk of something being broken in O0, I really want to get D68945 reverted in llvm-10; without objection I'll do that on master tomorrow then backport to -10. This will make safestack work again; and the regression test will keep it working in the future, once this is all ironed out.

(Apologies for the additional downstream merge pain, this should just be reverting to prior behaviour though).

[0] https://github.com/llvm/llvm-project/blob/9986b88e64f30f5d958eef113bae4c8a098eea93/llvm/lib/CodeGen/MachineInstr.cpp#L2073

jmorse added a comment.Feb 6 2020, 6:43 AM

Pushed up reverts in 6531a78ac4b5. I'll start looking at an actual fix for the -O0 path back in D68945.