This is an archive of the discontinued LLVM Phabricator instance.

[docs] Clarify role of DIExpressions within debug intrinsics
Closed (Public)

Authored by vsk on Jul 19 2018, 3:24 PM.

Details

Summary

This is an attempt to make the semantics of DIExpressions within
llvm.dbg.{addr, declare, value} easier to understand.

Diff Detail

Repository
rL LLVM

Event Timeline

vsk created this revision. Jul 19 2018, 3:24 PM
rnk added inline comments. Jul 19 2018, 4:14 PM
docs/LangRef.rst
4613 (On Diff #156375)

What makes value operands to dbg.value implicit or concrete in LLVM IR? Are SSA values from local instructions concrete, and constants implicit? We could describe that here.

vsk added inline comments. Jul 19 2018, 4:26 PM
docs/LangRef.rst
4613 (On Diff #156375)

Sure. The way I read it, it depends on the DIType of the described variable. The value operand is concrete iff it is a pointer to an instance of that DIType. So the value operand in dbg.value(const-ptr-null, "int *p") is implicit, but it is concrete in dbg.value(const-ptr-null, "int").

At least, that's the only consistent explanation I've thought of. I don't know how the backend actually determines this. IIUC D49454/D49520 is an example of the backend getting this wrong: it treats a pointer to a std::deque as the implicit location of the std::deque.

bjope added inline comments. Jul 20 2018, 6:55 AM
docs/LangRef.rst
4605 (On Diff #156375)

Is it true that a debugger *must* be able to modify the variable for an llvm.dbg.addr? Is there a specific reason, or are we just trying to put limitations on the DIExpression in an llvm.dbg.addr intrinsic?

4613 (On Diff #156375)

My interpretation (with very little experience of llvm.dbg.addr) has been that llvm.dbg.addr is the IR version of an *indirect* DBG_VALUE, and llvm.dbg.value is the IR version of a *non-indirect* DBG_VALUE. At least that seems to be the difference in SelectionDAG.
Afaict the first argument in a dbg.value, together with the DIExpression, describes the value of the variable. The first argument in dbg.addr, together with the DIExpression, describes the address of the variable. And I think the first argument in dbg.value should be treated as a value, and the first argument in dbg.addr should be treated as an indirect pointer.
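
The value-vs-address reading above can be illustrated with a toy Python model. This is only a sketch, not LLVM code: memory is a dict, DIExpression operations are tuples, and `evaluate` and `read_variable` are hypothetical helpers. The expression is evaluated the same way for every intrinsic; only the meaning of the final stack entry differs.

```python
# Toy model of DWARF-style expression evaluation (hypothetical helpers, not LLVM APIs).

def evaluate(operand, ops, memory):
    """Push the intrinsic's first operand, then apply each DIExpression op."""
    stack = [operand]
    for op, *args in ops:
        if op == "DW_OP_deref":
            stack.append(memory[stack.pop()])  # replace an address with the value stored there
        elif op == "DW_OP_constu":
            stack.append(args[0])
        elif op == "DW_OP_plus_uconst":
            stack.append(stack.pop() + args[0])
        elif op == "DW_OP_add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unsupported op: {op}")
    return stack[-1]


def read_variable(intrinsic, operand, ops, memory):
    """How a debugger might read the variable under this interpretation."""
    result = evaluate(operand, ops, memory)
    if intrinsic == "dbg.value":
        return result          # direct: the result *is* the variable's value
    if intrinsic in ("dbg.addr", "dbg.declare"):
        return memory[result]  # indirect: the result is the variable's address
    raise ValueError(f"unknown intrinsic: {intrinsic}")


memory = {0x1000: 7}  # hypothetical: the variable lives at address 0x1000
print(read_variable("dbg.value", 7, [], memory))                             # 7
print(read_variable("dbg.addr", 0x1000, [], memory))                         # 7
print(read_variable("dbg.addr", 0x0FF8, [("DW_OP_plus_uconst", 8)], memory)) # 7
```

In the last call the expression computes an address (base plus 8), which is then loaded from, exactly as an indirect DBG_VALUE would be.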

A DIExpression might be used in dbg.declare, dbg.addr, and dbg.value, as well as in direct and indirect DBG_VALUEs, and it can be both tricky and confusing to interpret it. Depending on which intrinsic is used, or whether the DBG_VALUE is direct or indirect, the DIExpression could have an implied DW_OP_stack_value or DW_OP_deref at the end (or even at the front?).
As it might be hard to understand this, improving the documentation is a really nice initiative!
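
To see why an implied suffix matters, here is a toy Python sketch (the helper and all addresses and values are hypothetical; this is not LLVM's evaluator). The same expression, `DW_OP_plus_uconst 8`, describes an address when read as written, but a loaded value if a consumer silently appends `DW_OP_deref`.

```python
# Hypothetical illustration: one DIExpression read with and without an
# implied trailing DW_OP_deref. Not LLVM code.

def evaluate(operand, ops, memory):
    stack = [operand]  # the intrinsic's first operand starts on the stack
    for op, *args in ops:
        if op == "DW_OP_deref":
            stack.append(memory[stack.pop()])  # load from the address on top
        elif op == "DW_OP_plus_uconst":
            stack.append(stack.pop() + args[0])
        else:
            raise ValueError(f"unsupported op: {op}")
    return stack[-1]

memory = {0x3008: 99}  # hypothetical: a field at base address 0x3000 + 8
base = 0x3000
expr = [("DW_OP_plus_uconst", 8)]

as_written = evaluate(base, expr, memory)                               # an address
with_implied_deref = evaluate(base, expr + [("DW_OP_deref",)], memory)  # a value
print(hex(as_written), with_implied_deref)  # 0x3008 99
```

An implied DW_OP_stack_value, by contrast, would leave the computed number unchanged but declare it to be the variable's value rather than its location, so 0x3008 itself would be shown.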

One question is whether we need a way to indicate that a dbg.value has an indirect value operand. Or isn't it enough that, if you want to describe the value of variable !Y as (X[0] + 5), for example, you include a DW_OP_deref, such as:

dbg.value(X, !Y, DIExpression(DW_OP_deref, DW_OP_constu 5, DW_OP_add))

The above will become a direct DBG_VALUE since dbg.value is used. The DW_OP_deref is needed since by default the first argument in dbg.value is treated as a value and not a pointer. The variable will be described using an "implicit location" (DWARF terminology).
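
Under this reading, the example expression can be traced step by step on a DWARF-style stack. A minimal Python sketch, assuming X holds the address 0x2000 and that memory at that address holds 37 (both made-up values); `run` is a hypothetical helper, not an LLVM API.

```python
def run(operand, ops, memory):
    stack = [operand]  # dbg.value pushes its first operand (here: the pointer X)
    for op, *args in ops:
        if op == "DW_OP_deref":
            stack.append(memory[stack.pop()])  # X -> X[0]
        elif op == "DW_OP_constu":
            stack.append(args[0])              # push the constant
        elif op == "DW_OP_add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)            # X[0] + 5
        else:
            raise ValueError(f"unsupported op: {op}")
        print(op, *args, "->", stack)
    return stack[-1]

X = 0x2000
memory = {X: 37}
expr = [("DW_OP_deref",), ("DW_OP_constu", 5), ("DW_OP_add",)]
print(run(X, expr, memory))  # 42
```

Dropping the DW_OP_deref would instead describe X + 5, i.e. the pointer value plus 5, which is why the deref has to be spelled out when the first argument is treated as a value.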

Are you even saying that depending on !Y it might be wrong to have the DW_OP_deref here?

Btw, I think it is confusing to use "concrete" as terminology for the value operand. Isn't the question whether the value operand is direct or indirect (i.e. whether it is a value or a pointer)?

vsk updated this revision to Diff 156914. Jul 23 2018, 4:19 PM
vsk marked an inline comment as done.
vsk edited the summary of this revision.
vsk added inline comments.
docs/LangRef.rst
4605 (On Diff #156375)

No, I'll walk this back. It's valid to describe a read-only memory location. After thinking about it some more, I don't think there's really an issue with DW_OP_stack_value inside an llvm.dbg.addr either.

4613 (On Diff #156375)

My first response to @rnk here was incorrect: implicit vs. concrete is not the same distinction as direct vs. indirect. The latter is the relevant distinction and it has nothing to do with DIType.

I consider @bjope's description here to be the "common sense" one we all *thought* was correct: interpreting a dbg.value should give a direct value, and interpreting a dbg.{addr,declare} should give an indirect value. I'll update this patch to make those definitions precise.

Basically, there should be exactly one way to interpret a DIExpression, without any implicit DW_OP_stack_value or DW_OP_deref added based on the context of which intrinsic / what type of location you have. Once we land the fix in D49454 I think we'll either *actually* have that model or be really close. Right now there is some special magic with non-empty DIExpressions, but I hope to eliminate that.

vsk updated this revision to Diff 156944. Jul 23 2018, 5:40 PM
  • Minor wordsmithing.
rnk accepted this revision. Jul 27 2018, 4:15 PM

lgtm

This revision is now accepted and ready to land. Jul 27 2018, 4:15 PM
This revision was automatically updated to reflect the committed changes.