This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Add a DW_OP_LLVM_entry_value operation
ClosedPublic

Authored by dstenb on Sep 12 2019, 4:30 AM.

Details

Summary

Internally in LLVM's metadata we use DW_OP_entry_value operations with
the same semantics as DWARF; that is, its operand specifies the number
of bytes that the entry value covers.

At the time of emitting entry values we don't know the emitted size of
the DWARF expression that the entry value will cover. Currently the size
is hardcoded to 1 in DIExpression, and other values causes the verifier
to fail. As the size is 1, that effectively means that we can only have
valid entry values for registers that can be encoded in one byte, which
are the registers with DWARF numbers 0 to 31 (as they can be encoded as
single-byte DW_OP_reg0..DW_OP_reg31 rather than a multi-byte
DW_OP_regx). It is a bit confusing, but it seems like llvm-dwarfdump
will print an operation "correctly", even if the byte size is less than
that, which may make it seem that we emit correct DWARF for registers
with DWARF numbers > 31. If you instead use readelf for such cases, it
will interpret the number of specified bytes as a DWARF expression. This
seems like a limitation in llvm-dwarfdump.

As suggested in D66746, a way forward would be to add an internal
variant of DW_OP_entry_value, DW_OP_LLVM_entry_value, whose operand
instead specifies the number of operations that the entry value covers,
and we then translate that into the byte size at the time of emission.

In this patch that internal operation is added. This patch keeps the
limitation that a entry value can only be applied to simple register
locations, but it will fix the issue with the size operand being
incorrect for DWARF numbers > 31.

Diff Detail

Event Timeline

dstenb created this revision.Sep 12 2019, 4:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2019, 4:30 AM

@dstenb Thanks for this! This makes sense.

Having suggested this, I think this is a good idea ;-)

llvm/docs/LangRef.rst
4784–4785

It's ambiguous where in the expression DW_OP_LLVM_entry_value can appear. Perhaps add an example or improve the wording? I assume it is still legal to combine a DW_OP_LLVM_entry_value with other operations?

4784–4785

not "DWARF expression" but in terms of std::distance of DIExpression operation iterators, right?

4784–4785

may appear -> is introduced by

4784–4785

is also introduced by

llvm/lib/IR/DebugInfoMetadata.cpp
893–898

Can someone remind me what the reason for this limitation was again? Do we still need it after this patch?

dstenb added inline comments.Sep 12 2019, 1:47 PM
llvm/docs/LangRef.rst
4784–4785

In this patch DW_OP_LLVM_entry_value has the same semantics as the DWARF variant, so the operand still specifies the number of bytes.

llvm/lib/IR/DebugInfoMetadata.cpp
893–898

I think that the main (or perhaps only?) reason for this limitation is that we, at emission, currently don't have a good way to calculate the size of the block that the entry value covers, and emit that size as a ULEB128 operand before the block. This patch will not change that.

Normally I would insist on a bitcode upgrade, but since this feature is so new I think we can skip that. However, because of bitcode compatability, we might want to get the semantics right from the start. Otherwise we might have to allocate a new opcode and implement an upgrade when we need to change it in the future.

llvm/lib/IR/DebugInfoMetadata.cpp
893–898

I see. Could you please add this to the comment here, so it's clear why the restriction exists and how to lift it in the future?

djtodoro added inline comments.Sep 12 2019, 11:23 PM
llvm/lib/IR/DebugInfoMetadata.cpp
893–898

I think that the main (or perhaps only?) reason for this limitation is that we, at emission, currently don't have a good way to calculate the size of the block that the entry value covers, and emit that size as a ULEB128 operand before the block.

We also wanted to ensure that LLVM currently only generates entry values of size 1.

aprantl added inline comments.Sep 13 2019, 8:23 AM
llvm/lib/IR/DebugInfoMetadata.cpp
893–898

Is there another reason for this other than that it happened to be hard to implement in LLVM at the time? Perhaps on the consumer side?

djtodoro added inline comments.Sep 16 2019, 7:58 AM
llvm/lib/IR/DebugInfoMetadata.cpp
893–898

Actually no. We found the simple register location as an entry value is very basic case (and easy to implement as the initial patch). We also knew that supporting the other types of entry values will take some time and we just left it as a future work.
The consumer (only GDB at the time) supports parsing various types of entry values.

djtodoro added inline comments.Sep 17 2019, 6:38 AM
llvm/lib/IR/DebugInfoMetadata.cpp
893–898

In addition to this, I work on extending the debug entry values to support complex expressions (more precisely, expressions other than simple register locations).

vsk added a comment.Sep 17 2019, 11:06 AM

Looks reasonable to me, I just suggest clarifying one comment. @aprantl any other outstanding concerns?

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
718–720

-> "will contain 1 operation (e.g. a register op, or a constant literal)" ?

dstenb marked an inline comment as done.Sep 17 2019, 11:57 AM
dstenb added inline comments.
llvm/lib/IR/DebugInfoMetadata.cpp
893–898

The consumer (only GDB at the time) supports parsing various types of entry values.

It seems like GDB only supports a few operations inside a DW_OP_entry_value. I have tried some examples with more advanced expressions, but then got hit with the following error from its expression evaluator:

DWARF-2 expression error: DW_OP_entry_value is supported only for single DW_OP_reg* or for DW_OP_breg*(0)+DW_OP_deref*

That error message seems present in GDB 8.3 at least. I might overlook something here though.

dstenb marked an inline comment as done.Sep 17 2019, 12:10 PM
dstenb added inline comments.
llvm/lib/IR/DebugInfoMetadata.cpp
893–898

In addition to this, I work on extending the debug entry values to support complex expressions (more precisely, expressions other than simple register locations).

Oh, okay! Sorry, I have been working on adding support for DW_OP_regx entry values, by extending DwarfExpression so that it can count the size of the entry value block, and emit that size operand before the block itself. I guess that overlaps with the complex expressions a bit?

I uploaded a WIP patch for that: D67674.

Do you think that would be usable for your work?

djtodoro added inline comments.Sep 17 2019, 11:19 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
718–720

Currently, we do not support expressions e.g. dw_op_entry_value $constant.

llvm/lib/IR/DebugInfoMetadata.cpp
893–898

It seems like GDB only supports a few operations inside a DW_OP_entry_value. I have tried some examples with more advanced expressions, but then got hit with the following error from its expression evaluator:

DWARF-2 expression error: DW_OP_entry_value is supported only for single DW_OP_reg* or for DW_OP_breg*(0)+DW_OP_deref*

That error message seems present in GDB 8.3 at least. I might overlook something here though.

You are right. Even GDB does not support every type of the debug entry value.

Oh, okay! Sorry, I have been working on adding support for DW_OP_regx entry values, by extending DwarfExpression so that it can count the size of the entry value block, and emit that size operand before the block itself. I guess that overlaps with the complex expressions a bit?

I uploaded a WIP patch for that: D67674.

Do you think that would be usable for your work?

No problem. :) Please continue with this work, I will focus on reconstructing debug values of modified parameters (and local variables) that can be expressed in terms of the debug entry values (e.g. dw_op_entry_value dw_op_reg; dw_op_plus const).

dstenb updated this revision to Diff 220875.Sep 19 2019, 9:04 AM
dstenb edited the summary of this revision. (Show Details)

Update patch after comments, and change the semantics of the new operation. Rather than only introducing an internal variant that has the same behavior as DW_OP_entry_value, the new operation's operand specifies the number of operations (including the value/address operand) that are covered by the entry value, and that value is then converted to the byte size when we emit the entry value in DwarfExpression.

To keep the changes reasonably large, this patch keeps the limitation that the operand can only be set to one, and entry values still only works for register debug values. To support DW_OP_LLVM_entry_values covering more than one operation, you would need to add more finalizeEntryValue logic to DwarfExpression.

I opted for the operand describing the number of operations that the entry values cover, rather than the std::distance in the DIExpression, under the assumption that this will make it slightly cleaner to implement things. I can change it to std::distance though if that is preferred.

To make the patch reasonably small and reviewable I split the code that adds the ability to count the size of the DWARF block in DwarfExpression into a parent patch.

aprantl added inline comments.Sep 20 2019, 9:22 AM
llvm/test/Verifier/diexpression-entry-value.ll
7

This is testing that *any* of these expressions are malformed. You either want a more specific CHECK line or more tests.

dstenb updated this revision to Diff 221333.Sep 23 2019, 7:52 AM

Make the diexpression-entry-value.ll test case FileCheck all DIExpressions.

dstenb marked 3 inline comments as done.Sep 23 2019, 7:54 AM
dstenb added inline comments.
llvm/test/Verifier/diexpression-entry-value.ll
7

Yes, thanks! I added checks for all three DIExpressions now.

djtodoro added inline comments.Sep 24 2019, 6:20 AM
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

I am just curious. Having the D67768, do we really still need the size of following expression at this point?

dstenb marked 2 inline comments as done.Sep 24 2019, 12:35 PM
dstenb added inline comments.
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

I guess that the size operand is currently redundant, and we can omit it. However, I have made the assumption that specifying the size will simplify things when we get to describing more complex variables like local in the example below using entry values:

int foo(int param) {
  int local = param + 1;
  use_of_param = param;
  clobber();
  return 0;
}

Here local will have the entry value of $edi + 1. GDB will not understand a DW_OP_entry_value whose block is that complex, so I imagine that we want to emit the same expression as GCC:

(DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_plus_uconst: 1; DW_OP_stack_value)

I guess that the DW_OP_LLVM_entry_value operation could cover the whole expression in the DIExpression world, and we then wrap only the parts that actually needs to be covered by entry values when we lower that to DW_OP_entry_value in DwarfExpression. However, I'm a bit afraid that that would further complicate the DwarfExpression state machine, which is already quite complex.

Looking further onwards, I guess that we want to transform the current dbg.value/DBG_VALUE/DIExpression framework to something that allows us to describe a variable using multiple register / memory values. For example, like local in the example below:

int bar(int a, int b) {
  int local = a + b;
  [...]
}

I assume that we want to be able to describe such variables using entry values for only a subset of the registers. GCC is able to emit such expressions. I guess that having a DW_OP_LLVM_entry_value operation that only applies for a part of an DIExpression will be helpful when we get there, but that's pure speculation as I don't know how the debug framework will look.

aprantl added inline comments.Sep 24 2019, 5:55 PM
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

IMO, we should make the IR support as generic as possible and accept more complex uses of DW_OP_LLVM_entry_value that LLVM currently cannot produce itself and generate the correct DWARF for it.

In order to generate that particular example though, it might be useful to add another operand (let's call it DW_OP_LLVM_dbg_value) to make the implicit first operand explicit, so this could be

!DIExpression(DW_OP_LLVM_entry_value, 1, DW_OP_LLVM_dbg_value, DW_OP_plus, DW_OP_uconst, 1, DW_OP_stack_value).

and the backend substitutes DW_OP_breg5 for DW_OP_dbg_value. In the future, we could further generalize llvm.dbg.value to take extra arguments that would show up as DW_OP_LLVM_dbg_value_[0-9]+ to allow binding more than one SSA value to a DIExpression.

aprantl added inline comments.Sep 24 2019, 6:00 PM
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

(and of course to confuse everybody I accidentally flipped the order of DW_OP_plus and DW_OP_uconst in my example...)

djtodoro added inline comments.Sep 24 2019, 11:27 PM
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

Here local will have the entry value of $edi + 1. GDB will not understand a DW_OP_entry_value whose block is that complex, so I imagine that we want to emit the same expression as GCC:

Sure. Not only local variables, I am currently extending the LiveDebugValues to support such scenario when parameters are part of such expression and I will share that in the following days. Let's keep it for now.

Let me say something about extending the llvm.dbg.value, I already got similar idea to extend the salvageDebugInfo() functionality when we are removing an instruction operating on two registers (rather then reg and constant), so I think that having that type of instruction will give us a lot of benefits, not only in situations like that. :)

djtodoro added inline comments.Sep 25 2019, 6:14 AM
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

I guess that the DW_OP_LLVM_entry_value operation could cover the whole expression in the DIExpression world, and we then wrap only the parts that actually needs to be covered by entry values when we lower that to DW_OP_entry_value in DwarfExpression. However, I'm a bit afraid that that would further complicate the DwarfExpression state machine, which is already quite complex.

Having this pre-calculation interface at this point for entry values expressions won't be desirable for situations where entry value is just part of complex expression like you described, since that the DwarfExpression won't be able to distinguish what is entry value part and what is the rest of the complex expression (e.g. (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_plus_uconst: 1; DW_OP_stack_value) where the DIExpression would look as DIExpression(DW_OP_LLVM_entry_value, 1, DW_OP_plus_uconst, 1, DW_OP_stack_value)).
I think it would be ideally if we were to somehow create the right size operand from the entry value DIExpression when the entry value gets created during the LiveDebugValues (we agree that it is very hard, or even impossible), but we would know what part of a complex expression is the "entry value" part and what part is not. Therefore, in the current situation I don't think that the size operand (from entry value DIExpresion) has any impact on simplifying the support for the complex expressions like we described. In that situation we should just see if the DIExpression contains any additional DWARF operands and mark it as "non-entry value" part and somehow make another buffer for that until the entry value part gets printed via current interface.

dstenb marked 2 inline comments as done.Sep 25 2019, 7:21 AM
dstenb added inline comments.
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

won't be able to distinguish what is entry value part and what is the rest of the complex expression

I think that is possible with this semantics. In this patch the operand specifies that the value operand and the (N - 1) number of operations are covered by the entry value, so for N > 1 shouldn't we be able to finish the entry value after emitting those remaining (N - 1) operations in DwarfExpression::addExpression()?

In your example N is 1, so we should only emit the entry value around the register. I think we need to modify the current implementation of DwarfExpression::addMachineRegExpression() a bit so that it emits a DW_OP_regx rather than taking the code path where the expression is optimized into a DW_OP_breg $reg, 1 [0]. We also need to make sure that the DW_OP_stack_value is emitted at the end of the whole expression, rather than after the entry value, which is currently done [1]. I think that with those changes it should be possible to emit your example using the semantics that this patch introduces, at the risk of overlooking some other issue.

I still haven't been able to think of an example where the DW_OP_LLVM_entry_value's size operand being larger than 1 would actually be useful though.

[0] https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp#L249
[1] https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp#L258

dstenb marked an inline comment as done.Sep 25 2019, 9:43 AM
dstenb added inline comments.
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

Perhaps something along the lines of this: D68034. The location kind handling likely needs more thought though.

This looks good to me, thanks!

llvm/test/Verifier/diexpression-valid-entry-value.ll
5

I have not switched my self to the new semantics yet.. I see, this makes sense then :)

dstenb marked an inline comment as done.Sep 26 2019, 8:14 AM
dstenb added inline comments.
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

Okay! I have hopefully not oversimplified what it what it would take to handle the expressions in DwarfExpression, but I at least think that it should be doable.

dstenb marked an inline comment as done.Sep 26 2019, 8:20 AM
dstenb added inline comments.
llvm/test/Verifier/diexpression-valid-entry-value.ll
5

Adding something like a DW_OP_LLVM_dbg_value, which makes the entry value's use of the value operand explicit, sounds reasonable, @aprantl! As I assume that that is a rather large change, which needs to be further fleshed out, could we perhaps consider introducing this patch's semantics as an intermediate step? When a DW_OP_LLVM_dbg_value is later on introduced, I assume that the bitcode reader would be able to upgrade the metadata by inserting such operations after the DW_OP_LLVM_entry_values to make the use of the value operand explicit?

aprantl added inline comments.Sep 27 2019, 10:48 AM
llvm/docs/LangRef.rst
4784–4785

The documentation currently doesn't make clear whether DW_OP_LLVM_entry_value considers the implicit SSA value bound by a dbg.value / the argument of a DBG_VALUE to be part of the entry value or the surrounding expression and/or whether it expects a hardcoded location such as DW_OP_breg0 inside the DW_OP_LLVM_entry_value. It also isn't clear whether DW_OP_LLVM_entry_value is only legal in MIR, or also in LLVM IR and what the semantics in LLVM IR are if it is allowed.

I think a few examples would go a long way here.

llvm/test/Verifier/diexpression-valid-entry-value.ll
5

Yes that should be a separate discussion. We need to make clear what the semantics without this addition are though. See my comment in LangRef above.

dstenb updated this revision to Diff 222445.Sep 30 2019, 8:51 AM

Update LangRef.

dstenb marked an inline comment as done.Sep 30 2019, 8:54 AM
dstenb added inline comments.
llvm/docs/LangRef.rst
4784–4785

I have now updated the text to hopefully make it more descriptive.

I have updated the documentation to say that the operation is only legal in MIR, mainly as the operation has not been tested in LLVM IR, and I personally don't see why anyone would want to emit it there. Any comments on that, @djtodoro and @NikolaPrica?

aprantl added inline comments.Sep 30 2019, 9:03 AM
llvm/docs/LangRef.rst
4784–4785

Thanks, this is more clear now! The fact that this is only legal in MIR also absolves us from bitcode compatibility issues. Can you add a verifier checks that asserts that no DbgInfoInstrinsic references a DIExpression with this opcode to make sure it's illegal in LLVM IR? This way we'll be free to experiment as much as we want with the semantics of this operator.

djtodoro added inline comments.Sep 30 2019, 11:35 PM
llvm/docs/LangRef.rst
4784–4785

This looks nice! Thanks!

If we face situation where the entry value operation is useful for the LLVM IR we can revisit this, but at the moment I can not imagine such situation.

dstenb updated this revision to Diff 222620.Oct 1 2019, 8:02 AM

Add IR verifier check.

dstenb marked 7 inline comments as done.Oct 11 2019, 6:52 AM

Any more comments on this or D67768?

aprantl accepted this revision.Oct 11 2019, 3:11 PM
aprantl added inline comments.
llvm/lib/IR/Verifier.cpp
5024 ↗(On Diff #222620)

"only allowed in MIR" is perhaps more helpful.

This revision is now accepted and ready to land.Oct 11 2019, 3:11 PM
dstenb marked an inline comment as done.Oct 15 2019, 4:20 AM

Thanks for the reviews! I'll merge this shortly.

llvm/lib/IR/Verifier.cpp
5024 ↗(On Diff #222620)

Yes, that's more helpful. I'll update that.

This revision was automatically updated to reflect the committed changes.