This is an archive of the discontinued LLVM Phabricator instance.

llvm rejects DWARF operator DW_OP_push_object_address.
ClosedPublic

Authored by alok on May 3 2020, 10:45 AM.

Details

Summary

llvm rejects DWARF operator DW_OP_push_object_address.
This DWARF operator is needed for Flang to support allocatable array support .

Summary

Currently llvm rejects DWARF operator DW_OP_push_object_address.
below error is produced when llvm finds this operator.
. . . . . .

invalid expression
!DIExpression(151)
warning: ignoring invalid debug info in pushobj.ll

. . . . . .
There are some parts missing in support of this operator, need to
be completed.

Testing

-added a unit testcase
-check-debuginfo
-check-llvm

Diff Detail

Event Timeline

alok created this revision.May 3 2020, 10:45 AM

I don't see you are touching the DIExpression::ExprOperand::getSize().

Can we verify the operation within DWARFVerifier ?

alok added a comment.May 4 2020, 2:03 AM

I don't see you are touching the DIExpression::ExprOperand::getSize().

Can we verify the operation within DWARFVerifier ?

Thanks for your review. DIExpression::ExprOperand::getSize() does special handling for operators with size >1, default case is 1. Since DW_OP_push_object_address doesnt have argument, its size is 1. it need not special mention.

switch (Op) {
case dwarf::DW_OP_LLVM_convert:
case dwarf::DW_OP_LLVM_fragment:
case dwarf::DW_OP_bregx:
  return 3;
case dwarf::DW_OP_constu:
case dwarf::DW_OP_consts:
case dwarf::DW_OP_deref_size:
case dwarf::DW_OP_plus_uconst:
case dwarf::DW_OP_LLVM_tag_offset:
case dwarf::DW_OP_LLVM_entry_value:
case dwarf::DW_OP_regx:
  return 2;
default:
  return 1;
}

I did not have any special verification for this (may be like other single sized operators), Please do let me know if have any verification in mind.

I don't see you are touching the DIExpression::ExprOperand::getSize().

Can we verify the operation within DWARFVerifier ?

Thanks for your review. DIExpression::ExprOperand::getSize() does special handling for operators with size >1, default case is 1. Since DW_OP_push_object_address doesnt have argument, its size is 1. it need not special mention.

switch (Op) {
case dwarf::DW_OP_LLVM_convert:
case dwarf::DW_OP_LLVM_fragment:
case dwarf::DW_OP_bregx:
  return 3;
case dwarf::DW_OP_constu:
case dwarf::DW_OP_consts:
case dwarf::DW_OP_deref_size:
case dwarf::DW_OP_plus_uconst:
case dwarf::DW_OP_LLVM_tag_offset:
case dwarf::DW_OP_LLVM_entry_value:
case dwarf::DW_OP_regx:
  return 2;
default:
  return 1;
}

OK, good, thanks!
I think you can add a testcase with an invalid use of push_object_address operation (such as !DIExpression(DW_OP_push_object_address, DW_OP_something)).

I did not have any special verification for this (may be like other single sized operators), Please do let me know if have any verification in mind.

llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
2

The recommendation is (I've learned this recently) using double ; for high-level comments.

8

Likewise.

14

Likewise.

Can you explain in more detail how flang is going to use this? In LLVM DIExpressions expressions we currently never refer to the SSA values that are bound to the expression via a dbg.value/dbg.declare from within the expression. We've had discussions about how to do this in the past (see https://reviews.llvm.org/D70642 and the discussions preceding it) and I would like to make sure that this fits well into the design.

It would be particularly helpful if you could post example LLVM IR that would be produced by flang to make use of this feature and the DWARF that should be generated from it.

alok added a comment.May 5 2020, 1:25 AM

Can you explain in more detail how flang is going to use this? In LLVM DIExpressions expressions we currently never refer to the SSA values that are bound to the expression via a dbg.value/dbg.declare from within the expression. We've had discussions about how to do this in the past (see https://reviews.llvm.org/D70642 and the discussions preceding it) and I would like to make sure that this fits well into the design.

It would be particularly helpful if you could post example LLVM IR that would be produced by flang to make use of this feature and the DWARF that should be generated from it.

I apologies as the testcase I added is a little misleading. Please treat that as a meaning less test case just to check whether the operator is accepted in DIExpression. I dont plan to use the DW_OP_push_object_address inside dbg.value/dbg.addr/dbg.declare. Instead I plan to use it in Bounds (lowerBound, upperBound or count, stride) of DISubrange (in DWARF).

Let me explain how I plan to use it for allocatable arrays.

Fortran allocatable arrays have bound information in array descriptor, which can be accessed using certain offsets. I plan to generate IR something like below.

. . . . . . . . . . . . .
!11 = !DICompositeType(tag: DW_TAG_array_type, baseType: !12, size: 32, align: 32, elements: !13)
!13 = !{!14}
!14 = !DIFortranSubrange(lowerBound: !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 80, DW_OP_deref), upperBound: !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 120, DW_OP_deref), stride: !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 112, DW_OP_deref, DW_OP_plus_uconst, 4, DW_OP_mul))
. . . . . . . . . . . . . .

which would be translated to DWARF as

. . . . . . . . . . .
0x0000004d: DW_TAG_variable

DW_AT_location      (DW_OP_fbreg +168)
DW_AT_name  ("arr")
DW_AT_type  (0x00000065 "integer[]")

0x00000065: DW_TAG_array_type

DW_AT_type      (0x00000086 "integer")

0x0000006e: DW_TAG_subrange_type

DW_AT_type    (0x0000008d "__ARRAY_SIZE_TYPE__")
DW_AT_lower_bound     (DW_OP_push_object_address, DW_OP_plus_uconst 0x50, DW_OP_deref)
DW_AT_upper_bound     (DW_OP_push_object_address, DW_OP_plus_uconst 0x78, DW_OP_deref)
DW_AT_byte_stride     (DW_OP_push_object_address, DW_OP_plus_uconst 0x70, DW_OP_deref, DW_OP_plus_uconst 0x4, DW_OP_mul)

. . . . . . . . . . .

Here DW_OP_push_object_address refers to the value of the DW_AT_location of DW_TAG_variable.

Please let me know if you need more details ?

alok marked an inline comment as done.May 5 2020, 1:31 AM
alok added inline comments.
llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
2

Thanks for your comment. sorry I did not understand it clearly. Do you mean double semi colon ";;" in place of ";"?

alok marked 6 inline comments as done.May 5 2020, 1:37 AM
alok added inline comments.
llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
2

Thanks for your comment, I shall update my patch.

8

Thanks for your comment, I shall update my patch.

14

Thanks for your comment, I shall update my patch.

alok marked 2 inline comments as done.May 5 2020, 1:39 AM
djtodoro added inline comments.May 5 2020, 1:41 AM
llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
2

Yes. Thanks.

alok updated this revision to Diff 262027.May 5 2020, 2:05 AM

update for comments from djtodoro

djtodoro added inline comments.May 5 2020, 2:10 AM
llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
2

Sorry for the confusion, but this should be:

;; This test checks whether DWARF operator DW_OP_push_object_address
;; is accepted and processed.
22
;; Below is the original test case this IR is generated from
;;---------------------------
;;int main() {
;;int var;
;;return var;
;;}
;;---------------------------
;; step 1: generate IR using -g -O0 -S -emit-llvm
;; step 2: insert DW_OP_push_object_address in dbg.declare instruction
;; This is meaningless test case focused to test DW_OP_push_object_address.
alok marked 4 inline comments as done.May 5 2020, 3:07 AM
alok added inline comments.
llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
2

Thanks a lot for clarifying this. I shall update.

22

Thanks for clarification. I shall update this.

alok updated this revision to Diff 262050.May 5 2020, 3:08 AM
alok marked 2 inline comments as done.

Updated for comments from @djtodoro .

Thanks a lot for the example!

If DW_OP_push_object address is *only* going to be used inside a DISubrange and if it *always* appears at the beginning of a DIExpression bound there, we could make it implicit, the same way the first operator of a "normal" DIExpression in a dbg.value is implicitly specified by the dbg.value.

My motivation behind this is to keep the surface area of DIExpression small. We don't necessarily want to include all DWARF operators in DIExpression, because it would make it harder to emit other debug info formats from it, transform the expression during optimizations, and generally reason about them.

Would changing the backend to emit an automatic DW_OP_push_object before each DIExpression that represents a bound in a DISubrange work for flang, or are there other kinds of bound expressions / reasons why that would be a bad idea?

DavidTruby resigned from this revision.May 6 2020, 12:06 PM
alok added a comment.May 6 2020, 12:09 PM

Thanks a lot for the example!

If DW_OP_push_object address is *only* going to be used inside a DISubrange and if it *always* appears at the beginning of a DIExpression bound there, we could make it implicit, the same way the first operator of a "normal" DIExpression in a dbg.value is implicitly specified by the dbg.value.

My motivation behind this is to keep the surface area of DIExpression small. We don't necessarily want to include all DWARF operators in DIExpression, because it would make it harder to emit other debug info formats from it, transform the expression during optimizations, and generally reason about them.

Would changing the backend to emit an automatic DW_OP_push_object before each DIExpression that represents a bound in a DISubrange work for flang, or are there other kinds of bound expressions / reasons why that would be a bad idea?

I understand the concerns for keeping DIExpression small. But considering the DW_OP_push_object_address implicit would prevent the freedom to use expression which don't use this (DW_OP_push_object_address) DWARF operator. If it is big a 'no' for adding DW_OP_push_object_address to DIExpression, then as you suggested I would need DW_OP_LLVM_arg0 from https://reviews.llvm.org/D70642 .

Thanks a lot for the example!

If DW_OP_push_object address is *only* going to be used inside a DISubrange and if it *always* appears at the beginning of a DIExpression bound there, we could make it implicit, the same way the first operator of a "normal" DIExpression in a dbg.value is implicitly specified by the dbg.value.

My motivation behind this is to keep the surface area of DIExpression small. We don't necessarily want to include all DWARF operators in DIExpression, because it would make it harder to emit other debug info formats from it, transform the expression during optimizations, and generally reason about them.

Would changing the backend to emit an automatic DW_OP_push_object before each DIExpression that represents a bound in a DISubrange work for flang, or are there other kinds of bound expressions / reasons why that would be a bad idea?

I understand the concerns for keeping DIExpression small. But considering the DW_OP_push_object_address implicit would prevent the freedom to use expression which don't use this (DW_OP_push_object_address) DWARF operator. If it is big a 'no' for adding DW_OP_push_object_address to DIExpression, then as you suggested I would need DW_OP_LLVM_arg0 from https://reviews.llvm.org/D70642 .

My goal is to find the most consistent and useful design for LLVM IR.

Currently (without D70642) DIExpressions modify a location specified by the context in which they appear. A DIExpression bound to a dbg.value() might modify a DW_OP_reg, a DIExpression bound by a DIGlobalVariableExpression a DW_OP_addr, etc. Based on that prior art, it would seem logical to say that a DIExpressions inside a DISubrange implicitly starts with a DW_OP_push_object_address.

Here are a couple of questions to help decide whether that argument makes sense:

  1. Can there be expressions in a DISubrange that don't contain a DW_OP_push_object_address?
  2. Can there be expressions in a DISubrange where DW_OP_push_object_address is not the first operator?
  3. Can DW_OP_push_object_address appear outside of a DISubrange?

If the answer to all questions in no, then I argue we should make it implicit. If the answer to any is yes, I would like to see the counterexample, to decide what the best course of action is. (For example, if the primary concern is (2), the more general DW_OP_LLVM_arg0 seems like a superior solution over adding the very DWARF-specific DW_OP_push_object_address).

I can see that DW_OP_push_object_address would be the norm for a Fortran array with a descriptor, but I wouldn't expect it to be used for all subranges. For example a VLA would need to point to a separate entity containing the upper bound, but that separate entity wouldn't be tied to the object address.

alok added a comment.May 11 2020, 1:35 PM

Thanks a lot for the example!

If DW_OP_push_object address is *only* going to be used inside a DISubrange and if it *always* appears at the beginning of a DIExpression bound there, we could make it implicit, the same way the first operator of a "normal" DIExpression in a dbg.value is implicitly specified by the dbg.value.

My motivation behind this is to keep the surface area of DIExpression small. We don't necessarily want to include all DWARF operators in DIExpression, because it would make it harder to emit other debug info formats from it, transform the expression during optimizations, and generally reason about them.

Would changing the backend to emit an automatic DW_OP_push_object before each DIExpression that represents a bound in a DISubrange work for flang, or are there other kinds of bound expressions / reasons why that would be a bad idea?

I understand the concerns for keeping DIExpression small. But considering the DW_OP_push_object_address implicit would prevent the freedom to use expression which don't use this (DW_OP_push_object_address) DWARF operator. If it is big a 'no' for adding DW_OP_push_object_address to DIExpression, then as you suggested I would need DW_OP_LLVM_arg0 from https://reviews.llvm.org/D70642 .

My goal is to find the most consistent and useful design for LLVM IR.

Currently (without D70642) DIExpressions modify a location specified by the context in which they appear. A DIExpression bound to a dbg.value() might modify a DW_OP_reg, a DIExpression bound by a DIGlobalVariableExpression a DW_OP_addr, etc. Based on that prior art, it would seem logical to say that a DIExpressions inside a DISubrange implicitly starts with a DW_OP_push_object_address.

Thanks for the explanation and pointers.

Here are a couple of questions to help decide whether that argument makes sense:

  1. Can there be expressions in a DISubrange that don't contain a DW_OP_push_object_address?

Assumed shape arrays can skip using DW_OP_push_object_address, but probably that can use artificial variable reference in place of expression (not sure if all cases can do so), other than that few bounds like stride of first dimension can use constant expression, though we can probably avoid that by using integer in place of expression depending on which way we want to go.

  1. Can there be expressions in a DISubrange where DW_OP_push_object_address is not the first operator?

I can not think of anything where it comes other than as first operator.

  1. Can DW_OP_push_object_address appear outside of a DISubrange?

what I can think of is DW_AT_data_location, DW_AT_allocated, DW_AT_associated.

If the answer to all questions in no, then I argue we should make it implicit. If the answer to any is yes, I would like to see the counterexample, to decide what the best course of action is. (For example, if the primary concern is (2), the more general DW_OP_LLVM_arg0 seems like a superior solution over adding the very DWARF-specific DW_OP_push_object_address).

From Paul's comment, it sounds like we do want to allow DW_OP_push_object_address in LLVM IR. Can you please add documentation for it to LangRef.rst or SourceLevelDebugging.rst (wherever the other operations are explained)?

llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
5

RUN: %llc_dwarf %s -O2 -filetype=obj -o - | llvm-dwarfdump - | FileCheck %s

alok marked 2 inline comments as done.May 13 2020, 10:31 AM

From Paul's comment, it sounds like we do want to allow DW_OP_push_object_address in LLVM IR. Can you please add documentation for it to LangRef.rst or SourceLevelDebugging.rst (wherever the other operations are explained)?

Thanks for your comment. I shall incorporate it in next version of patch.

llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
5

Thanks for suggestion. It will be incorporated in next version.

alok updated this revision to Diff 263774.May 13 2020, 10:34 AM
alok marked an inline comment as done.

Addressed comments from @aprantl and re-based.

aprantl accepted this revision.May 14 2020, 10:35 AM
This revision is now accepted and ready to land.May 14 2020, 10:35 AM

Thanks for your patience :-)

This revision was automatically updated to reflect the committed changes.
broadwaylamb added inline comments.
llvm/test/DebugInfo/dwarfdump-pushobjectaddress.ll
5

It looks like this breaks one of our bots: http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/7378

The issue is that our bot only builds the ARM target.

Probably just a matter of annotating this test as REQUIRES: x86_64-linux. Could you please fix it?

It looks like this breaks one of our bots: http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/7378
The issue is that our bot only builds the ARM target.
Probably just a matter of annotating this test as REQUIRES: x86_64-linux. Could you please fix it?

Sorry for the delay, Fixed in 2c81508728168f119023cc697300d3ca7489b3d2