This is an archive of the discontinued LLVM Phabricator instance.

[LLVM] Change to Verifier to allow Fortran CHARACTER types in debug information
AbandonedPublic

Authored by schweitz on Aug 8 2017, 12:08 PM.

Details

Summary

Adds DW_TAG_string_type as a valid tag type. This allows flang to generate the proper DWARF tag type for source-level CHARACTER variables without an assertion failure.

Diff Detail

Event Timeline

schweitz created this revision.Aug 8 2017, 12:08 PM

Generally fine, but could you please add a testcase?

Generally, we have test cases in flang that would exercise this, so it's a little chicken and egg in that regard. I'll try to whittle something down.

schweitz updated this revision to Diff 110290.Aug 8 2017, 3:24 PM

Adding a test case.

schweitz updated this revision to Diff 110293.Aug 8 2017, 3:30 PM

Attach both change and test.

aprantl accepted this revision.Aug 8 2017, 4:39 PM

I guess having this test in test/DebugInfo makes more sense than a "positive" test in test/Verifier.
Somewhat related question: Do you also need/want to add a DIBuilder API to create the type?

string-type.ll
16 ↗(On Diff #110293)

shouldn't this be something.f90? :-)

This revision is now accepted and ready to land.Aug 8 2017, 4:39 PM
aprantl added inline comments.Aug 8 2017, 4:39 PM
string-type.ll
2 ↗(On Diff #110293)

=debug-dump=info -> -debug-dump=info

schweitz updated this revision to Diff 110309.Aug 8 2017, 5:10 PM

Fixes for Adrian's comments and cut-n-paste errors. (I'm learning Phabricator.)

It looks like the proper protocol is to note here that I don't have commit privileges and need someone to do that for me. Thanks in advance.

Hmm.. when I'm running your test I get:

0x00000056:   DW_TAG_string_type [4]  
                DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000026] = "character")
                DW_AT_encoding [DW_FORM_data1]	(DW_ATE_signed)
                DW_AT_byte_size [DW_FORM_data1]	(0x01)

0x0000005d:   NULL

But the standard says in 5.11:

A string type entry may have a DW_AT_type attribute describing how each
character is encoded and is to be interpreted. The value of this attribute is a
reference to a DW_TAG_base_type base type entry. If the attribute is absent, then
the character is encoded using the system default.

schweitz added a comment.EditedAug 8 2017, 5:48 PM

Hmm.. when I'm running your test I get:

0x00000056:   DW_TAG_string_type [4]  
                DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000026] = "character")
                DW_AT_encoding [DW_FORM_data1]	(DW_ATE_signed)
                DW_AT_byte_size [DW_FORM_data1]	(0x01)

0x0000005d:   NULL

But the standard says in 5.11:

A string type entry may have a DW_AT_type attribute describing how each
character is encoded and is to be interpreted. The value of this attribute is a
reference to a DW_TAG_base_type base type entry. If the attribute is absent, then
the character is encoded using the system default.

Well, this isn't a DW_TAG_base_type, which is what section 5.1 concerns. The test overrides the tag to be DW_TAG_string_type. (Section 5.11 regards DW_TAG_set_type.)

It is true that we really don't need DW_AT_encoding for DW_TAG_string_type.

Ultimately, a new !DIStringType will be added. But for now baby steps to get off the ground.

string-type.ll
2 ↗(On Diff #110293)

Fixed in last attached diff.

16 ↗(On Diff #110293)

Added a more snazzy name as well. ;)

With this patch's encoding I'm a little worried by the discrepancy between the size and tag.
What about representing strings as !DIDerivedType(tag: DW_TAG_string_type, baseType: !DIBasicType(name: "character", size: 8, align: 8, encoding: DW_ATE_signed) instead?
The patch would be just as simple (allowing the string type tag in DIDerivedType instead), but the representation would be closer to what will be emitted in the end.

aprantl requested changes to this revision.Aug 9 2017, 8:41 AM
This revision now requires changes to proceed.Aug 9 2017, 8:41 AM

With this patch's encoding I'm a little worried by the discrepancy between the size and tag.
What about representing strings as !DIDerivedType(tag: DW_TAG_string_type, baseType: !DIBasicType(name: "character", size: 8, align: 8, encoding: DW_ATE_signed) instead?
The patch would be just as simple (allowing the string type tag in DIDerivedType instead), but the representation would be closer to what will be emitted in the end.

CHARACTER(n) is an intrinsic type in Fortran, not a derived type. The CHARACTER type has a length type parameter.

Am I understanding correctly that character is the name of the DW_TAG_string_type and not it's single-character base type? To make sure we are on the same page, could you provide a pseudo-dwarfdump-output example of what you would like the correct/ideal/best representation of a Fortran string to look like?

CHARACTER(3) foo = 'abc'


DW_TAG_string_type:

DW_AT_name = "character(3)"
DW_AT_byte_size = 3

It is understood that LLVM will add a superfluous DW_AT_encoding attribute as well. This has not been a real issue when testing existing debuggers with our Fortran compilers. It is generally expected that debuggers will ignore attributes that they don't understand in context. This patch enables the display of CHARACTER type variables under gdb when using flang, for example.

(gdb) p foo
$1 = 'abc'
(gdb)

I made quick experiment using this representation:

!21 = !DIDerivedType(tag: DW_TAG_string_type, name: "character(3)", size: 24, baseType: !DIBasicType(size: 8, align: 8, encoding: DW_ATE_signed))

which comes out as

0x00000056:   DW_TAG_string_type
                DW_AT_type	(cu + 0x0060 => {0x00000060})
                DW_AT_name	( .debug_str[0x00000026] = "character(3)")
                DW_AT_byte_size	(0x03)

0x00000060:   DW_TAG_base_type
                DW_AT_encoding	(DW_ATE_signed)
                DW_AT_byte_size	(0x00)

This still has the superfluous base type, but it appears to conform to the DWARF specification. Would something like this work for you?

If not we might be able to model it as a DICompositeType instead. Are you interested in supporting DW_AT_string_length at a future point in time? If so it would make more sense to introduce a DIStringType now so can model it correctly.

I made quick experiment using this representation:

!21 = !DIDerivedType(tag: DW_TAG_string_type, name: "character(3)", size: 24, baseType: !DIBasicType(size: 8, align: 8, encoding: DW_ATE_signed))

which comes out as

0x00000056:   DW_TAG_string_type
                DW_AT_type	(cu + 0x0060 => {0x00000060})
                DW_AT_name	( .debug_str[0x00000026] = "character(3)")
                DW_AT_byte_size	(0x03)

0x00000060:   DW_TAG_base_type
                DW_AT_encoding	(DW_ATE_signed)
                DW_AT_byte_size	(0x00)

This still has the superfluous base type, but it appears to conform to the DWARF specification. Would something like this work for you?

No, that's even more verbose and not what's desired.

I just looked at how C arrays are represented.

for char s[3]; we currently generate:

!6 = !DICompositeType(tag: DW_TAG_array_type, baseType: !7, size: 24, elements: !8)
!7 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
!8 = !{!9}
!9 = !DISubrange(count: 3)

We could similarly use DICompositeType as a vehicle for the Fortran strings, with the baseType field being optional, so we can represent a nonstandard encoding if necessary, but don't have to when the default encoding is used.

With DICompositeType our character(3) would look like this:
!DICompositeType(tag: DW_TAG_string_type, name: "character(3)", size: 24, align: 8)

Having separate names for each length of string seems to be a bit wasteful, but probably isn't too bad.
How you are planning to represent variable-length strings?

Some notes:
Section 5.10 of the DWARF standard specifies the correct, vendor-neutral handling of Fortran's CHARACTER string type.
Section 4.4.3 of the Fortran standard explains the semantics of the CHARACTER type.
For what it's worth, the DW_AT_encoding, while not listed in Appendix A of DWARF (Ver. 4), might actually be quite desirable for Fortran compilers and debuggers for encoding the KIND of the CHARACTER type (in particular when supporting unicode).

What I was getting at was that if you are planning to support variable-length strings with a DW_AT_string_length attribute anytime in the near future, then it would be best to think about this now so we don't have to migrate the string representation later when it becomes necessary. Otherwise I'm fine with either:

  1. representing Fortran strings as DIDerivedType (with the extra base type displaying the encoding)
  2. representing them as DICompositeType (similar to arrays)

Note that the second option may be interesting because of the fact that IIRC Fortran supports also variable-length arrays with a length (and for assumed-rank arrays even a dimensionality) field that has to be represented in LLVM debug info metadata similarly to the length of the string.

Form your last reply I was not quite sure whether you prefer either of these options or if you think that they are both unsuitable, so please let me know what you think about this.

The argument that the DWARF presentation to existing Fortran debug vendors must be changed doesn't seem too compelling.

The argument that the DWARF presentation to existing Fortran debug vendors must be changed doesn't seem too compelling.

I'm having a little difficulty to understand what you are saying, so let me attempt to rephrase this in my own words.
Am I understanding correctly that you want the DWARF representation of fixed-size Fortran strings to be:

DW_TAG_string_type:
  DW_AT_name = "character(3)"
  DW_AT_byte_size = 3

?

As I said in my previous reply, I see two reasonable options for representing Fortran strings in *LLVM IR*, we could either using DIDerivedType and extending DICompositeType. In both cases, we can find a solution for dropping the base-DW_AT_type from DW_TAG_string when it isn't necessary. The decision for which of the two IR representations to use mostly depends on what plans you have for variable-length strings and arrays in your frontend.

To further clarify: regardless of whether we represent Fortran strings as DICompositeTypes or as DIDerivedType, this LLVM-internal representation would not affect the DWARF representation generated from it.

schweitz abandoned this revision.Aug 14 2017, 3:20 PM