This is an archive of the discontinued LLVM Phabricator instance.

Adding debug info to support Fortran (part 2)
Abandoned · Public

Authored by schweitz on Nov 5 2018, 9:49 AM.

Details

Summary
2. Fortran Type Support

2.1 CHARACTER Intrinsic Type
There is no analog in C for the Fortran CHARACTER type. The Fortran CHARACTER type maps to the DWARF tag DW_TAG_string_type. We have added a new debug info metadata node, DIStringType, to LLVM to generate this DWARF information.
!21 = !DIStringType(name: "character(5)", size: 40)

This produces the following DWARF information.
DW_TAG_string_type:
  DW_AT_name: "character(5)"
  DW_AT_byte_size: 5
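
The size field in the metadata is expressed in bits, while DW_AT_byte_size is expressed in bytes, which is why size: 40 appears as a byte size of 5. A minimal sketch of that conversion, written in Python purely for illustration (the helper name is hypothetical and not part of LLVM):

```python
def byte_size_from_bits(size_in_bits: int) -> int:
    """Convert a metadata 'size:' value (bits) to a DW_AT_byte_size value (bytes)."""
    if size_in_bits % 8 != 0:
        raise ValueError("size is not a whole number of bytes")
    return size_in_bits // 8


# character(5): five one-byte characters occupy 40 bits.
print(byte_size_from_bits(40))  # 5
```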

CHARACTER types can also have deferred length. This is supported in the new metadata as follows.
!22 = !DIStringType(name: "character(*)!1", size: 32, stringLength: !23, stringLengthExpression: !DIExpression())
!23 = !DILocalVariable(scope: !3, arg: 4, file: !4, type: !5, flags: DIFlagArtificial)

This will generate the following DWARF information.
DW_TAG_string_type:
  DW_AT_name: character(*)!1
  DW_AT_string_length: 0x9b (location list)
  DW_AT_byte_size: 4

2.2 Fortran Array Types and Bounds
In this section we refer to the DWARF tag, DW_TAG_array_type, which is used to describe Fortran arrays.
In Fortran, however, arrays are not types but runtime data objects: a multidimensional, rectangular set of scalar data of homogeneous type. An array object has dimensions (rank and corank) and extents in those dimensions. The rank and the ranges of the extents of an array may not be known until runtime. Arrays may be reshaped, acted upon in whole or in part, or otherwise referenced non-contiguously (perhaps even in reverse order). Furthermore, arrays may be allocated and deallocated at runtime and aliased through POINTER objects. In short, Fortran array objects do not map readily onto the array model of the C family of languages, and more expressive DWARF information is required.
2.2.1 Explicit array dimensions
An array may be given a constant size as in the following example. The example shows a two-dimensional array, named array, that has indices from 1 to 10 for the rows and 2 to 11 for the columns.
TYPE(t) :: array(10,2:11)

For this declaration, the compiler generates the following LLVM metadata.
!100 = !DIFortranArrayType(baseType: !7, elements: !101)
!101 = !{ !102, !103 }
!102 = !DIFortranSubrange(constLowerBound: 1, constUpperBound: 10)
!103 = !DIFortranSubrange(constLowerBound: 2, constUpperBound: 11)

The DWARF generated for this is as follows. (Per the DWARF standard, when DW_AT_ordering is absent the default ordering of the source language applies, which for Fortran is column-major.)
DW_TAG_array_type:
  DW_AT_name: array
  DW_AT_type: 4d08 ;TYPE(t)
  DW_TAG_subrange_type:
    DW_AT_type: int
    DW_AT_lower_bound: 1
    DW_AT_upper_bound: 10
  DW_TAG_subrange_type:
    DW_AT_type: int
    DW_AT_lower_bound: 2
    DW_AT_upper_bound: 11
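
To make the column-major interpretation concrete, here is a small sketch of how a consumer might turn Fortran indices into a zero-based element offset using only the lower and upper bounds recorded above. It is written in Python for illustration only; the function name and the list-of-pairs representation are assumptions, not part of the patch.

```python
def column_major_offset(indices, bounds):
    """Return the zero-based element offset of `indices` in column-major order.

    `bounds` holds one (lower, upper) pair per dimension, leftmost first.
    """
    offset = 0
    multiplier = 1
    for index, (lower, upper) in zip(indices, bounds):
        if not lower <= index <= upper:
            raise IndexError(f"index {index} outside {lower}:{upper}")
        offset += (index - lower) * multiplier
        multiplier *= upper - lower + 1  # extent of this dimension
    return offset


bounds = [(1, 10), (2, 11)]  # TYPE(t) :: array(10,2:11)
print(column_major_offset((1, 2), bounds))    # 0, the first element
print(column_major_offset((2, 2), bounds))    # 1, the leftmost index varies fastest
print(column_major_offset((1, 3), bounds))    # 10, one full column later
print(column_major_offset((10, 11), bounds))  # 99, the last element
```

Multiplying the offset by the element size of TYPE(t) would give the byte offset from the start of the array.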

2.2.2 Adjustable arrays
An adjustable array is one whose size is passed explicitly as another argument.
SUBROUTINE subr2(array2,N)
  INTEGER :: N
  TYPE(t) :: array2(N)

In this case, the compiler expresses the upper bound of the !DIFortranSubrange as an expression that references the dummy argument, N.
call void @llvm.dbg.declare(metadata i64* %N, metadata !113, metadata !DIExpression())

!110 = !DIFortranArrayType(baseType: !7, elements: !111)
!111 = !{ !112 }
!112 = !DIFortranSubrange(lowerBound: 1, upperBound: !113, upperBoundExpression: !DIExpression(DW_OP_deref))
!113 = !DILocalVariable(scope: !2, name: "zb1", file: !3, type: !4, flags: DIFlagArtificial)

It turned out that gdb didn’t properly interpret location lists or variable references in the DW_AT_lower_bound and DW_AT_upper_bound attribute forms, so the compiler must generate either a constant or a block with the DW_OP operations for each of them.
DW_TAG_array_type:
  DW_AT_name: array2
  DW_AT_type: 4d08 ;TYPE(t)
  DW_TAG_subrange_type:
    DW_AT_type: int
    DW_AT_lower_bound: 1
    DW_AT_upper_bound: 2 byte block: 91 70
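
The two-byte block 91 70 can be read by hand: 0x91 is the DWARF opcode DW_OP_fbreg, and its operand is a signed LEB128 value. The following Python sketch decodes such a block; the function names are illustrative and do not correspond to any LLVM or gdb API.

```python
DW_OP_FBREG = 0x91  # opcode value assigned by the DWARF standard


def decode_sleb128(data: bytes, pos: int = 0):
    """Decode one signed LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            # The final byte carries the sign in bit 6; sign-extend if it is set.
            if byte & 0x40:
                result -= 1 << shift
            return result, pos


def decode_fbreg_block(block: bytes) -> int:
    """Return the frame-base offset encoded by a block holding only DW_OP_fbreg."""
    if not block or block[0] != DW_OP_FBREG:
        raise ValueError("block does not start with DW_OP_fbreg")
    offset, end = decode_sleb128(block, 1)
    if end != len(block):
        raise ValueError("trailing bytes after the DW_OP_fbreg operand")
    return offset


print(decode_fbreg_block(bytes([0x91, 0x70])))  # -16
```

So the upper bound above refers to a slot 16 bytes below the frame base.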

2.2.3 Assumed size arrays
An assumed size array leaves the last dimension of the array unspecified.
SUBROUTINE subr3(array3)
  TYPE(t) :: array3(*)

The compiler generates DWARF information without an upper bound, such as in this snippet.
DW_TAG_array_type:
  DW_AT_name: array3
  DW_TAG_subrange_type:
    DW_AT_type: int
    DW_AT_lower_bound: 1

This DWARF is produced by omitting the upper bound from the metadata.
!122 = !DIFortranSubrange(lowerBound: 1)

2.2.4 Assumed shape arrays
Fortran also has assumed shape arrays, which allow extra state to be passed into the procedure to describe the shape of the array dummy argument. This extra information is the array descriptor, generated by the compiler, and passed as a hidden argument.
SUBROUTINE subr4(array4)
  TYPE(t) :: array4(:,:)

In this case, the compiler generates DWARF expressions for the lower bound (DW_AT_lower_bound) and upper bound (DW_AT_upper_bound) that access the values the procedure computes from the array descriptor argument.

call void @llvm.dbg.declare(metadata i64* %4, metadata !134, metadata !DIExpression())
call void @llvm.dbg.declare(metadata i64* %8, metadata !136, metadata !DIExpression())
call void @llvm.dbg.declare(metadata i64* %9, metadata !137, metadata !DIExpression())
call void @llvm.dbg.declare(metadata i64* %13, metadata !139, metadata !DIExpression())

!130 = !DIFortranArrayType(baseType: !80, elements: !131)
!131 = !{ !132, !133 }
!132 = !DISubrange(lowerBound: !134, lowerBoundExpression: !DIExpression(DW_OP_deref), upperBound: !136, upperBoundExpression: !DIExpression(DW_OP_deref))
!133 = !DISubrange(lowerBound: !137, lowerBoundExpression: !DIExpression(DW_OP_deref), upperBound: !139, upperBoundExpression: !DIExpression(DW_OP_deref))
!134 = !DILocalVariable(scope: !2, file: !3, type: !9, flags: DIFlagArtificial)
!136 = !DILocalVariable(scope: !2, file: !3, type: !9, flags: DIFlagArtificial)
!137 = !DILocalVariable(scope: !2, file: !3, type: !9, flags: DIFlagArtificial)
!139 = !DILocalVariable(scope: !2, file: !3, type: !9, flags: DIFlagArtificial)

The DWARF generated for this is as follows.
DW_TAG_array_type:
  DW_AT_name: array4
  DW_AT_type: 4d08 ;TYPE(t)
  DW_TAG_subrange_type:
    DW_AT_type: int
    DW_AT_lower_bound: 2 byte block: 91 78
    DW_AT_upper_bound: 2 byte block: 91 70
  DW_TAG_subrange_type:
    DW_AT_type: int
    DW_AT_lower_bound: 2 byte block: 91 68
    DW_AT_upper_bound: 2 byte block: 91 60

2.2.5 Assumed rank arrays and coarrays
This changeset does not address DWARF 5 extensions to support assumed rank arrays or coarrays.

Diff Detail

Repository
rL LLVM

Event Timeline

schweitz created this revision. Nov 5 2018, 9:49 AM

I'm not a Fortran expert. The last Fortran I used was 77 and didn't have all these newfangled features. :) Some questions come up on our end so I'll relay them.

Does this support handle non-contiguous arrays? Arrays can have "holes" in them. I believe the dope vectors ("array descriptors") hold that information but the description makes it sound like only upper and lower bounds are supported.

Fortran CHARACTER can have multi-byte characters so "byte size" is not sufficient. I don't know if any implementations support this, though.

As I commented in the dev email list, these "Fortran" features are actually useful for other languages. We use similar DWARF for Pascal "schema types", BASIC string types, etc. on OpenVMS Itanium and we'll want to leverage all of these for our x86 target. I suggested that the names be made more neutral or folded into the matching metadata tag. I'll give inline comments next.

JohnReagan added inline comments. Nov 5 2018, 1:12 PM
include/llvm/IR/DIBuilder.h
491

So what's the difference here between createArrayType and createFortranArrayType? They both take subranges for subscripts, yes? The size would be a run-time value if any of the subranges have non-constant values but your createFortranArrayType handles that case and can be folded in.

582

Why not

getOrCreateSubrangeWithCount(Metadata *Lo, Metadata *CountNode);
getOrCreateSubrangeWithUpper(Metadata *Lo, Metadata *Up);

or something like this? I can imagine where the Upper value is easier to describe than the Count. On VMS, our array descriptors have bounds information inside of them so the Lo/Up values are just "descriptor-base + offset". Trying to describe the count would involve generating code into a temp variable or using a more complex DWARF expression to fetch the Lo and Up and doing the arithmetic that way.
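
For reference, the two forms are related by count = upper - lower + 1, so a producer whose descriptor stores the bounds directly can emit an upper bound without any arithmetic, whereas a count requires the subtraction. A minimal sketch of that relationship in Python (the function names are illustrative and are not LLVM API):

```python
def count_from_bounds(lower: int, upper: int) -> int:
    """Extent (element count) of a dimension given inclusive bounds."""
    # In Fortran a dimension whose upper bound is below its lower bound has zero extent.
    return max(upper - lower + 1, 0)


def upper_from_count(lower: int, count: int) -> int:
    """Inclusive upper bound of a dimension given its lower bound and extent."""
    return lower + count - 1


print(count_from_bounds(2, 11))  # 10
print(upper_from_count(2, 10))   # 11
```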

And your comments don't describe the arguments so perhaps I'm missing something.

aprantl requested changes to this revision. Nov 5 2018, 1:59 PM

Thanks. I have a couple of high-level points before we can go into more specific details:

The LLVM IR debug info metadata is effectively DWARF with a different syntax. Why is a Fortran-specific array / bounds node necessary or useful, when DWARF doesn't have one?

DISubrange is the wrong place to hold a DIExpression; this won't work with optimized code. Instead, there should be a DIFlagArtificial variable for the lower and upper bound that is described with a llvm.dbg.value / llvm.dbg.declare and the dbg.value/declare should reference the DIExpression. This way the expression can be updated when the code is transformed by the optimizer.

This revision now requires changes to proceed. Nov 5 2018, 1:59 PM

> I'm not a Fortran expert. The last Fortran I used was 77 and didn't have all these newfangled features. :) Some questions come up on our end so I'll relay them.

> Does this support handle non-contiguous arrays? Arrays can have "holes" in them. I believe the dope vectors ("array descriptors") hold that information but the description makes it sound like only upper and lower bounds are supported.

This patch does not handle strided accesses.

> Fortran CHARACTER can have multi-byte characters so "byte size" is not sufficient. I don't know if any implementations support this, though.

We don't know of any implementations either.

> As I commented in the dev email list, these "Fortran" features are actually useful for other languages. We use similar DWARF for Pascal "schema types", BASIC string types, etc. on OpenVMS Itanium and we'll want to leverage all of these for our x86 target. I suggested that the names be made more neutral or folded into the matching metadata tag. I'll give inline comments next.

It may be possible to merge the C-language metadata with the Fortran metadata. It just wasn't our objective. It has been more helpful to keep them distinct.

schweitz added inline comments. Nov 6 2018, 9:50 AM
include/llvm/IR/DIBuilder.h
491

We've been propagating changes to LLVM for Fortran compilers for many LLVM releases now off-stream. There are a number of reasons why it was easier to make a "higher level" break like this. If these changes can be successfully upstreamed, then many of those arguments become historical.

582

I think the information in the summary section shows some of the details concerning the properties our Fortran compilers need. Specifically, both a lower bound and an upper bound, either of which can be constant or computed from a dope vector or omitted iff it is an upper bound in the right-most rank.

Our compilers don't use count (extent) to imply an upper bound.

[Sorry for the delay, my Dad broke a hip and I had to travel to Tennessee.]

Our dope vectors (descriptors) have a lower/upper/stride per dimension to handle non-contiguous arrays. We haven't used multipliers since the VAX days. I think we need to set the byte_stride via some artificial variable (or just a field from the dope vector). Did I miss that?
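
As a rough illustration of why a per-dimension stride matters, the sketch below computes an element address from a lower bound, upper bound, and byte stride for each dimension. It is written in Python for illustration only; the Dimension layout is hypothetical and does not reflect the actual OpenVMS or Fortran compiler descriptor formats.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    lower: int        # lower bound
    upper: int        # upper bound (inclusive)
    byte_stride: int  # bytes between consecutive elements in this dimension


def element_address(base: int, dims, indices) -> int:
    """Address of the element at `indices`, honoring each dimension's stride."""
    address = base
    for dim, index in zip(dims, indices):
        if not dim.lower <= index <= dim.upper:
            raise IndexError(f"index {index} outside {dim.lower}:{dim.upper}")
        address += (index - dim.lower) * dim.byte_stride
    return address


# Every other element of a vector of 8-byte values, viewed as a 5-element array:
# the elements are 16 bytes apart, so bounds alone cannot locate them.
section = [Dimension(lower=1, upper=5, byte_stride=16)]
print(hex(element_address(0x1000, section, (3,))))  # 0x1020
```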

Thank you for your patience regarding the protracted schedule here. We're presently reworking some of the support for Fortran debug with respect to type information.

Herald added a project: Restricted Project. Mar 28 2019, 7:34 AM
Herald added a subscriber: jdoerfert.

Sorry for being late to the party... I'm probably repeating some other comments but wanted to get these thoughts down.
Existing DI should handle an array with compile-time-constant bounds, so would not need anything new.
Variable-bound arrays are available in a variety of languages, not excepting C (VLAs). Again I'd have thought that existing DI would be able to handle this case, although I'm less sure of that.
DWARF did add a bunch of stuff for the newer Fortran array features; I'll have to re-acquaint myself with those before commenting.

schweitz abandoned this revision. Apr 30 2020, 8:11 AM

This code has become very stale and we will have another go at this with flang development.