This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Use DW_ATE_signed encoding when creating a Fortran array index type.
ClosedPublic

Authored by cchen15 on Mar 28 2022, 6:58 AM.

Details

Reviewers
aprantl
jdoerfert
Summary

An array index type is currently encoded as DW_ATE_unsigned regardless of the source language. That doesn't work for Fortran as it allows a negative array index. This change corrects that.

Diff Detail

Unit TestsFailed

Event Timeline

cchen15 created this revision.Mar 28 2022, 6:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 6:58 AM
cchen15 requested review of this revision.Mar 28 2022, 6:58 AM
Herald added a project: Restricted Project. · View Herald Transcript
aprantl requested changes to this revision.Mar 28 2022, 9:48 AM

I'm not sure this would work in LTO when a Fortran function is inlined into a C function. I think it would be more general to add an optional base type (or encoding) to DISubrange.

This revision now requires changes to proceed.Mar 28 2022, 9:48 AM

I'm not sure this would work in LTO when a Fortran function is inlined into a C function. I think it would be more general to add an optional base type (or encoding) to DISubrange.

I /think/ this would still work - types' scope chains would lead back to the fortran CU as their root & so that'd be correct? Worth an experiment/validation though (but may not need a committed test case covering this).

Thank you @aprantl and @dblaikie for the feedback. I will do some experiments with LTO and provide an update.

For my experiment, I use the two files in the attachment: test-c.cpp is the main program. It calls the Fortran function DegCtoF in test.f90.

I follow the test strategy in test/DebugInfo/Generic/cross-cu-inlining.ll, and compile the program files this way:

$ clang++ -g -c -emit-llvm test-c.cpp
$ fortran-comp -g -c -emit-llvm test.f90

Since in test.bc the Fortran subroutine has the attribute noinline and optnone, I manually modify the .bc to replace noinline with alwaysinline and remove optnone. Then

$ llvm-link test-c.bc test.bc -o com.bc
$ opt -inline com.bc -o com-opt.bc
$ llc --filetype=obj com-opt.bc
$ clang++ -g -o com-opt.exe com-opt.o

I verify with gdb that the Fortran function is inlined and debugging works as expected.

gdb) b 23
warning: Could not recognize version of Intel Compiler in: "Intel(R) Fortran 22.0-1478"
Breakpoint 1 at 0x401179: file test-c.cpp, line 23.
(gdb) r
Starting program: /localdisk2/cchen15/examples/tests/signed-encoding/lto/new/com-opt.exe 
C/C++ and Fortran together!

Breakpoint 1, main (argc=1, argv=0x7fffffff8c28) at test-c.cpp:23
23	    DegCtoF(DegreesC, DegreesF, &N);
(gdb) x/20i $pc
=> 0x401179 <main(int, char**)+73>:	lea    -0x30(%rbp),%rax
   0x40117d <main(int, char**)+77>:	lea    -0x60(%rbp),%rcx
   0x401181 <main(int, char**)+81>:	mov    %rax,-0x48(%rbp)
   0x401185 <main(int, char**)+85>:	mov    -0x48(%rbp),%rax
   0x401189 <main(int, char**)+89>:	mov    %rcx,-0x40(%rbp)
   0x40118d <main(int, char**)+93>:	mov    -0x40(%rbp),%rcx
   0x401191 <main(int, char**)+97>:	lea    -0x14(%rbp),%rdx
   0x401195 <main(int, char**)+101>:	mov    %rdx,-0x38(%rbp)
   0x401199 <main(int, char**)+105>:	mov    -0x38(%rbp),%rdx
   0x40119d <main(int, char**)+109>:	mov    (%rdx),%esi
   0x40119f <main(int, char**)+111>:	mov    %esi,-0x10(%rbp)
   0x4011a2 <main(int, char**)+114>:	movslq -0x10(%rbp),%rsi
   0x4011a6 <main(int, char**)+118>:	mov    %rsi,-0x78(%rbp)
   0x4011aa <main(int, char**)+122>:	movslq -0x10(%rbp),%rsi
   0x4011ae <main(int, char**)+126>:	mov    %rsi,-0x70(%rbp)
   0x4011b2 <main(int, char**)+130>:	mov    (%rdx),%edx
   0x4011b4 <main(int, char**)+132>:	mov    %edx,-0xc(%rbp)
   0x4011b7 <main(int, char**)+135>:	movl   $0x1,-0x8(%rbp)
   0x4011be <main(int, char**)+142>:	cmpl   $0x1,-0xc(%rbp)
   0x4011c2 <main(int, char**)+146>:	
    jl     0x401207 <main(int, char**)+215>
(gdb) s
DegCtoF (degc=..., degf=..., n=2) at test.f90:14
14	subroutine DegCtoF(degC, degF, n)&
(gdb) n
24	    do i = 1, n
(gdb) p/x $pc
$1 = 0x4011b2

Last but not least, I do a dwarfdump on com-opt.exe and the signed encoding for the Fortran array index type is preserved:

0x00000166:     DW_TAG_formal_parameter
                  DW_AT_name	("degc")
                  DW_AT_decl_file	("/iusers/cchen15/examples/tests/signed-encoding/lto/new/test.f90")
                  DW_AT_decl_line	(14)
                  DW_AT_type	(0x000001ac "REAL*8[]")
...
0x000001ac:   DW_TAG_array_type
                DW_AT_type	(0x000001b7 "REAL*8")

0x000001b1:     DW_TAG_subrange_type
                  DW_AT_type	(0x000001be "__ARRAY_SIZE_TYPE__")

0x000001b6:     NULL

0x000001b7:   DW_TAG_base_type
                DW_AT_name	("REAL*8")
                DW_AT_encoding	(DW_ATE_float)
                DW_AT_byte_size	(0x08)

0x000001be:   DW_TAG_base_type
                DW_AT_name	("__ARRAY_SIZE_TYPE__")
                DW_AT_byte_size	(0x08)
                DW_AT_encoding	(DW_ATE_signed)

The complete dwarfdump is in com-opt.exe.ddump.

Thanks for showing me a counterexample! I still don't think it's great if AsmPrinter needs to know language-specific details so I would still prefer to have this decision made in the fronted and passing to the backend in DISubrange. @dblaikie what do you think?

IMHO, it's a bit excessive to go through DISubrange for this simple language-specific attribute; plus, doing that would require a DIBuilder API change. DwarfUnit::getDefaultLowerBound probably went through a similar discussion and it's currently using getLanguage. Perhaps these can all be overhauled based on your suggestion when support is needed for some language that allows non-integer array index type, and we go forward with this simple change for now?

Interesting. I wasn't aware of the prior art in DwarfUnit::getLowerBound(). What do you think about adding a getArrayIndexEncoding(language) function to llvm/BinaryFormat/Dwarf.h? This would group the per-language defaults in the DWARF library instead of the backend. Otherwise we could land this patch as is, too.

cchen15 updated this revision to Diff 420838.Apr 6 2022, 7:18 AM

@aprantl: Thanks for the feedback. I have updated the patch as you suggested. Please review.

Thanks for showing me a counterexample! I still don't think it's great if AsmPrinter needs to know language-specific details so I would still prefer to have this decision made in the fronted and passing to the backend in DISubrange. @dblaikie what do you think?

Hey, sorry for the delay. I totally get where you're coming from - might be worth double checking the discussion from the previous instance of this/prior art to see if this is consistent or not, whether there's a breaking point where we go back and overhaul a bunch of this stuff - but yeah, it's probably OK-ish?

@dblaikie: A quick git blame shows that the original code for getDefaultLowerBound was added to DwarfCompileUnit.cpp in the following commit in 2012:

commit 28fe9e7a3675795aff7fb0f22b95ffe0c682b5de
Author: Bill Wendling <isanbard@gmail.com>
Date: Thu Dec 6 07:38:10 2012 +0000

Handle non-default array bounds.

Some languages, e.g. Ada and Pascal, allow you to specify that the array bounds
are different from the default (1 in these cases). If we have a lower bound
that's non-default, then we emit the lower bound. We also calculate the correct
upper bound in those cases.

llvm-svn: 169484

In Nov/Dec 2013, this piece of code was migrated to DwarfUnit.cpp.

Is there a way to look at the discussion in 'llvm-svn: 169484'?

@dblaikie: A quick git blame shows that the original code for getDefaultLowerBound was added to DwarfCompileUnit.cpp in the following commit in 2012:

commit 28fe9e7a3675795aff7fb0f22b95ffe0c682b5de
Author: Bill Wendling <isanbard@gmail.com>
Date: Thu Dec 6 07:38:10 2012 +0000

Handle non-default array bounds.

Some languages, e.g. Ada and Pascal, allow you to specify that the array bounds
are different from the default (1 in these cases). If we have a lower bound
that's non-default, then we emit the lower bound. We also calculate the correct
upper bound in those cases.

llvm-svn: 169484

In Nov/Dec 2013, this piece of code was migrated to DwarfUnit.cpp.

Is there a way to look at the discussion in 'llvm-svn: 169484'?

Oh, this goes back in time a bit of a ways indeed (thanks for the archaeology) - so there was probably no pre-commit review, and the post-commit review doesn't seem to have discussed this aspect of the design, just some other naming details: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20121203/thread.html#158001

aprantl accepted this revision.Apr 6 2022, 3:52 PM

Thanks!

This revision is now accepted and ready to land.Apr 6 2022, 3:52 PM
cchen15 closed this revision.Apr 7 2022, 4:54 AM

Closed by commit rGc226a5c4d7ea: [DebugInfo] Use DW_ATE_signed encoding when creating a Fortran.

Oh, this goes back in time a bit of a ways indeed (thanks for the archaeology) - so there was probably no pre-commit review, and the post-commit review doesn't seem to have discussed this aspect of the design, just some other naming details: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20121203/thread.html#158001

FWIW: My memory is that he first committed it to use zero always, then I pointed out the default lower bound was language-dependent. This is specified in DWARF itself.

Whether the bound should be signed or unsigned.... I expect there are other languages that allow negative bounds, I feel pretty sure I've used PL/I code that did that. If I'd noticed this review earlier I might have voted for making the frontend do something, but I'm okay with it being a language-code based decision as done here.

Pascal allows for negative bounds as well.

ARRAY [-12..-5] OF INTEGER

for example

JohnReagan added inline comments.Apr 7 2022, 6:19 AM
llvm/include/llvm/BinaryFormat/Dwarf.h
324 ↗(On Diff #420838)

I don't like picking the type encoding based on the language. Pascal also allows negative array bounds. The encoding should be derived from the type specified or from some additional attribute set by the frontend. I'll want this for my Pascal (and Fortran) compilers.

cchen15 added inline comments.Apr 7 2022, 7:44 AM
llvm/include/llvm/BinaryFormat/Dwarf.h
324 ↗(On Diff #420838)

Correctly describing a Pascal's array index type provides a strong argument for adding an 'index type' field to DISubrange.

The target language of this PR is Fortran, so my apologies for the narrow scope of this change to get what Fortran needs while keeping the index type encoding for other languages the same as before the change.

Correctly describing a Pascal's array index type provides a strong argument for adding an 'index type' field to DISubrange.

Since this was my original suggestion anyway I support this and would welcome any patches to implement this!