An array index type is currently encoded as DW_ATE_unsigned regardless of the source language. That doesn't work for Fortran as it allows a negative array index. This change corrects that.
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
60,030 ms | x64 debian > MLIR.Examples/standalone::test.toy |
Event Timeline
I'm not sure this would work in LTO when a Fortran function is inlined into a C function. I think it would be more general to add an optional base type (or encoding) to DISubrange.
I /think/ this would still work - types' scope chains would lead back to the fortran CU as their root & so that'd be correct? Worth an experiment/validation though (but may not need a committed test case covering this).
For my experiment, I use the two files in the attachment: test-c.cpp is the main program. It calls the Fortran function DegCtoF in test.f90.
I follow the test strategy in test/DebugInfo/Generic/cross-cu-inlining.ll, and compile the program files this way:
$ clang++ -g -c -emit-llvm test-c.cpp $ fortran-comp -g -c -emit-llvm test.f90
Since in test.bc the Fortran subroutine has the attribute noinline and optnone, I manually modify the .bc to replace noinline with alwaysinline and remove optnone. Then
$ llvm-link test-c.bc test.bc -o com.bc $ opt -inline com.bc -o com-opt.bc $ llc --filetype=obj com-opt.bc $ clang++ -g -o com-opt.exe com-opt.o
I verify with gdb that the Fortran function is inlined and debugging works as expected.
gdb) b 23 warning: Could not recognize version of Intel Compiler in: "Intel(R) Fortran 22.0-1478" Breakpoint 1 at 0x401179: file test-c.cpp, line 23. (gdb) r Starting program: /localdisk2/cchen15/examples/tests/signed-encoding/lto/new/com-opt.exe C/C++ and Fortran together! Breakpoint 1, main (argc=1, argv=0x7fffffff8c28) at test-c.cpp:23 23 DegCtoF(DegreesC, DegreesF, &N); (gdb) x/20i $pc => 0x401179 <main(int, char**)+73>: lea -0x30(%rbp),%rax 0x40117d <main(int, char**)+77>: lea -0x60(%rbp),%rcx 0x401181 <main(int, char**)+81>: mov %rax,-0x48(%rbp) 0x401185 <main(int, char**)+85>: mov -0x48(%rbp),%rax 0x401189 <main(int, char**)+89>: mov %rcx,-0x40(%rbp) 0x40118d <main(int, char**)+93>: mov -0x40(%rbp),%rcx 0x401191 <main(int, char**)+97>: lea -0x14(%rbp),%rdx 0x401195 <main(int, char**)+101>: mov %rdx,-0x38(%rbp) 0x401199 <main(int, char**)+105>: mov -0x38(%rbp),%rdx 0x40119d <main(int, char**)+109>: mov (%rdx),%esi 0x40119f <main(int, char**)+111>: mov %esi,-0x10(%rbp) 0x4011a2 <main(int, char**)+114>: movslq -0x10(%rbp),%rsi 0x4011a6 <main(int, char**)+118>: mov %rsi,-0x78(%rbp) 0x4011aa <main(int, char**)+122>: movslq -0x10(%rbp),%rsi 0x4011ae <main(int, char**)+126>: mov %rsi,-0x70(%rbp) 0x4011b2 <main(int, char**)+130>: mov (%rdx),%edx 0x4011b4 <main(int, char**)+132>: mov %edx,-0xc(%rbp) 0x4011b7 <main(int, char**)+135>: movl $0x1,-0x8(%rbp) 0x4011be <main(int, char**)+142>: cmpl $0x1,-0xc(%rbp) 0x4011c2 <main(int, char**)+146>: jl 0x401207 <main(int, char**)+215> (gdb) s DegCtoF (degc=..., degf=..., n=2) at test.f90:14 14 subroutine DegCtoF(degC, degF, n)& (gdb) n 24 do i = 1, n (gdb) p/x $pc $1 = 0x4011b2
Last but not least, I do a dwarfdump on com-opt.exe and the signed encoding for the Fortran array index type is preserved:
0x00000166: DW_TAG_formal_parameter DW_AT_name ("degc") DW_AT_decl_file ("/iusers/cchen15/examples/tests/signed-encoding/lto/new/test.f90") DW_AT_decl_line (14) DW_AT_type (0x000001ac "REAL*8[]") ... 0x000001ac: DW_TAG_array_type DW_AT_type (0x000001b7 "REAL*8") 0x000001b1: DW_TAG_subrange_type DW_AT_type (0x000001be "__ARRAY_SIZE_TYPE__") 0x000001b6: NULL 0x000001b7: DW_TAG_base_type DW_AT_name ("REAL*8") DW_AT_encoding (DW_ATE_float) DW_AT_byte_size (0x08) 0x000001be: DW_TAG_base_type DW_AT_name ("__ARRAY_SIZE_TYPE__") DW_AT_byte_size (0x08) DW_AT_encoding (DW_ATE_signed)
The complete dwarfdump is in com-opt.exe.ddump.
Thanks for showing me a counterexample! I still don't think it's great if AsmPrinter needs to know language-specific details so I would still prefer to have this decision made in the fronted and passing to the backend in DISubrange. @dblaikie what do you think?
IMHO, it's a bit excessive to go through DISubrange for this simple language-specific attribute; plus, doing that would require a DIBuilder API change. DwarfUnit::getDefaultLowerBound probably went through a similar discussion and it's currently using getLanguage. Perhaps these can all be overhauled based on your suggestion when support is needed for some language that allows non-integer array index type, and we go forward with this simple change for now?
Interesting. I wasn't aware of the prior art in DwarfUnit::getLowerBound(). What do you think about adding a getArrayIndexEncoding(language) function to llvm/BinaryFormat/Dwarf.h? This would group the per-language defaults in the DWARF library instead of the backend. Otherwise we could land this patch as is, too.
@aprantl: Thanks for the feedback. I have updated the patch as you suggested. Please review.
Hey, sorry for the delay. I totally get where you're coming from - might be worth double checking the discussion from the previous instance of this/prior art to see if this is consistent or not, whether there's a breaking point where we go back and overhaul a bunch of this stuff - but yeah, it's probably OK-ish?
@dblaikie: A quick git blame shows that the original code for getDefaultLowerBound was added to DwarfCompileUnit.cpp in the following commit in 2012:
commit 28fe9e7a3675795aff7fb0f22b95ffe0c682b5de
Author: Bill Wendling <isanbard@gmail.com>
Date: Thu Dec 6 07:38:10 2012 +0000
Handle non-default array bounds. Some languages, e.g. Ada and Pascal, allow you to specify that the array bounds are different from the default (1 in these cases). If we have a lower bound that's non-default, then we emit the lower bound. We also calculate the correct upper bound in those cases. llvm-svn: 169484
In Nov/Dec 2013, this piece of code was migrated to DwarfUnit.cpp.
Is there a way to look at the discussion in 'llvm-svn: 169484'?
Oh, this goes back in time a bit of a ways indeed (thanks for the archaeology) - so there was probably no pre-commit review, and the post-commit review doesn't seem to have discussed this aspect of the design, just some other naming details: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20121203/thread.html#158001
Closed by commit rGc226a5c4d7ea: [DebugInfo] Use DW_ATE_signed encoding when creating a Fortran.
FWIW: My memory is that he first committed it to use zero always, then I pointed out the default lower bound was language-dependent. This is specified in DWARF itself.
Whether the bound should be signed or unsigned.... I expect there are other languages that allow negative bounds, I feel pretty sure I've used PL/I code that did that. If I'd noticed this review earlier I might have voted for making the frontend do something, but I'm okay with it being a language-code based decision as done here.
llvm/include/llvm/BinaryFormat/Dwarf.h | ||
---|---|---|
324 ↗ | (On Diff #420838) | I don't like picking the type encoding based on the language. Pascal also allows negative array bounds. The encoding should be derived from the type specified or from some additional attribute set by the frontend. I'll want this for my Pascal (and Fortran) compilers. |
llvm/include/llvm/BinaryFormat/Dwarf.h | ||
---|---|---|
324 ↗ | (On Diff #420838) | Correctly describing a Pascal's array index type provides a strong argument for adding an 'index type' field to DISubrange. The target language of this PR is Fortran, so my apologies for the narrow scope of this change to get what Fortran needs while keeping the index type encoding for other languages the same as before the change. |
Correctly describing a Pascal's array index type provides a strong argument for adding an 'index type' field to DISubrange.
Since this was my original suggestion anyway I support this and would welcome any patches to implement this!
clang-format: please reformat the code