This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
- llvm/test/DebugInfo/X86/fortran-array-index-type.ll

Differential D122584

[DebugInfo] Use DW_ATE_signed encoding when creating a Fortran array index type.
Closed · Public

Authored by cchen15 on Mar 28 2022, 6:58 AM.


Details

Reviewers

aprantl
jdoerfert

Summary

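The array index type (__ARRAY_SIZE_TYPE__) is currently encoded as DW_ATE_unsigned regardless of the source language. That is wrong for Fortran, which allows negative array indices (for example, an array declared with bounds -2:2). This change emits DW_ATE_signed for the index type when the compile unit's language is Fortran.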
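Why the encoding matters can be seen from the raw bits. The following is a minimal C++ sketch, assuming an 8-byte index type like __ARRAY_SIZE_TYPE__; asUnsigned is a hypothetical helper, not an LLVM API. It models a consumer that trusts a DW_ATE_unsigned index type: a lower bound of -2 then reads as 18446744073709551614 (2^64 - 2).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bytes of a signed 64-bit bound as
// unsigned, the way a consumer would if the index type says DW_ATE_unsigned.
std::uint64_t asUnsigned(std::int64_t Bound) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &Bound, sizeof(Bits));
  return Bits;
}

// Usage: non-negative bounds survive the reinterpretation, negative ones do not.
inline void usageExamples() {
  assert(asUnsigned(2) == 2u);
  assert(asUnsigned(-2) == 18446744073709551614ull); // 2^64 - 2
}
```

Non-negative bounds are unaffected, which is why an unsigned index type is harmless for languages whose arrays always start at 0 or 1.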

Diff Detail

Unit Tests: Failed

	Time	Test
	60,030 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

cchen15 created this revision. Mar 28 2022, 6:58 AM

Herald added a project: Restricted Project. Mar 28 2022, 6:58 AM

Herald added subscribers: ormris, arphaman, hiraditya.

cchen15 requested review of this revision. Mar 28 2022, 6:58 AM

Herald added a reviewer: jdoerfert. Mar 28 2022, 6:58 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, sstefan1.

I'm not sure this would work in LTO when a Fortran function is inlined into a C function. I think it would be more general to add an optional base type (or encoding) to DISubrange.
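The alternative suggested here can be sketched with hypothetical, simplified types (not LLVM's real DISubrange or DIBasicType API): the frontend optionally attaches an index base type to the subrange, and the backend falls back to a language-keyed rule only when none is present. Under this model the encoding travels with the type itself instead of being inferred from whichever unit emits it.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins; these are not LLVM's classes.
enum class Encoding { Signed, Unsigned };
enum class Lang { CPlusPlus, Fortran95, Pascal83 };

struct BasicType {
  std::string Name;
  Encoding Enc;
};

// A subrange that may carry an index base type chosen by the frontend.
struct Subrange {
  long long Lower;
  long long Upper;
  std::optional<BasicType> IndexType; // empty: the frontend did not say
};

// Language-keyed fallback, i.e. the kind of rule this patch puts in the backend.
Encoding languageDefault(Lang L) {
  return L == Lang::Fortran95 ? Encoding::Signed : Encoding::Unsigned;
}

// Prefer the frontend's explicit index type; otherwise use the language rule.
Encoding indexEncoding(const Subrange &SR, Lang CULang) {
  return SR.IndexType ? SR.IndexType->Enc : languageDefault(CULang);
}

// Usage: an explicit signed index type wins even where the language rule
// would pick unsigned.
inline void usageExamples() {
  Subrange Explicit{-12, -5, BasicType{"integer", Encoding::Signed}};
  assert(indexEncoding(Explicit, Lang::Pascal83) == Encoding::Signed);

  Subrange Implicit{0, 9, std::nullopt};
  assert(indexEncoding(Implicit, Lang::CPlusPlus) == Encoding::Unsigned);
  assert(indexEncoding(Implicit, Lang::Fortran95) == Encoding::Signed);
}
```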

This revision now requires changes to proceed. Mar 28 2022, 9:48 AM

aprantl added a subscriber: dblaikie. Mar 28 2022, 9:48 AM

In D122584#3411769, @aprantl wrote:

I'm not sure this would work in LTO when a Fortran function is inlined into a C function. I think it would be more general to add an optional base type (or encoding) to DISubrange.

I /think/ this would still work - types' scope chains would lead back to the Fortran CU as their root & so that'd be correct? Worth an experiment/validation though (but may not need a committed test case covering this).
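The scope-chain argument can be modeled with a small, hypothetical scope tree (a simplified sketch, not LLVM's DIScope API, and it does not model inlining itself): after llvm-link both compile units live in one module, but walking a scope's parent links still ends at the unit it was defined in, and that unit's language would select the encoding.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified scope tree. A node with no parent is a compile
// unit and carries the source language.
struct Scope {
  std::string Name;
  const Scope *Parent = nullptr;
  std::string Language; // only meaningful on a compile unit
};

// Follow parent links up to the root compile unit.
const Scope &rootUnit(const Scope &S) {
  const Scope *Cur = &S;
  while (Cur->Parent)
    Cur = Cur->Parent;
  return *Cur;
}

// The language that would select the index type encoding for S.
const std::string &languageOf(const Scope &S) { return rootUnit(S).Language; }

// Usage: two units linked into one module; each scope resolves to its own unit.
inline void usageExamples() {
  Scope FortranCU{"test.f90", nullptr, "Fortran95"};
  Scope CxxCU{"test-c.cpp", nullptr, "C_plus_plus"};
  Scope DegCtoF{"degctof", &FortranCU, ""};
  Scope Main{"main", &CxxCU, ""};
  assert(languageOf(DegCtoF) == "Fortran95");
  assert(languageOf(Main) == "C_plus_plus");
}
```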

Harbormaster completed remote builds in B156554: Diff 418573. Mar 29 2022, 11:20 AM

Thank you @aprantl and @dblaikie for the feedback. I will do some experiments with LTO and provide an update.

For my experiment, I use the two attached files: test-c.cpp is the main program, and it calls the Fortran subroutine DegCtoF in test.f90.

I follow the test strategy in test/DebugInfo/Generic/cross-cu-inlining.ll and compile the program files this way:

$ clang++ -g -c -emit-llvm test-c.cpp
$ fortran-comp -g -c -emit-llvm test.f90

Since the Fortran subroutine in test.bc has the attributes noinline and optnone, I manually modify test.bc to replace noinline with alwaysinline and remove optnone. Then:

$ llvm-link test-c.bc test.bc -o com.bc
$ opt -inline com.bc -o com-opt.bc
$ llc --filetype=obj com-opt.bc
$ clang++ -g -o com-opt.exe com-opt.o

I verify with gdb that the Fortran function is inlined and debugging works as expected.

(gdb) b 23
warning: Could not recognize version of Intel Compiler in: "Intel(R) Fortran 22.0-1478"
Breakpoint 1 at 0x401179: file test-c.cpp, line 23.
(gdb) r
Starting program: /localdisk2/cchen15/examples/tests/signed-encoding/lto/new/com-opt.exe 
C/C++ and Fortran together!

Breakpoint 1, main (argc=1, argv=0x7fffffff8c28) at test-c.cpp:23
23	    DegCtoF(DegreesC, DegreesF, &N);
(gdb) x/20i $pc
=> 0x401179 <main(int, char**)+73>:	lea    -0x30(%rbp),%rax
   0x40117d <main(int, char**)+77>:	lea    -0x60(%rbp),%rcx
   0x401181 <main(int, char**)+81>:	mov    %rax,-0x48(%rbp)
   0x401185 <main(int, char**)+85>:	mov    -0x48(%rbp),%rax
   0x401189 <main(int, char**)+89>:	mov    %rcx,-0x40(%rbp)
   0x40118d <main(int, char**)+93>:	mov    -0x40(%rbp),%rcx
   0x401191 <main(int, char**)+97>:	lea    -0x14(%rbp),%rdx
   0x401195 <main(int, char**)+101>:	mov    %rdx,-0x38(%rbp)
   0x401199 <main(int, char**)+105>:	mov    -0x38(%rbp),%rdx
   0x40119d <main(int, char**)+109>:	mov    (%rdx),%esi
   0x40119f <main(int, char**)+111>:	mov    %esi,-0x10(%rbp)
   0x4011a2 <main(int, char**)+114>:	movslq -0x10(%rbp),%rsi
   0x4011a6 <main(int, char**)+118>:	mov    %rsi,-0x78(%rbp)
   0x4011aa <main(int, char**)+122>:	movslq -0x10(%rbp),%rsi
   0x4011ae <main(int, char**)+126>:	mov    %rsi,-0x70(%rbp)
   0x4011b2 <main(int, char**)+130>:	mov    (%rdx),%edx
   0x4011b4 <main(int, char**)+132>:	mov    %edx,-0xc(%rbp)
   0x4011b7 <main(int, char**)+135>:	movl   $0x1,-0x8(%rbp)
   0x4011be <main(int, char**)+142>:	cmpl   $0x1,-0xc(%rbp)
   0x4011c2 <main(int, char**)+146>:	jl     0x401207 <main(int, char**)+215>
(gdb) s
DegCtoF (degc=..., degf=..., n=2) at test.f90:14
14	subroutine DegCtoF(degC, degF, n)&
(gdb) n
24	    do i = 1, n
(gdb) p/x $pc
$1 = 0x4011b2

Last but not least, a dwarfdump of com-opt.exe shows that the signed encoding for the Fortran array index type is preserved:

0x00000166:     DW_TAG_formal_parameter
                  DW_AT_name	("degc")
                  DW_AT_decl_file	("/iusers/cchen15/examples/tests/signed-encoding/lto/new/test.f90")
                  DW_AT_decl_line	(14)
                  DW_AT_type	(0x000001ac "REAL*8[]")
...
0x000001ac:   DW_TAG_array_type
                DW_AT_type	(0x000001b7 "REAL*8")

0x000001b1:     DW_TAG_subrange_type
                  DW_AT_type	(0x000001be "__ARRAY_SIZE_TYPE__")

0x000001b6:     NULL

0x000001b7:   DW_TAG_base_type
                DW_AT_name	("REAL*8")
                DW_AT_encoding	(DW_ATE_float)
                DW_AT_byte_size	(0x08)

0x000001be:   DW_TAG_base_type
                DW_AT_name	("__ARRAY_SIZE_TYPE__")
                DW_AT_byte_size	(0x08)
                DW_AT_encoding	(DW_ATE_signed)

The complete dwarfdump is in com-opt.exe.ddump.

test-c.cpp (554 B)

com-opt.exe.ddump (10 KB)

test.f90 (590 B)

Thanks for showing me a counterexample! I still don't think it's great if AsmPrinter needs to know language-specific details, so I would still prefer to have this decision made in the frontend and passed to the backend in DISubrange. @dblaikie what do you think?

IMHO, it's a bit excessive to go through DISubrange for this simple language-specific attribute; plus, doing that would require a DIBuilder API change. DwarfUnit::getDefaultLowerBound probably went through a similar discussion, and it currently uses getLanguage. Perhaps these can all be overhauled along the lines of your suggestion once support is needed for a language that allows a non-integer array index type, and we can go forward with this simple change for now?

Interesting. I wasn't aware of the prior art in DwarfUnit::getDefaultLowerBound(). What do you think about adding a getArrayIndexEncoding(language) function to llvm/BinaryFormat/Dwarf.h? This would group the per-language defaults in the DWARF library instead of the backend. Otherwise we could land this patch as is, too.
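A helper of the kind suggested here could look roughly like the sketch below. The namespace dwarf_sketch and both function names are hypothetical and not necessarily the code that landed; the numeric values are the DWARF 5 language and base-type encoding codes. Matching every Fortran dialect code in one predicate keeps the per-language policy in one place rather than in AsmPrinter.

```cpp
#include <cassert>
#include <cstdint>

namespace dwarf_sketch {

// A subset of DWARF 5 language codes.
enum SourceLanguage : std::uint16_t {
  DW_LANG_C_plus_plus = 0x0004,
  DW_LANG_Fortran77 = 0x0007,
  DW_LANG_Fortran90 = 0x0008,
  DW_LANG_Pascal83 = 0x0009,
  DW_LANG_C99 = 0x000c,
  DW_LANG_Fortran95 = 0x000e,
  DW_LANG_Fortran03 = 0x0022,
  DW_LANG_Fortran08 = 0x0023,
};

// The two DWARF base-type encodings involved.
enum TypeKind : std::uint8_t {
  DW_ATE_signed = 0x05,
  DW_ATE_unsigned = 0x07,
};

// True for every Fortran dialect code, not only Fortran95.
inline bool isFortran(SourceLanguage L) {
  switch (L) {
  case DW_LANG_Fortran77:
  case DW_LANG_Fortran90:
  case DW_LANG_Fortran95:
  case DW_LANG_Fortran03:
  case DW_LANG_Fortran08:
    return true;
  default:
    return false;
  }
}

// Per-language default encoding for the synthesized array index type.
inline TypeKind getArrayIndexEncoding(SourceLanguage L) {
  return isFortran(L) ? DW_ATE_signed : DW_ATE_unsigned;
}

// Usage: every Fortran dialect gets a signed index type; Pascal does not.
inline void usageExamples() {
  assert(getArrayIndexEncoding(DW_LANG_Fortran77) == DW_ATE_signed);
  assert(getArrayIndexEncoding(DW_LANG_Fortran08) == DW_ATE_signed);
  assert(getArrayIndexEncoding(DW_LANG_Pascal83) == DW_ATE_unsigned);
}

} // namespace dwarf_sketch
```

With this rule Pascal still gets DW_ATE_unsigned, which is the limitation JohnReagan raises later in the thread.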

@aprantl: Thanks for the feedback. I have updated the patch as you suggested. Please review.

Harbormaster completed remote builds in B158212: Diff 420838. Apr 6 2022, 8:28 AM

In D122584#3423262, @aprantl wrote:

Thanks for showing me a counterexample! I still don't think it's great if AsmPrinter needs to know language-specific details, so I would still prefer to have this decision made in the frontend and passed to the backend in DISubrange. @dblaikie what do you think?

Hey, sorry for the delay. I totally get where you're coming from - might be worth double checking the discussion from the previous instance of this/prior art to see if this is consistent or not, whether there's a breaking point where we go back and overhaul a bunch of this stuff - but yeah, it's probably OK-ish?

@dblaikie: A quick git blame shows that the original code for getDefaultLowerBound was added to DwarfCompileUnit.cpp in the following commit in 2012:

commit 28fe9e7a3675795aff7fb0f22b95ffe0c682b5de
Author: Bill Wendling <isanbard@gmail.com>
Date: Thu Dec 6 07:38:10 2012 +0000

Handle non-default array bounds.

Some languages, e.g. Ada and Pascal, allow you to specify that the array bounds
are different from the default (1 in these cases). If we have a lower bound
that's non-default, then we emit the lower bound. We also calculate the correct
upper bound in those cases.

llvm-svn: 169484

In Nov/Dec 2013, this piece of code was migrated to DwarfUnit.cpp.

Is there a way to look at the discussion in 'llvm-svn: 169484'?
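The language-keyed logic that commit message describes can be sketched as follows. The Lang enum and both function names are hypothetical (this is not the actual DwarfUnit::getDefaultLowerBound code), only a handful of languages are modeled, and the defaults follow the DWARF language table: 0 for the C family, 1 for Fortran, Pascal and Ada.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical subset of source languages, for illustration only.
enum class Lang { C99, CPlusPlus, Fortran95, Pascal83, Ada95, Unknown };

// Default lower bound per language; std::nullopt stands in for
// "no known default".
std::optional<std::int64_t> defaultLowerBound(Lang L) {
  switch (L) {
  case Lang::C99:
  case Lang::CPlusPlus:
    return 0;
  case Lang::Fortran95:
  case Lang::Pascal83:
  case Lang::Ada95:
    return 1;
  case Lang::Unknown:
    break;
  }
  return std::nullopt;
}

// As the commit message describes: emit DW_AT_lower_bound only when the bound
// differs from the language default (or when there is no default).
bool needsExplicitLowerBound(Lang L, std::int64_t LowerBound) {
  std::optional<std::int64_t> Default = defaultLowerBound(L);
  return !Default || *Default != LowerBound;
}

// Usage: a Fortran array with bounds -2:2 needs an explicit lower bound.
inline void usageExamples() {
  assert(!needsExplicitLowerBound(Lang::CPlusPlus, 0));
  assert(!needsExplicitLowerBound(Lang::Fortran95, 1));
  assert(needsExplicitLowerBound(Lang::Fortran95, -2));
  assert(needsExplicitLowerBound(Lang::Unknown, 0));
}
```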

In D122584#3433987, @cchen15 wrote:

Is there a way to look at the discussion in 'llvm-svn: 169484'?

Oh, this goes back in time a bit of a ways indeed (thanks for the archaeology) - so there was probably no pre-commit review, and the post-commit review doesn't seem to have discussed this aspect of the design, just some other naming details: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20121203/thread.html#158001

Thanks!

This revision is now accepted and ready to land. Apr 6 2022, 3:52 PM

Closed by commit rGc226a5c4d7ea: [DebugInfo] Use DW_ATE_signed encoding when creating a Fortran.

In D122584#3434184, @dblaikie wrote:

Oh, this goes back in time a bit of a ways indeed (thanks for the archaeology) - so there was probably no pre-commit review, and the post-commit review doesn't seem to have discussed this aspect of the design, just some other naming details: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20121203/thread.html#158001

FWIW: My memory is that he first committed it to use zero always, then I pointed out the default lower bound was language-dependent. This is specified in DWARF itself.

Whether the bound should be signed or unsigned... I expect there are other languages that allow negative bounds; I feel pretty sure I've used PL/I code that did that. If I'd noticed this review earlier I might have voted for making the frontend do something, but I'm okay with it being a language-code-based decision as done here.

Pascal allows negative bounds as well, for example:

	ARRAY [-12..-5] OF INTEGER
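In that example both bounds are negative, so a debugger has to read them as signed values to compute the element count (-5 - (-12) + 1 = 8) or to map a source index i to its zero-based storage offset (i - (-12)). A minimal sketch of that arithmetic, using a hypothetical Dim struct that is not part of any LLVM API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one array dimension with arbitrary bounds.
struct Dim {
  std::int64_t Lower;
  std::int64_t Upper;
};

// Number of elements; a range with Upper < Lower is empty.
std::int64_t elementCount(Dim D) {
  return D.Upper < D.Lower ? 0 : D.Upper - D.Lower + 1;
}

// Zero-based storage offset of source-level index I.
std::int64_t offsetOf(Dim D, std::int64_t I) {
  assert(I >= D.Lower && I <= D.Upper && "index out of bounds");
  return I - D.Lower;
}

// Usage: Pascal's ARRAY [-12..-5] OF INTEGER.
inline void usageExamples() {
  Dim P{-12, -5};
  assert(elementCount(P) == 8);
  assert(offsetOf(P, -12) == 0);
  assert(offsetOf(P, -5) == 7);
}
```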

JohnReagan added inline comments. Apr 7 2022, 6:19 AM

llvm/include/llvm/BinaryFormat/Dwarf.h, line 324 (On Diff #420838): I don't like picking the type encoding based on the language. Pascal also allows negative array bounds. The encoding should be derived from the type specified or from some additional attribute set by the frontend. I'll want this for my Pascal (and Fortran) compilers.

cchen15 added inline comments. Apr 7 2022, 7:44 AM

llvm/include/llvm/BinaryFormat/Dwarf.h, line 324 (On Diff #420838): Correctly describing a Pascal array index type provides a strong argument for adding an 'index type' field to DISubrange. The target language of this PR is Fortran, so apologies for the narrow scope: this change provides what Fortran needs while keeping the index type encoding for other languages the same as before.

Correctly describing a Pascal array index type provides a strong argument for adding an 'index type' field to DISubrange.

Since this was my original suggestion anyway, I support this and would welcome any patches to implement it!

Revision Contents

Path	Size
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp	3 lines
llvm/test/DebugInfo/X86/fortran-array-index-type.ll	36 lines

Diff 418573

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp

	(1,431 unchanged lines not shown)
	 DIE *DwarfUnit::getIndexTyDie() {
	   if (IndexTyDie)
	     return IndexTyDie;
	   // Construct an integer type to use for indexes.
	   IndexTyDie = &createAndAddDIE(dwarf::DW_TAG_base_type, getUnitDie());
	   StringRef Name = "__ARRAY_SIZE_TYPE__";
	   addString(*IndexTyDie, dwarf::DW_AT_name, Name);
	   addUInt(*IndexTyDie, dwarf::DW_AT_byte_size, None, sizeof(int64_t));
	   addUInt(*IndexTyDie, dwarf::DW_AT_encoding, dwarf::DW_FORM_data1,
	-          dwarf::DW_ATE_unsigned);
	+          dwarf::isFortran((dwarf::SourceLanguage)getLanguage())
	+              ? dwarf::DW_ATE_signed : dwarf::DW_ATE_unsigned);
	   DD->addAccelType(CUNode, Name, IndexTyDie, /*Flags*/ 0);
	   return IndexTyDie;
	 }
	
	 /// Returns true if the vector's size differs from the sum of sizes of elements
	 /// the user specified. This can occur if the vector has been rounded up to
	 /// fit memory alignment constraints.
	 static bool hasVectorBeenPadded(const DICompositeType *CTy) {
	(404 unchanged lines not shown)

	Lint: Pre-merge checks: clang-format: please reformat the code
	-              ? dwarf::DW_ATE_signed : dwarf::DW_ATE_unsigned);
	+              ? dwarf::DW_ATE_signed
	+              : dwarf::DW_ATE_unsigned);

llvm/test/DebugInfo/X86/fortran-array-index-type.ll

This file was added.

				; This test checks that the array index type has signed encoding when
				; the source language is Fortran.

				; RUN: llc -O0 -filetype=obj -o %t < %s
				; RUN: llvm-dwarfdump %t | FileCheck %s

				; CHECK: DW_TAG_subrange_type
				; CHECK-NEXT: DW_AT_type ([[INDEX_TYPE:0x[0-9a-f]+]] "__ARRAY_SIZE_TYPE__")
				; CHECK-NEXT: DW_AT_lower_bound (-2)
				; CHECK-NEXT: DW_AT_upper_bound (2)

				; CHECK: [[INDEX_TYPE]]: DW_TAG_base_type
				; CHECK-NEXT: DW_AT_name ("__ARRAY_SIZE_TYPE__")
				; CHECK-NEXT: DW_AT_byte_size (0x08)
				; CHECK-NEXT: DW_AT_encoding (DW_ATE_signed)

				source_filename = "test/DebugInfo/X86/fortran-array-index-type.ll"
				target triple = "x86_64-unknown-linux-gnu"

				@"test_$ARRAY_1D" = internal global [5 x i32] zeroinitializer, align 16, !dbg !0

				!llvm.module.flags = !{!13, !14}
				!llvm.dbg.cu = !{!6}
				!omp_offload.info = !{}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "array_1d", linkageName: "test_$ARRAY_1D", scope: !6, file: !3, line: 2, type: !9, isLocal: true, isDefinition: true)
				!3 = !DIFile(filename: "test.f90", directory: "/tests")
				!6 = distinct !DICompileUnit(language: DW_LANG_Fortran95, file: !3, producer: "Fortran Compiler", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, globals: !7, splitDebugInlining: false, nameTableKind: None)
				!7 = !{!0}
				!9 = !DICompositeType(tag: DW_TAG_array_type, baseType: !10, elements: !11)
				!10 = !DIBasicType(name: "INTEGER*4", size: 32, encoding: DW_ATE_signed)
				!11 = !{!12}
				!12 = !DISubrange(lowerBound: -2, upperBound: 2)
				!13 = !{i32 2, !"Debug Info Version", i32 3}
				!14 = !{i32 2, !"Dwarf Version", i32 4}