This is an archive of the discontinued LLVM Phabricator instance.

Adding debug info to support Fortran (part 3)
ClosedPublic

Authored by schweitz on Nov 9 2018, 9:17 AM.

Details

Summary
  1. Fortran COMMON Block

COMMON blocks are a feature of Fortran that has no direct analog in C languages, but they are similar to data sections in assembly language programming. A COMMON block is a named area of memory that holds a collection of variables. Fortran subprograms may map the COMMON block memory area to their own, possibly distinct, non-empty list of variables. A Fortran COMMON block might look like the following example.

COMMON /ALPHA/ I, J

For this construct, the compiler generates a new scope-like DI construct (!DICommonBlock) into which variables (see I, J above) can be placed. As the common block implies a range of storage with global lifetime, the !DICommonBlock refers to a !DIGlobalVariable. The Fortran variable that comprise the COMMON block are also linked via metadata to offsets within the global variable that stands for the entire common block.

@alpha_ = common global %alphabytes_ zeroinitializer, align 64, !dbg !27, !dbg !30, !dbg !33
!14 = distinct !DISubprogram(…)
!20 = distinct !DICommonBlock(scope: !14, declaration: !25, name: "alpha")
!25 = distinct !DIGlobalVariable(scope: !20, name: "common alpha", type: !24)
!27 = !DIGlobalVariableExpression(var: !25, expr: !DIExpression())
!29 = distinct !DIGlobalVariable(scope: !20, name: "i", file: !3, type: !28)
!30 = !DIGlobalVariableExpression(var: !29, expr: !DIExpression())
!31 = distinct !DIGlobalVariable(scope: !20, name: "j", file: !3, type: !28)
!32 = !DIExpression(DW_OP_plus_uconst, 4)
!33 = !DIGlobalVariableExpression(var: !31, expr: !32)

The DWARF generated for this is as follows.

DW_TAG_common_block:
DW_AT_name: alpha
DW_AT_location: @alpha_+0
DW_TAG_variable:
DW_AT_name: common alpha
DW_AT_type: array of 8 bytes
DW_AT_location: @alpha_+0
DW_TAG_variable:
DW_AT_name: i
DW_AT_type: integer*4
DW_AT_location: @Alpha+0
DW_TAG_variable:
DW_AT_name: j
DW_AT_type: integer*4
DW_AT_location: @Alpha+4

Diff Detail

Repository
rL LLVM

Event Timeline

schweitz created this revision.Nov 9 2018, 9:17 AM

I don't understand the need for the "common alpha" global variable. For VMS Fortran on Itanium (not LLVM), we generate the following DWARF for

COMMON /ALPHA/ I,J

0x0000008f:     DW_TAG_common_block [5] *
                  DW_AT_name [DW_FORM_string]	("ALPHA")
                  DW_AT_location [DW_FORM_block1]	(<0x09> 03 00 00 00 00 00 00 00 00 )

0x000000a0:       DW_TAG_base_type [6]  
                    DW_AT_name [DW_FORM_string]	("integer*4")
                    DW_AT_encoding [DW_FORM_data1]	(0x05)
                    DW_AT_byte_size [DW_FORM_udata]	(4)

0x000000ad:       DW_TAG_variable [4]  
                    DW_AT_name [DW_FORM_string]	("I")
                    DW_AT_type [DW_FORM_ref4]	(cu + 0x00a0 => {0x000000a0})
                    DW_AT_location [DW_FORM_block1]	(<0x09> 03 00 00 00 00 00 00 00 00 )

0x000000be:       DW_TAG_variable [4]  
                    DW_AT_name [DW_FORM_string]	("J")
                    DW_AT_type [DW_FORM_ref4]	(cu + 0x00a0 => {0x000000a0})
                    DW_AT_location [DW_FORM_block1]	(<0x09> 03 00 00 00 00 00 00 00 00 )

0x000000cf:       NULL

0x000000d0:     DW_TAG_common_inclusion [7]  
                  DW_AT_common_reference [DW_FORM_ref4]	(cu + 0x008f => {0x0000008f})

with 3 DIRLSB64s relocations for the DW_AT_locations

[I don't have an Itanium obdump so forgive our platform-specific format]

                   Relocation Entry 3.
00000AE8 00000048    Sec. 21. offset affected:   0000000000000098                                       rela$pq_r_offset
                     Relocation Entry Info:      0000000700000027                                       rela$q_r_info
00000AF0 00000050      Relocation Type:                  00000027          DIR64LSB                     rela$l_r_type
00000AF4 00000054      Sec 12. Sym Index:                00000007       7.                              rela$l_r_sym
                       Symbol Bound to Section:          00000008       8. "ALPHA"
00000AF8 00000058    Relocation addend:          0000000000000000                                       rela$q_r_addend
                   Relocation Entry 4.
00000B00 00000060    Sec. 21. offset affected:   00000000000000B6                                       rela$pq_r_offset
                     Relocation Entry Info:      0000000700000027                                       rela$q_r_info
00000B08 00000068      Relocation Type:                  00000027          DIR64LSB                     rela$l_r_type
00000B0C 0000006C      Sec 12. Sym Index:                00000007       7.                              rela$l_r_sym
                       Symbol Bound to Section:          00000008       8. "ALPHA"
00000B10 00000070    Relocation addend:          0000000000000000                                       rela$q_r_addend
                   Relocation Entry 5.
00000B18 00000078    Sec. 21. offset affected:   00000000000000C7                                       rela$pq_r_offset
                     Relocation Entry Info:      0000000700000027                                       rela$q_r_info
00000B20 00000080      Relocation Type:                  00000027          DIR64LSB                     rela$l_r_type
00000B24 00000084      Sec 12. Sym Index:                00000007       7.                              rela$l_r_sym
                       Symbol Bound to Section:          00000008       8. "ALPHA"
00000B28 00000088    Relocation addend:          0000000000000004                                       rela$q_r_addend

This will also need assembler roundtrip tests in tests/Assembler.

lib/IR/LLVMContextImpl.h
798

Why does a common block have an alignment?

I don't understand the need for the "common alpha" global variable.

It's not strictly required to have it explicitly named. Our implementation allocates a COMMON block as an "untyped" block of memory and each subprogram can, of course, remap the storage units to its own list of variable names. The DW_TAG_common_block does require a DW_AT_location, and the !DIGlobalVariable fulfills that requirement.

schweitz added inline comments.Nov 9 2018, 1:20 PM
lib/IR/LLVMContextImpl.h
798

Good point. It seems like the alignment of the global storage would suffice.

dblaikie added inline comments.
include/llvm/IR/DebugInfoMetadata.h
2684

Probably skip this as it's the implicit default?

(I see this seems to have been defaulted all over the Metadata hierarchy - so far as I can tell, still unnecessary in all those places & no need to propagate this around further?)

I don't understand the need for the "common alpha" global variable.

It's not strictly required to have it explicitly named. Our implementation allocates a COMMON block as an "untyped" block of memory and each subprogram can, of course, remap the storage units to its own list of variable names. The DW_TAG_common_block does require a DW_AT_location, and the !DIGlobalVariable fulfills that requirement.

Ah, yes I see. I forgot that for us, we have a platform-specific ELF section flag for "overlay" sections which are common blocks. We don't need the extra hidden weak variable to be the intermediate allocator.

schweitz updated this revision to Diff 174456.Nov 16 2018, 2:42 PM

Remove alignment (not needed for DW_TAG_common_block).
Remove default dtor. (not needed)
Add both an llc test and a llvm-as | llvm-dis loop test.

I believe all the comments have been addressed. Is there anything else required on our end with this one?

I am curious how debug info for common blocks plays with LTO. I think we would not want different common-block descriptions to be de-duplicated just because the global name was the same.

I am curious how debug info for common blocks plays with LTO. I think we would not want different common-block descriptions to be de-duplicated just because the global name was the same.

Our existing compilers do not make use of LLVM's LTO. Is there a suggestion on how to correctly deal with LTO's de-duplication?

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2019, 11:11 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript

I am curious how debug info for common blocks plays with LTO. I think we would not want different common-block descriptions to be de-duplicated just because the global name was the same.

Our existing compilers do not make use of LLVM's LTO. Is there a suggestion on how to correctly deal with LTO's de-duplication?

You can simulate the relevant effect of LTO by producing two .ll files and running llvm-link on them. Alternatively you can also run llvm-lto/llvm-lto2 on them for a more complete simulation.

You can simulate the relevant effect of LTO by producing two .ll files and running llvm-link on them. Alternatively you can also run llvm-lto/llvm-lto2 on them for a more complete simulation.

We admit that we're not entirely sure what we're looking for exactly, and we are not familiar with llvm-lto. However, we conducted an experiment using llvm-lto as follows.

  1. We wrote a Fortran program using separate source files. 2 of these files contained subroutines that have COMMON /foo/ specifications. These COMMON blocks define a series of variables under different names. (For example, variable a1 and a2 map to the same memory location in the same /foo/ COMMON block but each is from a different SUBROUTINE scopes.)
  1. We ran llvm-lto on the resulting .bc files with -O3 and -O0.
  1. Debugging the resulting program, we see that the variables in one subroutine are present and can be queried. The variables in the second subroutine are present but have no location information. We see the same behavior regardless of the optimization level.

This is a different result than we obtain when we do not manually intervene with LTO. When compiling with our Fortran compiler and no LTO, we can set breakpoints in both of the aforementioned routines and all the expected COMMON variables are present, can be queried as to value, and have location information in the DWARF.

We are not clear if this is a problem versus working as intended, and if it is the former what, if anything, we ought to do about it.

schweitz updated this revision to Diff 192481.Mar 27 2019, 11:04 AM

Updated the diff ahead in time.

Just wondering if there are any more to-dos with this one. Thanks in advance.

I think @probinson had some questions about common blocks and LTO?

Mechanically, this patch LGTM.

Regarding your LTO-like experiments, that result is more or less what I thought would happen. In a normal compilation, each CU gets its own description of the types, and the fact that they are different in different CUs doesn't matter. Within each CU's context, the common-block description is complete and will be used correctly by the debugger.
In LTO, however, the IR linker wants to de-duplicate debug information, to avoid multiple copies of the same type in the final object file. I suspect that de-duplication (which IIRC is founded on the C++ "one definition rule") probably should not apply to Fortran, or at least not to common blocks.
This is not something you need to solve today, but I think after this lands, you should file a bug to the effect that LTO with Fortran can produce incorrect debug information, specifically for common blocks. That will at least keep track of this factoid somewhere that we can find it again.

So: One test comment, you'll need to file that bug, and I am embarrassed to say I haven't looked at the code part of the patch. But if @dblaikie or @aprantl are happy that's good enough for me.

test/DebugInfo/Generic/DICommonBlock.ll
13

Remove the 'target triple' line so this test will still work on bots that don't build the X86 target.

schweitz marked an inline comment as done.Apr 3 2019, 8:53 AM

Thanks, Paul. I can certainly file a bug after this lands for tracking purposes.

Process question: should this review be updated with the bug information/link?

I'll remove the line in the test ...

test/DebugInfo/Generic/DICommonBlock.ll
13

Thanks for catching that, Paul. I'll remove that line.

schweitz updated this revision to Diff 193530.Apr 3 2019, 9:56 AM

Removes the target triple from the test.

Thanks, Paul. I can certainly file a bug after this lands for tracking purposes.

Process question: should this review be updated with the bug information/link?

Yes, please post the url for the bug after you file it. You don't need to mention it in the commit log for the patch though.

aprantl accepted this revision.Apr 3 2019, 4:08 PM
aprantl added inline comments.
test/DebugInfo/Generic/DICommonBlock.ll
3

; RUN: %llc_dwarf -O0 -filetype=obj %s -o - | \
; RUN: llvm-dwarfdump -v -debug-info

is the verbose dump (-v) needed?

This revision is now accepted and ready to land.Apr 3 2019, 4:08 PM

Thanks for accepting the changes, Adrian.

I just tried the test without the -v option, and it passed. It appears to not be required.

As a reminder, I don't have permissions to land changes myself.

The change to the LLParser broke two tests:

Failing Tests (2):
    LLVM :: Assembler/invalid-diglobalvariable-empty-name.ll
    LLVM :: Assembler/invalid-diglobalvariable-missing-name.ll

  Expected Passes    : 22549
  Expected Failures  : 66
  Unsupported Tests  : 7854
  Unexpected Failures: 2

Can you please take a look?

aprantl requested changes to this revision.Apr 4 2019, 6:08 PM
This revision now requires changes to proceed.Apr 4 2019, 6:08 PM

Right, those failures make sense to me.

It seems okay for our purposes here to tweak this patch slightly to not depend on support for anonymous global variables.

I'll do that and post a new patch.

schweitz updated this revision to Diff 193976.Apr 5 2019, 3:07 PM

A couple small changes to correct the issues.

  • Testing: 1 tests, single process --

PASS: LLVM :: Assembler/invalid-diglobalvariable-empty-name.ll (1 of 1)
Testing Time: 0.18s

Expected Passes    : 1
  • Testing: 1 tests, single process --

PASS: LLVM :: Assembler/invalid-diglobalvariable-missing-name.ll (1 of 1)
Testing Time: 0.05s

Expected Passes    : 1

I just tried the test without the -v option, and it passed. It appears to not be required.

I also took the -v out of this latest patch set.

aprantl accepted this revision.Apr 8 2019, 12:11 PM
This revision is now accepted and ready to land.Apr 8 2019, 12:11 PM
This revision was automatically updated to reflect the committed changes.

Thanks, Paul. I can certainly file a bug after this lands for tracking purposes.

Process question: should this review be updated with the bug information/link?

Yes, please post the url for the bug after you file it. You don't need to mention it in the commit log for the patch though.

@schweitz Please remember to file this bug. If you have, please post the pointer here.