This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][NFC] add a new DIE type to represent label + offset
AbandonedPublic

Authored by shchenz on Jan 14 2021, 4:12 AM.

Details

Reviewers
ikudrin
MaskRay
hubert.reinterpretcast
jasonliu
echristo
Group Reviewers
Restricted Project
Summary

Add one more DIE type, DIELabelPlusOffset, to represent label + offset.

The AsmPrinter class already has an emitLabelPlusOffset interface, so the newly added class is essentially a wrapper around it.

This new DIE type will be used in follow-up patches that add DWARF support on XCOFF.

Event Timeline

shchenz created this revision. Jan 14 2021, 4:12 AM
shchenz requested review of this revision. Jan 14 2021, 4:12 AM
Herald added a project: Restricted Project. Jan 14 2021, 4:12 AM
nemanjai added a subscriber: nemanjai.

Added Eric as the owner of debug info.

ikudrin added inline comments. Jan 14 2021, 7:31 AM
llvm/include/llvm/CodeGen/AsmPrinter.h
583 ↗(On Diff #316616)

Can this change be separated and its own justification be added?

shchenz updated this revision to Diff 316841. Jan 14 2021, 9:44 PM
shchenz marked an inline comment as done.

Addressed the review comments:
1: left the Offset type unchanged
2: fixed lint warnings

llvm/include/llvm/CodeGen/AsmPrinter.h
583 ↗(On Diff #316616)

Hmm, it should be OK to declare Offset as either int64_t or uint64_t. emitLabelPlusOffset has two users of Offset: EmitCOFFSecRel32(Label, Offset), which treats it as unsigned, and MCConstantExpr::create(Offset...), which treats it as signed.

I need a negative offset for XCOFF DWARF. Even if the offset is not converted to a signed type at the emitLabelPlusOffset parameter, it is converted to int64_t in MCConstantExpr::create(Offset...). For now I have left the type unchanged.

I originally changed it because I think a signed integer makes more sense for an offset. Do you think we should change it to signed?
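
The signed/unsigned point above can be checked in isolation. The following is a minimal sketch, not LLVM code: forwardToSignedUser and roundTripOffset are hypothetical names that only model an unsigned parameter feeding a signed consumer. It assumes a two's-complement target; the unsigned-to-signed conversion is implementation-defined before C++20 and well-defined from C++20 on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a signed consumer: it receives the offset
// through an unsigned 64-bit parameter and reads it back as int64_t.
inline int64_t forwardToSignedUser(uint64_t Offset) {
  return static_cast<int64_t>(Offset);
}

// Hypothetical caller: passes a (possibly negative) offset through the
// unsigned parameter. Signed-to-unsigned conversion is modular, so -4
// becomes 2^64 - 4 on the way in.
inline int64_t roundTripOffset(int64_t Offset) {
  return forwardToSignedUser(static_cast<uint64_t>(Offset));
}

// Small self-check for the two offsets discussed in this review.
inline void checkRoundTrip() {
  assert(roundTripOffset(-4) == -4);
  assert(roundTripOffset(-12) == -12);
}
```

Under those assumptions the negative value survives the trip through the unsigned parameter, which is why leaving the type unchanged still works for the signed user.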

Can you provide a little more detail on the motivation here? Thanks!

-eric

Hi Eric, thanks very much for looking into this.

Background of this patch:
1: We are going to support DWARF for XCOFF (running on AIX).
2: On AIX, the assembler does not need the assembly file to contain the length field in the DWARF section header (if the DWARF section has a header). Instead, the assembler inserts the calculated length into the DWARF section header of the final object according to the DWARF format. Namely, the AIX assembler inserts 4 bytes in each section header for DWARF32 and 12 bytes for DWARF64.

Currently, in the compiler, a DWARF section refers to another section by emitting that section's label at the required place. For example, the .debug_info section needs a relocation to indicate where the .debug_line section is. So the output assembly file normally looks like this:

        .section        .debug_info
.Ldebug_info:
        .long   (length of .debug_info) # Length of Unit
        .short version_number
        .....
        .long   .Ldebug_line             # DW_AT_stmt_list  #      refer to .debug_line table

        .section        .debug_line
.Ldebug_line:
        .long  (length of .debug_line)
        .short version_number

Currently, the reference to .debug_line never has an offset, because on all currently supported targets .debug_line contains the length field.

On XCOFF, however, the assembler does not need the length and inserts it at the start of .debug_line itself, so a reference to .debug_line through .Ldebug_line is not accurate in the final object. We need to account for the bytes the assembler inserts when we add the reference in the compiler. The final assembly on XCOFF looks like this, taking DWARF32 as an example:

       .section        .debug_info
.Ldebug_info:
        ###### .long   (length of .debug_info) # Length of Unit, this is not required any more
        .short version_number
        .....
        .long   .Ldebug_line-4            # DW_AT_stmt_list  ### refer to .debug_line table, but we need to refer to the previous 4 bytes as assembler will insert 4 bytes for the length field at the front of .debug_line.

        .section        .debug_line
.Ldebug_line:
        ######.long  (length of .debug_line)      #this is not required any more
        .short version_number

So here we need a DIE type for label + offset, where the offset is negative. We will use the new type in DwarfCompileUnit::initStmtList(), where we generate the DW_AT_stmt_list attribute for the root DIE.

I hope I have made myself clear. Thanks again for your review.
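
The adjustment described above can be sketched as a small standalone computation. This is an illustration only, not the patch's code: unitLengthFieldSize and adjustedSectionRef are hypothetical helpers, and the only facts assumed are the sizes given in the explanation (4 bytes for DWARF32, 12 bytes for DWARF64).

```cpp
#include <cassert>
#include <string>

// Size in bytes of the unit length field that the assembler inserts at the
// start of a DWARF section: 4 for DWARF32, 12 for DWARF64 (a 4-byte escape
// value followed by an 8-byte length).
inline unsigned unitLengthFieldSize(bool IsDwarf64) {
  return IsDwarf64 ? 12u : 4u;
}

// Hypothetical helper: builds the text of the adjusted reference, i.e. the
// section-start label minus the size of the inserted length field.
inline std::string adjustedSectionRef(const std::string &Label,
                                      bool IsDwarf64) {
  assert(!Label.empty() && "a section-start label is required");
  return Label + "-" + std::to_string(unitLengthFieldSize(IsDwarf64));
}
```

For example, adjustedSectionRef(".Ldebug_line", false) yields the `.Ldebug_line-4` expression used in the DWARF32 listing above.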

I suppose you are describing how the .dwsect pseudo-op works. It is quite interesting why they designed the feature to work that way. Is it recommended to reference debug sections through the label minus the length field size (4 or 12), or do they provide some means to simplify the calculation? What does the assembler output of their own compiler look like?

shchenz added a comment. Edited Jan 15 2021, 5:37 AM

It is quite interesting why they designed the feature to work that way. Is it recommended to reference debug sections through the label minus the length field size (4 or 12), or do they provide some means to simplify the calculation?

This is a good question, but unfortunately I cannot give you the answer, sorry. All I know is that this design has existed for a long time and we must follow it; otherwise the assembler emits an error such as an invalid DWARF version number :(

What does the assembler output of their own compiler look like?

On AIX, the default C/C++ compiler is XLC. That compiler has no assembly mode; it only has an object mode, so it does not involve an assembler to generate an object file.
I checked the assembly output of GCC on AIX, and it indeed follows this design: it references debug sections through the label minus the length field size.

2: On AIX, the assembler does not need the assembly file to contain the length field in the DWARF section header (if the DWARF section has a header). Instead, the assembler inserts the calculated length into the DWARF section header of the final object according to the DWARF format. Namely, the AIX assembler inserts 4 bytes in each section header for DWARF32 and 12 bytes for DWARF64.

Just to clarify this note: The assembler assumes 32-bit DWARF for XCOFF32 and 64-bit DWARF for XCOFF64.

It is quite interesting why they designed the feature to work that way. Is it recommended to reference debug sections through the label minus the length field size (4 or 12), or do they provide some means to simplify the calculation?

As @shchenz notes, GCC on AIX (the versions that actually account for the offset) uses this direct adjustment method. For better looking assembly, there is the possibility of generating a post-length-field label and using .set to associate the name normally used with the adjusted expression. I don't know what the consequences of trying that would be in terms of the object-generating path in LLVM.

MaskRay added a comment. Edited Jan 15 2021, 11:03 AM

I wanted to get a feeling for how the AIX assembly works, so I tried binutils-gdb:

% mkdir -p out/aix; cd out/aix
% ../../configure --target=powerpc64-ibm-aix
% make -j 30 all-gas
# gas/as-new

% gas/as-new =(echo '.section .debug_info')
/tmp/zshHplgOV: Assembler messages:
/tmp/zshHplgOV:1: Error: the XCOFF file format does not support arbitrary sections

Seems that the error is expected, as gas/config/tc-ppc.c says:

/* This function handles the .section pseudo-op.  This is mostly to
   give an error, since XCOFF only supports .text, .data and .bss, but
   we do permit the user to name the text or data section.  */

static void
ppc_named_section (int ignore ATTRIBUTE_UNUSED)
{
  char *user_name;
  const char *real_name;
  char c;
  symbolS *sym;

  c = get_symbol_name (&user_name);

  if (strcmp (user_name, ".text") == 0)
    real_name = ".text[PR]";
  else if (strcmp (user_name, ".data") == 0)
    real_name = ".data[RW]";
  else
    {
      as_bad (_("the XCOFF file format does not support arbitrary sections"));
      (void) restore_line_pointer (c);
      ignore_rest_of_line ();
      return;
    }

So how does your example work?

       .section        .debug_info
.Ldebug_info:
        ###### .long   (length of .debug_info) # Length of Unit, this is not required any more
        .short version_number
        .....
        .long   .Ldebug_line-4            # DW_AT_stmt_list  ### refer to .debug_line table, but we need to refer to the previous 4 bytes as assembler will insert 4 bytes for the length field at the front of .debug_line.

        .section        .debug_line
.Ldebug_line:
        ######.long  (length of .debug_line)      #this is not required any more
        .short version_number

On AIX, the default C/C++ compiler is XLC. That compiler has no assembly mode; it only has an object mode, so it does not involve an assembler to generate an object file.

To be clear: you said XLC does not support assembly mode, but you'll implement something in llvm/lib/MC. Can this limitation then be removed?

powerpc64-ibm-aix gas supports this:

.long .Ldebug_info_end0 - .Ldebug_info_start0
.Ldebug_info_start0:
.short 4
.Ldebug_info_end0:

Finally, can you provide instructions for building a cross-compiling GCC for AIX? I want to try out a freely accessible compiler to get the larger picture of the debug info support on AIX.

So how does your example work?

It doesn't.

.section        .debug_info

That has to be

.dwsect 0x10000

On AIX, the default C/C++ compiler is XLC. That compiler has no assembly mode; it only has an object mode, so it does not involve an assembler to generate an object file.

To be clear: you said XLC does not support assembly mode, but you'll implement something in llvm/lib/MC. Can this limitation then be removed?

I'm not sure where your question is coming from. All released versions of XL C/C++, including the version with Clang front-end components, use an IBM proprietary code-generating back end that is not based on LLVM.

powerpc64-ibm-aix gas supports this:

.long .Ldebug_info_end0 - .Ldebug_info_start0
.Ldebug_info_start0:
.short 4
.Ldebug_info_end0:

Can you elaborate on how that snippet would end up placing the data in the appropriate XCOFF section?

Seems that the error is expected, as gas/config/tc-ppc.c says:

In case the code you mentioned is GPL-licensed, I would ask that you understand that there are developers within the LLVM community that would prefer to minimize their exposure to GPL-licensed code.

Finally, can you provide instructions for building a cross-compiling GCC for AIX? I want to try out a freely accessible compiler to get the larger picture of the debug info support on AIX.

I'm not sure that would give an accurate picture of debug info support on AIX. Since the XCOFF object format has not defined new DWARF sections from DWARF Version 5, GCC can generate DWARF 5 "assembly" that (afaik) no assembler would consume. Namely:

  • DWARF sections in AIX assembly use the .dwsect syntax.
  • DWARF sections were added with AIX 7.1 and GNU as has not been updated to support AIX 7 (https://gcc.gnu.org/install/specific.html#x-ibm-aix).
  • When using the .dwsect syntax for a section that is not currently defined for XCOFF, GCC uses a section name in place of a flag value.

Thanks for your review @MaskRay
Also thanks Hubert for your good comments. @hubert.reinterpretcast

So how does your example work?

Sorry for causing confusion. That is a pseudo example.

To be clear: you said XLC does not support assembly mode, but you'll implement something in llvm/lib/MC. Can this limitation then be removed?

Yes, as Hubert said, the IBM XL C/C++ back end is not based on LLVM. It only has an object mode.

Finally, can you provide instructions for building a cross-compiling GCC for AIX? I want to try out a freely accessible compiler to get the larger picture of the debug info support on AIX.

Ha, I am not allowed to touch the GCC source code, so I cannot provide the instructions. To supplement Hubert's explanation: there is some documentation on the XCOFF format, for example https://www.ibm.com/support/knowledgecenter/ssw_aix_72/filesreference/XCOFF.html, which includes some explanation of the DWARF sections.

Gentle ping...
A user of this patch is in D95518, in file lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp and function DwarfCompileUnit::initStmtList()

See the assembly output in file test/DebugInfo/XCOFF/empty.ll,
1: 32 bit, line 147, ; ASM32-NEXT: .vbyte 4, L..line_table_start0-4 # DW_AT_stmt_list
2: 64 bit, line 347, ; ASM64-NEXT: .vbyte 8, L..line_table_start0-12 # DW_AT_stmt_list

The comment was originally sent to D94668, but it looks like it suits here better.

Is it possible to use a .set pseudo-op to define a symbol that can be referenced from other sections? Something like

  .dwsect 0x20000
L...dwline.tmp:
  .set L...dwline, L...dwline.tmp-12
L..line_table_start0.tmp:
  .set L..line_table_start0, L..line_table_start0.tmp-12
  .vbyte  2, 4
...
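
The .set idea above can be sketched as a tiny text generator. This is only an illustration of the proposed output shape, not LLVM code: emitSectionStartAlias is a hypothetical helper, the ".tmp" suffix is copied from the snippet above, and whether the AIX assembler accepts the result is not verified here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: emits a temporary label at the point where the
// section's assembly text starts, then defines the usual name as that
// label minus the size of the length field the assembler will insert.
inline std::string emitSectionStartAlias(const std::string &Name,
                                         unsigned LengthFieldSize) {
  // Only the two sizes mentioned in this review are accepted.
  assert(LengthFieldSize == 4 || LengthFieldSize == 12);
  const std::string Tmp = Name + ".tmp";
  return Tmp + ":\n  .set " + Name + ", " + Tmp + "-" +
         std::to_string(LengthFieldSize) + "\n";
}
```

With this shape, references from other sections keep using the plain name, and the adjustment lives in one place next to the section start.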

Is it possible to use a .set pseudo-op to define a symbol that can be referenced from other sections?

See https://reviews.llvm.org/D94670#2501150. It seems to work. Do you have a suggestion on how to make use of it from the MC layer here?

If it works, it can probably be used in MCContext::getXCOFFSection() to create Begin symbols for DWARF sections except for XCOFF::SSUBTYP_DWABREV. If I understand it right, that should eliminate the need for platform-specific calculating of offsets in the general code, as well as for the new DIELabelPlusOffset class, and for emitLabelPlusOffset() to handle negative offsets.

Thanks for your comments @ikudrin and also the good comments for D95518 (I am working on it right now).

Regarding the begin-symbol change in MCContext::getXCOFFSection(): since getXCOFFSection is not aware of the MCStreamer type, if I create the .dwline symbol as .set L...dwline, L...dwline.tmp-12 in MCContext, will it affect generation of the debug_line section in object mode? For object mode, I still want to generate the debug unit length in the debug line header.

It looks like my idea was not well thought out.

What do you think about extending the MCTargetStreamer interface and implementing some PPCTargetXCOFFAsmStreamer so that it can intercept changing sections, emitting labels, and so on? It could customize printing labels at the beginning of debug sections. It could also suppress printing the length field; to simplify that you could move emitDwarfUnitLength() from AsmPrinter to MCStreamer.

Thanks for pointing out the direction. I will take a look.

shchenz planned changes to this revision. Feb 2 2021, 2:50 AM
shchenz abandoned this revision. Feb 4 2021, 8:20 PM

We will use the .set directive as @ikudrin suggested, so this patch is no longer needed.