
[DebugInfo][NFC] add a new DIE type to represent label + offset
Needs Review · Public

Authored by shchenz on Thu, Jan 14, 4:12 AM.

Details

Reviewers
ikudrin
MaskRay
hubert.reinterpretcast
jasonliu
echristo
Group Reviewers
Restricted Project
Summary

Add a new DIE value type, DIELabelPlusOffset, to represent label + offset.

The AsmPrinter class already has an emitLabelPlusOffset interface, so the newly added class is essentially a wrapper around it.

The new DIE type will be used in follow-up patches that add DWARF support on XCOFF.

Event Timeline

shchenz created this revision. Thu, Jan 14, 4:12 AM
shchenz requested review of this revision. Thu, Jan 14, 4:12 AM
Herald added a project: Restricted Project. Thu, Jan 14, 4:12 AM
nemanjai added a subscriber: nemanjai.

Added Eric as the owner of debug info.

ikudrin added inline comments. Thu, Jan 14, 7:31 AM
llvm/include/llvm/CodeGen/AsmPrinter.h
583 ↗(On Diff #316616)

Can this change be separated and its own justification be added?

shchenz updated this revision to Diff 316841. Thu, Jan 14, 9:44 PM
shchenz marked an inline comment as done.

Fixes according to comments:
1: left the Offset type unchanged
2: fixed Lint warnings

llvm/include/llvm/CodeGen/AsmPrinter.h
583 ↗(On Diff #316616)

Hmm, it should be OK to declare Offset as either int64_t or uint64_t. Offset has two users in emitLabelPlusOffset: EmitCOFFSecRel32(Label, Offset), which treats it as unsigned, and MCConstantExpr::create(Offset...), which treats it as signed.

I need a negative offset for XCOFF DWARF. Even if the offset is not implicitly converted when it is passed as the emitLabelPlusOffset parameter, it will be converted to int64_t in MCConstantExpr::create(Offset...). For now I have left the type unchanged.

I originally changed it because I think a signed integer makes more sense for an offset. Do you think we should change it to signed?
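As a rough illustration of the point above, a negative offset that travels through an unsigned 64-bit parameter comes out negative again once a signed consumer converts it back. The sketch below is standalone C++, not LLVM code: `throughUnsignedParam` and `asSignedConstant` are hypothetical names standing in for an interface with an unsigned `Offset` parameter and for a signed user such as MCConstantExpr::create. The conversion back to `int64_t` is modular in C++20; before C++20 it is implementation-defined, although mainstream compilers use two's complement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a signed consumer of the offset
// (the thread names MCConstantExpr::create as one such user).
inline int64_t asSignedConstant(int64_t Value) { return Value; }

// Hypothetical stand-in for an interface whose Offset parameter is unsigned.
// A signed argument is implicitly converted to uint64_t at the call site and
// converted back to int64_t when it is handed to the signed consumer.
inline int64_t throughUnsignedParam(uint64_t Offset) {
  return asSignedConstant(static_cast<int64_t>(Offset));
}

// Usage: a negative offset keeps its value across the round trip.
inline void checkNegativeOffsetSurvives() {
  int64_t Negative = -4;
  assert(throughUnsignedParam(Negative) == -4);
}
```

The round trip only helps signed users; an unsigned user of the same parameter would see a very large positive value instead.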

Can you provide a little more detail on the motivation here? Thanks!

-eric

Hi Eric, thanks very much for looking into this.

Background of this patch:
1: We are going to support DWARF for XCOFF (running on AIX).
2: On AIX, the assembler does not require the assembly file to contain the length field in a DWARF section's header (if the section has a header). Instead, the assembler inserts the calculated length into the section header of the final object according to the DWARF format: 4 bytes per section header for DWARF32 and 12 bytes for DWARF64.

Currently, in the compiler, one DWARF section refers to another by emitting the other section's label at the required place. For example, the .debug_info section needs a relocation to indicate where the .debug_line section is, so the output assembly normally looks like:

        .section        .debug_info
.Ldebug_info:
        .long   (length of .debug_info) # Length of Unit
        .short version_number
        .....
        .long   .Ldebug_line             # DW_AT_stmt_list  #      refer to .debug_line table

        .section        .debug_line
.Ldebug_line:
        .long  (length of .debug_line)
        .short version_number

For now, the reference to .debug_line never has an offset, because on all currently supported targets .debug_line contains the length field.

On XCOFF, however, the assembler does not need the length and inserts it at the start of .debug_line itself, so a reference through plain .Ldebug_line is not accurate in the final object. We need to account for the bytes inserted by the assembler when we emit the reference in the compiler. Taking DWARF32 as an example, the final assembly on XCOFF looks like:

        .section        .debug_info
.Ldebug_info:
        ###### .long   (length of .debug_info) # Length of Unit, this is not required any more
        .short version_number
        .....
        .long   .Ldebug_line-4            # DW_AT_stmt_list  ### refer to .debug_line table, but we need to refer to the previous 4 bytes as assembler will insert 4 bytes for the length field at the front of .debug_line.

        .section        .debug_line
.Ldebug_line:
        ######.long  (length of .debug_line)      #this is not required any more
        .short version_number

So here we need the label + offset DIE type, with a negative offset. We will use the new type in DwarfCompileUnit::initStmtList(), where we generate the DW_AT_stmt_list attribute for the root DIE.
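The adjustment described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions, not the code in the patch: `LabelPlusOffset`, `insertedLengthFieldSize`, and `xcoffStmtListRef` are hypothetical names, and the only facts taken from the discussion are the 4-byte (DWARF32) and 12-byte (DWARF64) length fields that the AIX assembler inserts.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Size of the length field the AIX assembler inserts at the start of a
// DWARF section, per the thread: 4 bytes for DWARF32, 12 bytes for DWARF64.
inline int64_t insertedLengthFieldSize(bool IsDwarf64) {
  return IsDwarf64 ? 12 : 4;
}

// Hypothetical value type: a section label plus a signed byte offset.
struct LabelPlusOffset {
  std::string Label;
  int64_t Offset;

  // Render as an assembly expression such as ".Ldebug_line-4".
  std::string str() const {
    assert(!Label.empty() && "label must be named");
    if (Offset == 0)
      return Label;
    // std::to_string already prints the minus sign for negative offsets.
    return Label + (Offset > 0 ? "+" : "") + std::to_string(Offset);
  }
};

// Build the DW_AT_stmt_list reference for XCOFF: point before the label by
// the size of the length field that the assembler will insert.
inline LabelPlusOffset xcoffStmtListRef(const std::string &LineLabel,
                                        bool IsDwarf64) {
  return {LineLabel, -insertedLengthFieldSize(IsDwarf64)};
}
```

Keeping the offset signed in the value type means the negative adjustment is stored directly instead of relying on an unsigned wrap-around.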

Hope I have made myself clear. Thanks again for your review.

I suppose you are describing how the .dwsect pseudo-op works. It is quite interesting that they designed the feature to work that way. Is it recommended to reference debug sections through the label minus the length field size (4 or 12), or do they provide some means to simplify the calculation? What does the assembler output of their own compiler look like?

shchenz added a comment. Edited Fri, Jan 15, 5:37 AM

It is quite interesting that they designed the feature to work that way. Is it recommended to reference debug sections through the label minus the length field size (4 or 12), or do they provide some means to simplify the calculation?

This is a good question, but unfortunately I cannot give you the answer; sorry about that. All I know is that this design has existed for a long time and we must follow it; otherwise the assembler emits errors like invalid dwarf version number :(

What does the assembler output of their own compiler look like?

On AIX, the default C/C++ compiler is XLC. That compiler has no assembly mode; it only has an object mode, so it does not invoke an assembler to generate an object file.
I checked the assembly output of GCC on AIX, and it does follow this design: it references debug sections through the label minus the length field size.

2: On AIX, the assembler does not require the assembly file to contain the length field in a DWARF section's header (if the section has a header). Instead, the assembler inserts the calculated length into the section header of the final object according to the DWARF format: 4 bytes per section header for DWARF32 and 12 bytes for DWARF64.

Just to clarify this note: The assembler assumes 32-bit DWARF for XCOFF32 and 64-bit DWARF for XCOFF64.

It is quite interesting that they designed the feature to work that way. Is it recommended to reference debug sections through the label minus the length field size (4 or 12), or do they provide some means to simplify the calculation?

As @shchenz notes, GCC on AIX (the versions that actually account for the offset) uses this direct adjustment method. For better looking assembly, there is the possibility of generating a post-length-field label and using .set to associate the name normally used with the adjusted expression. I don't know what the consequences of trying that would be in terms of the object-generating path in LLVM.

MaskRay added a comment. Edited Fri, Jan 15, 11:03 AM

I wanted to get a feel for how AIX assembly works, so I tried binutils-gdb:

% mkdir -p out/aix; cd out/aix
% ../../configure --target=powerpc64-ibm-aix
% make -j 30 all-gas
# gas/as-new

% gas/as-new =(echo '.section .debug_info')
/tmp/zshHplgOV: Assembler messages:
/tmp/zshHplgOV:1: Error: the XCOFF file format does not support arbitrary sections

Seems that the error is expected, as gas/config/tc-ppc.c says:

/* This function handles the .section pseudo-op.  This is mostly to
   give an error, since XCOFF only supports .text, .data and .bss, but
   we do permit the user to name the text or data section.  */

static void
ppc_named_section (int ignore ATTRIBUTE_UNUSED)
{
  char *user_name;
  const char *real_name;
  char c;
  symbolS *sym;

  c = get_symbol_name (&user_name);

  if (strcmp (user_name, ".text") == 0)
    real_name = ".text[PR]";
  else if (strcmp (user_name, ".data") == 0)
    real_name = ".data[RW]";
  else
    {
      as_bad (_("the XCOFF file format does not support arbitrary sections"));
      (void) restore_line_pointer (c);
      ignore_rest_of_line ();
      return;
    }

So how does your example work?

        .section        .debug_info
.Ldebug_info:
        ###### .long   (length of .debug_info) # Length of Unit, this is not required any more
        .short version_number
        .....
        .long   .Ldebug_line-4            # DW_AT_stmt_list  ### refer to .debug_line table, but we need to refer to the previous 4 bytes as assembler will insert 4 bytes for the length field at the front of .debug_line.

        .section        .debug_line
.Ldebug_line:
        ######.long  (length of .debug_line)      #this is not required any more
        .short version_number

On AIX, the default C/C++ compiler is XLC. That compiler has no assembly mode; it only has an object mode, so it does not invoke an assembler to generate an object file.

To be clear: you said XLC does not support assembly mode, but you will implement something in llvm/lib/MC. Can this limitation then be removed?

powerpc64-ibm-aix gas supports this:

.long .Ldebug_info_end0 - .Ldebug_info_start0
.Ldebug_info_start0:
.short 4
.Ldebug_info_end0:

Finally, can you provide instructions for building a cross-compiling GCC for AIX? I want to try out a freely accessible compiler to get the larger picture of the debug info support on AIX.

So how does your example work?

It doesn't.

.section        .debug_info

That has to be

.dwsect 0x10000

On AIX, the default C/C++ compiler is XLC. That compiler has no assembly mode; it only has an object mode, so it does not invoke an assembler to generate an object file.

To be clear: you said XLC does not support assembly mode, but you will implement something in llvm/lib/MC. Can this limitation then be removed?

I'm not sure where your question is coming from. All released versions of XL C/C++, including the version with Clang front-end components, use an IBM proprietary code-generating back end that is not based on LLVM.

powerpc64-ibm-aix gas supports this:

.long .Ldebug_info_end0 - .Ldebug_info_start0
.Ldebug_info_start0:
.short 4
.Ldebug_info_end0:

Can you elaborate on how that snippet would end up placing the data in the appropriate XCOFF section?

Seems that the error is expected, as gas/config/tc-ppc.c says:

In case the code you mentioned is GPL-licensed, I would ask that you understand that there are developers within the LLVM community that would prefer to minimize their exposure to GPL-licensed code.

Finally, can you provide instructions for building a cross-compiling GCC for AIX? I want to try out a freely accessible compiler to get the larger picture of the debug info support on AIX.

I'm not sure that would give an accurate picture of debug info support on AIX. Since the XCOFF object format has not defined new DWARF sections from DWARF Version 5, GCC can generate DWARF 5 "assembly" that (afaik) no assembler would consume. Namely:

  • DWARF sections in AIX assembly use the .dwsect syntax.
  • DWARF sections were added with AIX 7.1 and GNU as has not been updated to support AIX 7 (https://gcc.gnu.org/install/specific.html#x-ibm-aix).
  • When using the .dwsect syntax for a section that is not currently defined for XCOFF, GCC uses a section name in place of a flag value.