This is an archive of the discontinued LLVM Phabricator instance.

Fix debug_abbrev emitter to only assign table id once
ClosedPublic

Authored by aadsm on Sep 4 2020, 10:03 PM.

Details

Summary

While generating yamls for my tests I noticed that the new debug_abbrev format (with multiple table support) was incorrectly assigning id's to the table because it was generating one per abbrev entry in the table. For instance, the first table would get id 4 when 5 abbrev entries existed in the table. By itself this is not a problem but the corresponding debug_info sections were still referencing id 0. This was introduced here: https://reviews.llvm.org/D83116.

Maybe a better fix is to actually correctly calculate the table id when emitting debug info? From a quick glance it seems to me the ID is just being calculated as the distance between the first DWARFAbbreviationDeclarationSet and the one the debug info entry points to, which means it's just its index and not the actual table id that was generated when emitting the debug_abbrev tables. With my fix I guess this is fine but on the diff that introduced this Pavel mentioned that he would like to have some sort of unique id between them but not necessarily +1 increasing, but for that to work we need to actually find the table ID, I guess by going directly to Y.DebugAbbrev but to honest I have no idea how to link the DWARFAbbreviationDeclarationSet and the Y.DebugAbbrev, so I just did this simple fix.

I also realized there's barely any tests for MachO so it might useful to invest on that if the tool is being reworked on.

Diff Detail

Event Timeline

aadsm created this revision.Sep 4 2020, 10:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 4 2020, 10:03 PM
aadsm requested review of this revision.Sep 4 2020, 10:03 PM
aadsm added inline comments.Sep 4 2020, 10:05 PM
llvm/test/tools/obj2yaml/MachO/debug-abbrev.yaml
4 ↗(On Diff #290062)

This test could be better if I added debug_infos and made sure it was pointing to table 0 as well. It would be a pain for me to make all the sizes and offsets match though. Is there an easy way to achieve this? How were the other tests created?

I think the problem is caused by misplacing the line Y.DebugAbbrev.back().ID = AbbrevTableID++;. It should be placed in an outer loop.

The ID of abbrev tables should be their index and the AbbrevID of compilation units should be the distance between the associated abbrev table and the first abbrev table.

Thanks for the quick fix!

llvm/test/tools/obj2yaml/MachO/debug-abbrev.yaml
1 ↗(On Diff #290062)

I would suggest modifying the test case (c) by duplicating entries in abbrev tables and in compilation units in llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml rather than creating a standalone test case here.

I think the following YAML is a good candidate that replaces the one in test case (c). Please ping me if it doesn't work.

--- !mach-o
FileHeader:
  magic:      0xFEEDFACF
  cputype:    0x01000007
  cpusubtype: 0x00000003
  filetype:   0x0000000A
  ncmds:      1
  sizeofcmds: 232
  flags:      0x00000000
  reserved:   0x00000000
LoadCommands:
  - cmd:      LC_SEGMENT_64
    cmdsize:  232
    segname:  __DWARF
    vmaddr:   0x00
    vmsize:   0x00
    fileoff:  0x00
    filesize: 0x00
    maxprot:  0
    initprot: 0
    nsects:   2
    flags:    0
    Sections:
      - sectname:  __debug_abbrev
        segname:   __DWARF
        addr:      0x00
        size:      45
        offset:    528
        align:     0
        reloff:    0x00000000
        nreloc:    0
        flags:     0x00000000
        reserved1: 0x00000000
        reserved2: 0x00000000
        reserved3: 0x00000000
      - sectname:  __debug_info
        segname:   __DWARF
        addr:      0x00
        size:      90
        offset:    1070
        align:     0
        reloff:    0x00000000
        nreloc:    0
        flags:     0x00000000
        reserved1: 0x00000000
        reserved2: 0x00000000
        reserved3: 0x00000000
DWARF:
  debug_abbrev:
    - Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_addr
        - Code:     2
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_addr
    - ID: 2
      Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_data4
        - Code:     2
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_data4
    - ID: 1
      Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_udata
        - Code:     2
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_udata
  debug_info:
    - Version:       4
      AbbrevTableID: 2
      Entries:
        - AbbrCode: 1
          Values:
            - Value: 0x1234
        - AbbrCode: 2
          Values:
            - Value: 0x1234
    - Version:       4
      AbbrevTableID: 2
      Entries:
        - AbbrCode: 1
          Values:
            - Value: 0x4321
        - AbbrCode: 2
          Values:
            - Value: 0x4321
    - Version:       4
      AbbrevTableID: 0
      Entries:
        - AbbrCode: 1
          Values:
            - Value: 0x5678
        - AbbrCode: 2
          Values:
            - Value: 0x5678
    - Version:       4
      AbbrevTableID: 1
      Entries:
        - AbbrCode: 1
          Values:
            - Value: 0x8765
        - AbbrCode: 2
          Values:
            - Value: 0x8765
4 ↗(On Diff #290062)

Yeah, creating Mach-O tests are painful. I create them by manually assigning offset and size as well. When handcrafting minimal Mach-O test cases, I usually focus on the following fields.

  • The ncmds and sizeofcmds in the FileHeader: For DWARF tests, ncmds should be 1 since there is only one load command (LC_SEGMENT_64). sizeofcmds should be greater than or equal to the size of load commands and less than or equal to the size of object file minus the size of mach_header_64 (actual_file_size - sizeof(mach_header_64)). In this case, sizeofcmds should be greater than or equal to 8 since we only have one load command and sizeof(load_command) is 8.
  • The cmdsize field of load commands: cmdsize must be a multiple of 8 in 64-bit object files or a multiple of 4 in 32-bit object files. When there are 1 section in the load command, 152 would cover most cases. When there are 2 sections in the load command, 232 would cover most cases. I didn't investigate it very much.
  • The size field of sections: Firstly, we can use yaml2elf create a ELF object file with DWARF sections and use readelf -S to check the debug sections' size. The section size in Mach-O is similar to the size in ELF.
  • The offset field of sections: I didn't investigate it as well. But 528 is usually fine for the first section.

I think we can improve it in the future 😂

llvm/tools/obj2yaml/dwarf2yaml.cpp
43–46

It looks that Y.DebugAbbrev.back().ID = AbbrevTableID++; should be put in the outer loop.

aadsm added inline comments.Sep 5 2020, 12:23 PM
llvm/test/tools/obj2yaml/MachO/debug-abbrev.yaml
4 ↗(On Diff #290062)

oh wow :D, sound good, I'll do that! if many more tests are needed might be worthwhile writing a little python script to automate the math.

llvm/tools/obj2yaml/dwarf2yaml.cpp
43–46

ah yeah of course, not sure why I added an if instead.

aadsm updated this revision to Diff 290111.Sep 5 2020, 4:14 PM

Moved the increment to the outer loop and used the already existing debug_info test instead of creating a new one.

Higuoxing added inline comments.Sep 5 2020, 6:48 PM
llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
10

This change is not relevant and there are some test cases that have the similar format issue. Please do it in a separate patch.

855–856

We can simply remove this line since yaml2obj is able to calculate the value of debug_abbrev_offset for us.

llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

Do we really need this check here? I think it will prevent us from dumping an empty abbrev table.

debug_abbrev:
    - ID: 0
      Table:
        - Code:           1
          Tag:            DW_TAG_compile_unit
          Children:       DW_CHILDREN_yes
          Attributes:
            - Attribute:  DW_AT_producer
              Form:       DW_FORM_strp
    - ID: 1   ## This table will not be dumped.

Thanks @aadsm for the fix. On the note of writing Mach-O inputs, I completely agree that it's a pain, especially when compared with the ELF equivalents. The problem is that the Mach-O side of yaml2obj/obj2yaml has not yet been heavily worked on in a similar manner to much of @grimar's recent work on the ELF side. The obvious change would be to remove the need for lots of properties that could otherwise be derived automatically. If you or anybody else wants looking at that, I'm sure there would be lots of grateful Mach-O developers (I personally don't develop for Mach-O, so am not going to be investing time myself, but am happy to review).

llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
682

Remove --dump-input from the FileCheck command.

I actually have rarely used the option myself - you can achieve the goal of getting FileCheck to dump the input by having no check prefixes that match the prefix being looked for, which is often a good starting point.

llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

Could we get a test case for this situation, please? Either separately or as part of this patch, I don't mind.

@Higuoxing I didn't realize the DWARF section was completely new until I read your GSOC project. Thanks a lot for this work otherwise it would have been impossible for me to create the tests I needed for my other diff!

llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
682

Not sure if I understand what you mean with "having no check prefixes that match the prefix being looked for," but I'll remove this! I added for debugging to see where the mismatch was but forgot to remove it.

llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

That is true. I didn't realize there should always be an empty table at the end of the debug_abbrev. I'll update then.

aadsm updated this revision to Diff 291457.Sep 13 2020, 7:56 AM

Make sure there's an empty abbrev table as the last table.

Higuoxing accepted this revision.Sep 13 2020, 6:10 PM

LGTM with nit.

llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
833–835

May I suggest explicitly adding an empty table here? Since the size of __debug_abbrev section is 0x20 (32 in decimal). If the actual content size doesn't match the size field, yaml2macho usually writes random bytes to fill the gap.

This revision is now accepted and ready to land.Sep 13 2020, 6:10 PM

@Higuoxing I didn't realize the DWARF section was completely new until I read your GSOC project. Thanks a lot for this work otherwise it would have been impossible for me to create the tests I needed for my other diff!

Yeah, it's my pleasure to do that. Please feel free to ping me if there's anything I can help (adding new features or fixing bugs).

jhenderson added inline comments.Sep 14 2020, 3:55 AM
llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
682

Trivial example of what I meant:

# test.test
# RUN: echo foo | FileCheck %s
# eof

When at least run with lit -v, you get the output of echo foo printed by FileCheck as part of the error for not having any matching CHECK directives. Same applies if you e.g. do --check-prefix=BAZ and include no BAZ: lines etc. I often don't write my CHECK lines in my test until after running the test once, which allows me to then copy this output into the test for finer tuning.

855–856

Looks like this hasn't been addressed?

llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

I don't think there always needs to be an empty table at the end of debug_abbrev (IIRC, there is a null byte added at the end of the abbrev table, but that's not a table itself), but I think we want to allow a user to have one, if they want.

aadsm added inline comments.Sep 14 2020, 9:41 AM
llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
682

oh I see, that won't help me because my use case is not building the test. It's once the test is up and running a modification makes it fail and I need to know which lines are failing.

855–856

yes, I missed this, will address before landing.

llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

oh, but that's exactly what will happen if the begin() == end() check is not there since this code optimistically creates a table before checking if it reached the last item in the iteration or not. That's the reason I added it, maybe I misunderstood @Higuoxing's comment.

jhenderson added inline comments.Sep 15 2020, 2:16 AM
llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

Section 7.5.3 of the DWARF v4 standard discusses the format of the .debug_abbrev section (I haven't checked, but I believe it's the same for other standards). In particular there is a statement that says "The abbreviations for a given compilation unit end with an entry consisting of a 0 byte for the abbreviation code." An empty abbrev table would consist of just that 0 byte, and nothing else. So a .debug_abbrev section with the content {0, 0, 0, 0} would actually represent 4 separate empty tables.

We want to allow a user the ability to add an empty table. The format would be:

debug_abbrev:
    - ID: 42 ## This should be an empty table, with explicit ID.
    - Table: [] ## Also an empty table, with implicit ID.
    - ID: 100
      Table: [] ## Also an empty table, with explicit ID.

I believe all three should produce identical output (i.e. a single 0 byte), resulting in a 3-byte debug_abbrev section.

As far as I can tell, your patch looks good here.

aadsm added inline comments.Sep 20 2020, 2:38 PM
llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

Ok, before I update this patch, I want to have agreement from both of you that the begin() == end() check I had before is required to avoid always having an empty table added.

Basically, that if I have the following yaml:

debug_abbrev:
    - ID: 3
    - Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_addr
    - ID: 4
    - Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_addr

And passing it through yaml2obj | obj2yaml, should yield:

debug_abbrev:
    - ID: 0
    - Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_addr
    - ID: 1
    - Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_addr

and not (the next output is what this diff in its current iteration does):

debug_abbrev:
    - ID: 0
    - Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_addr
    - ID: 1
    - Table:
        - Code:     1
          Tag:      DW_TAG_compile_unit
          Children: DW_CHILDREN_no
          Attributes:
            - Attribute: DW_AT_low_pc
              Form:      DW_FORM_add
    - ID: 2
Higuoxing added inline comments.Sep 20 2020, 10:50 PM
llvm/tools/obj2yaml/dwarf2yaml.cpp
28–29

Currently, obj2yaml uses the index of the abbrev table as its ID. That is to say, the ID field of the abbrev table should always be the index of the table. The AbbrevTableID field of the compilation unit should always be the distance between the associated abbrev table and the first abbrev table.

Your patch looks good and I think we shouldn't add the checker begin() == end() to avoid dumping empty abbrev tables since the DWARF spec doesn't forbid having empty abbrev tables. Besides, the output of obj2yaml and the input of yaml2obj should be identical though the abbrev tables might have different IDs.

debug_abbrev:
  - ID: 3  ## Empty table with explicit ID(3).
  - Table: ## Non-empty table with implicit ID(1).
      - Code:     1
        Tag:      DW_TAG_compile_unit
        Children: DW_CHILDREN_no
        Attributes:
          - Attribute: DW_AT_low_pc
            Form:      DW_FORM_addr
  - ID: 4  ## Empty table with explicit ID(4).
  - Table: ## Non-empty table with implicit ID(3).
      - Code:     1
        Tag:      DW_TAG_compile_unit
        Children: DW_CHILDREN_no
        Attributes:
          - Attribute: DW_AT_low_pc
            Form:      DW_FORM_addr

after passing it through yaml2obj | obj2yaml, the output should be:

debug_abbrev:
  - ID:    0 ## Empty table.
  - ID:    1 ## Non-empty table with ID/Index (1).
    Table:
      - Code:        0x0000000000000001
        Tag:         DW_TAG_compile_unit
        Children:    DW_CHILDREN_no
        Attributes:
          - Attribute: DW_AT_low_pc
            Form:      DW_FORM_addr
  - ID: 2  ## Empty table with ID/Index(2).
  - ID: 3  ## Non-empty table with ID/Index(3).
    Table:
      - Code:      0x0000000000000001
        Tag:       DW_TAG_compile_unit
        Children:  DW_CHILDREN_no
        Attributes:
          - Attribute: DW_AT_low_pc
            Form:      DW_FORM_addr

This is exactly the output of obj2yaml after applying this patch.

aadsm updated this revision to Diff 302154.Nov 1 2020, 10:25 AM

Address all comments (I hope)

jhenderson accepted this revision.Nov 2 2020, 12:53 AM

I don't think I have any more comments, so LGTM, but I'd like @Higuoxing to have another brief look if he gets a moment to make sure I haven't missed anything.

Higuoxing accepted this revision.Nov 2 2020, 2:41 AM

LGTM, thanks!

This revision was landed with ongoing or failed builds.Nov 8 2020, 6:12 PM
This revision was automatically updated to reflect the committed changes.