This is an archive of the discontinued LLVM Phabricator instance.

[WIP][Debuginfo][LLD] Remove obsolete debug info while garbage collecting.
Abandoned, Public

Authored by avl on Sep 11 2019, 2:44 PM.

Details

Summary

Remove obsolete debug info while garbage collecting.

This patch is an illustration for the llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html

It is a proof-of-concept implementation of removing obsolete debug info while garbage collecting.
It implements two features, "Remove obsolete debug info" and "Alternative implementation for the types deduplication":

  1. Remove obsolete debug info.

    Currently, when the linker does garbage collection, a lot of abandoned debug info is left behind. The patch skips the debug info (subprograms, address ranges, line sequences) that corresponds to the dead sections.

    A command-line option is added to lld: --gc-debuginfo. It removes the pieces of debug information related to the discarded sections.

    Short description of the implementation:
    1. Examine the compile unit's address ranges, check whether they resolve into a live section, and put the corresponding subprograms into the list of live subprograms.

      For .debug_info section:
    2. Parse abbreviations and note attributes of reference type. DIEs with references must be patched, since offsets can become incorrect once parts of the .debug_info section are removed.
    3. Go through the list of live subprograms and parse them, scan through all references and mark the reached DIEs as live.
    4. Scan all DIEs of the compile unit in sequential order and mark live DIEs for copying into the output section.
    5. After all marking is finished, scan for .debug_info references and patch them according to the new section layout.

      For .debug_ranges/.debug_rnglists sections:
    6. Mark address entries related to live sections for copying into the output section.

      For .debug_line section:
    7. Mark line program pieces related to live sections for copying into the output section.
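
The .debug_info steps above can be sketched on a toy model. This is only an illustration, not the code from the patch: the `Die` record, `mark_live` and `new_offsets` are hypothetical names, DIE sizes are derived from the consecutive offsets in the gc-debuginfo-wo-types.s dump in the result below, and the list of live subprograms is passed in directly instead of being computed from relocations.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Die:
    offset: int             # offset in the input .debug_info
    size: int               # encoded size in bytes
    tag: str
    parent: Optional[int]   # offset of the parent DIE, None for the unit DIE
    refs: List[int] = field(default_factory=list)  # offsets of referenced DIEs


def mark_live(dies: List[Die], live_subprograms: List[int]) -> Set[int]:
    """Mark every DIE reachable from the live subprograms."""
    by_offset = {d.offset: d for d in dies}
    children: Dict[int, List[int]] = defaultdict(list)
    for d in dies:
        if d.parent is not None:
            children[d.parent].append(d.offset)

    live: Set[int] = set()
    expanded: Set[int] = set()
    worklist = list(live_subprograms)
    while worklist:
        off = worklist.pop()
        if off in expanded:
            continue
        expanded.add(off)
        live.add(off)
        die = by_offset[off]
        worklist.extend(die.refs)       # referenced DIEs (e.g. types) stay
        worklist.extend(children[off])  # the whole subtree stays
        # Ancestors stay too, but only together with their NULL terminator.
        parent = die.parent
        while parent is not None and parent not in live:
            live.add(parent)
            live.update(c for c in children[parent] if by_offset[c].tag == "NULL")
            parent = by_offset[parent].parent
    return live


def new_offsets(dies: List[Die], live: Set[int]) -> Dict[int, int]:
    """Live DIEs are copied back to back, so each one moves down by the
    number of bytes dropped before it."""
    mapping: Dict[int, int] = {}
    removed = 0
    for d in sorted(dies, key=lambda d: d.offset):
        if d.offset in live:
            mapping[d.offset] = d.offset - removed
        else:
            removed += d.size
    return mapping


CU = 0x0B
dies = [
    Die(0x0B, 0x1F, "compile_unit", None),
    Die(0x2A, 0x1D, "subprogram", CU, refs=[0x82]),          # func_mod1_used
    Die(0x47, 0x0E, "formal_parameter", 0x2A, refs=[0x82]),
    Die(0x55, 0x01, "NULL", 0x2A),
    Die(0x56, 0x1D, "subprogram", CU, refs=[0x82]),          # func_mod1_not_used
    Die(0x73, 0x0E, "formal_parameter", 0x56, refs=[0x82]),
    Die(0x81, 0x01, "NULL", 0x56),
    Die(0x82, 0x07, "base_type", CU),                        # int
    Die(0x89, 0x01, "NULL", CU),
]

live = mark_live(dies, live_subprograms=[0x2A])
mapping = new_offsets(dies, live)
# References of the surviving DIEs, rewritten for the new layout.
patched_refs = {
    mapping[d.offset]: [mapping[r] for r in d.refs]
    for d in dies
    if d.offset in live
}
```

In this model the "int" type moves from 0x82 to 0x56 and the references in func_mod1_used follow it, which is the same change of DW_AT_type that the dump shows.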

      Result (for gc-debuginfo-wo-types.s):

      A. .debug_info section:
            w/o -gc-debuginfo:                 |         with -gc-debuginfo:
                                               |
                                               |
0x0000000b: DW_TAG_compile_unit                | 0x0000000b: DW_TAG_compile_unit
  DW_AT_name        ("mod1.cpp")               |   DW_AT_name        ("mod1.cpp")
  DW_AT_low_pc      (0x0000000000000000)       |   DW_AT_low_pc      (0x0000000000000000)
  DW_AT_ranges      (0x00000000                |   DW_AT_ranges      (0x00000000
     [0x0000000000201010, 0x0000000000201018)  |     [0x0000000000201010, 0x0000000000201018))
     [0x0000000000000000, 0x0000000000000008)) |
                                               |
0x0000002a:   DW_TAG_subprogram                | 0x0000002a:   DW_TAG_subprogram
  DW_AT_low_pc    (0x0000000000201010)         |   DW_AT_low_pc    (0x0000000000201010)
  DW_AT_high_pc   (0x0000000000201018)         |   DW_AT_high_pc   (0x0000000000201018)
  DW_AT_name      ("func_mod1_used")           |   DW_AT_name      ("func_mod1_used")
  DW_AT_type      (0x00000082 "int")           |   DW_AT_type      (0x00000056 "int")
                                               |
0x00000047:     DW_TAG_formal_parameter        | 0x00000047:     DW_TAG_formal_parameter
  DW_AT_name    ("p1")                         |   DW_AT_name    ("p1")
  DW_AT_type    (0x00000082 "int")             |   DW_AT_type    (0x00000056 "int")
                                               |
0x00000055:     NULL                           | 0x00000055:     NULL
                                               |
0x00000056:   DW_TAG_subprogram                | 0x00000056:   DW_TAG_base_type
  DW_AT_low_pc    (0x0000000000000000)         |   DW_AT_name      ("int")
  DW_AT_high_pc   (0x0000000000000008)         |
  DW_AT_name      ("func_mod1_not_used")       | 0x0000005d:   NULL
  DW_AT_type      (0x00000082 "int")           |
                                               |
0x00000073:     DW_TAG_formal_parameter        |
  DW_AT_name    ("p1")                         |
  DW_AT_type    (0x00000082 "int")             |
                                               | 
0x00000081:     NULL                           |
                                               |
0x00000082:   DW_TAG_base_type                 |
  DW_AT_name      ("int")                      |
                                               |
0x00000089:   NULL                             |
      B. .debug_ranges section:
            w/o -gc-debuginfo:                 |       with -gc-debuginfo:
                                               |
00000000 0000000000201010 0000000000201018     | 00000000 0000000000201010 0000000000201018 
00000000 0000000000000000 0000000000000008     | 00000000 <End of list>
00000000 <End of list>                         | 00000020 0000000000201020 0000000000201033
00000030 0000000000201020 0000000000201033     | 00000020 <End of list>
00000030 0000000000000000 000000000000000c     |
00000030 <End of list>                         |
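
The .debug_ranges result above can be reproduced with a small sketch. It assumes 64-bit addresses, so each entry and each end-of-list marker takes 16 bytes, and it decides liveness by comparing addresses against a hypothetical `live_sections` list. The function names are made up for this illustration, and the patch itself is described as checking whether a range resolves into a live section rather than comparing addresses.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]
ENTRY_SIZE = 16  # two 8-byte addresses; the end-of-list marker has the same size


def is_live(rng: Range, live_sections: List[Range]) -> bool:
    """A range is kept only if it lies entirely inside a live section."""
    begin, end = rng
    return any(lo <= begin and end <= hi for lo, hi in live_sections)


def gc_range_lists(
    range_lists: Dict[int, List[Range]], live_sections: List[Range]
) -> Tuple[Dict[int, List[Range]], Dict[int, int]]:
    """Drop dead entries and return the new lists plus an old-to-new
    offset map (DW_AT_ranges values would be patched with that map)."""
    new_lists: Dict[int, List[Range]] = {}
    offset_map: Dict[int, int] = {}
    new_offset = 0
    for old_offset in sorted(range_lists):
        kept = [r for r in range_lists[old_offset] if is_live(r, live_sections)]
        offset_map[old_offset] = new_offset
        new_lists[new_offset] = kept
        new_offset += (len(kept) + 1) * ENTRY_SIZE  # +1 for the end-of-list marker
    return new_lists, offset_map


# Input lists keyed by their offset in .debug_ranges, taken from the dump.
range_lists = {
    0x00: [(0x201010, 0x201018), (0x0, 0x8)],
    0x30: [(0x201020, 0x201033), (0x0, 0xC)],
}
# Assumed addresses of the sections that survived garbage collection.
live_sections = [(0x201010, 0x201018), (0x201020, 0x201033)]

new_lists, offset_map = gc_range_lists(range_lists, live_sections)
```

The second list shrinks and moves from offset 0x30 to 0x20, as in the right-hand column.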
      C. .debug_line section:

w/o -gc-debuginfo:

debug_line[0x0000005c]
.....
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000201010      3      0      1   0             0  is_stmt
0x0000000000201014      4     15      1   0             0  is_stmt prologue_end
0x0000000000201017      4      5      1   0             0 
0x0000000000201018      4      5      1   0             0  end_sequence
0x0000000000000000      7      0      1   0             0  is_stmt
0x0000000000000004      8     15      1   0             0  is_stmt prologue_end
0x0000000000000007      8      5      1   0             0 
0x0000000000000008      8      5      1   0             0  end_sequence

with -gc-debuginfo:

debug_line[0x0000005c]
.....
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000201010      3      0      1   0             0  is_stmt
0x0000000000201014      4     15      1   0             0  is_stmt prologue_end
0x0000000000201017      4      5      1   0             0 
0x0000000000201018      4      5      1   0             0  end_sequence
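
A minimal sketch of the .debug_line filtering, working on decoded rows for readability. The patch is described as marking pieces of the line program itself, so this is only a model of the effect. `Row`, `split_sequences`, `gc_line_rows` and the `live_sections` list are hypothetical names introduced here.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    address: int
    line: int
    column: int
    end_sequence: bool = False


def split_sequences(rows: List[Row]) -> List[List[Row]]:
    """Group rows into sequences; each sequence ends with an end_sequence row.
    Rows after the last end_sequence would be malformed and are ignored."""
    sequences: List[List[Row]] = []
    current: List[Row] = []
    for row in rows:
        current.append(row)
        if row.end_sequence:
            sequences.append(current)
            current = []
    return sequences


def gc_line_rows(rows: List[Row], live_sections: List[Tuple[int, int]]) -> List[Row]:
    """Keep only the sequences that start inside a live section."""
    kept: List[Row] = []
    for seq in split_sequences(rows):
        start = seq[0].address
        if any(lo <= start < hi for lo, hi in live_sections):
            kept.extend(seq)
    return kept


# Rows from the dump without -gc-debuginfo.
rows = [
    Row(0x201010, 3, 0),
    Row(0x201014, 4, 15),
    Row(0x201017, 4, 5),
    Row(0x201018, 4, 5, end_sequence=True),
    Row(0x0, 7, 0),
    Row(0x4, 8, 15),
    Row(0x7, 8, 5),
    Row(0x8, 8, 5, end_sequence=True),
]
live_sections = [(0x201010, 0x201018)]  # assumed address range of the live code

kept_rows = gc_line_rows(rows, live_sections)
```

Only the first sequence survives, matching the table with -gc-debuginfo.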

  2. Alternative implementation for the types deduplication.

    An implementation of types deduplication already exists: -fdebug-types-section. This patch uses a different approach: parse the DWARF, cut out duplicated types, and patch type references to point to the single remaining type definition.

    A command-line option is added to lld: --gc-debuginfo-types. It performs the alternative type deduplication while doing --gc-debuginfo.

    Short description of the implementation:
    1. Parse abbreviations and note attributes of reference type. References of the DW_FORM_ref4 kind that point to types are changed into DW_FORM_ref_addr.
    2. Scan all DIEs of the compile unit in sequential order and calculate a type hash for each type. Mark new type descriptions as live and put their offsets into the type map. Skip type descriptions that already have a matching entry in the type map.
    3. After all marking is finished, scan for type references and patch them according to the created type offsets map. Also scan for .debug_info references and patch them according to the new section layout.
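
The type map idea can be sketched as follows. This is a simplified model, not the hashing used in MarkDebuginfoTypeHash.cpp: `TypeDie`, `type_hash` and `dedup_types` are hypothetical names, each top-level type is treated as a single record (a structure's size includes its members), the hash covers only the tag, the name and the hashes of referenced types, and recursive types are not handled. Offsets and sizes are taken from the gc-debuginfo-with-types.s dump in the result below.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TypeDie:
    size: int    # bytes occupied by the type description, children included
    tag: str
    name: str
    refs: List[int] = field(default_factory=list)  # offsets of referenced types


def type_hash(offset: int, types: Dict[int, TypeDie], cache: Dict[int, bytes]) -> bytes:
    """Hash a type from its tag, name and the hashes of the types it refers to."""
    if offset not in cache:
        t = types[offset]
        h = hashlib.sha1(f"{t.tag}:{t.name}".encode())
        for ref in t.refs:
            h.update(type_hash(ref, types, cache))
        cache[offset] = h.digest()
    return cache[offset]


def dedup_types(types: Dict[int, TypeDie]) -> Dict[int, int]:
    """Map every old type offset to the new offset of the single kept copy."""
    cache: Dict[int, bytes] = {}
    type_map: Dict[bytes, int] = {}   # hash -> old offset of the first definition
    canonical: Dict[int, int] = {}    # old offset -> old offset of the kept copy
    for off in sorted(types):
        canonical[off] = type_map.setdefault(type_hash(off, types, cache), off)

    # Kept types slide down by the number of bytes removed before them.
    new_offset: Dict[int, int] = {}
    removed = 0
    for off in sorted(types):
        if canonical[off] == off:
            new_offset[off] = off - removed
        else:
            removed += types[off].size
    return {off: new_offset[canonical[off]] for off in types}


# Section-wide offsets of the top-level types in the three compile units.
types = {
    0x056: TypeDie(0x07, "base_type", "int"),                          # mod1.cpp
    0x0C2: TypeDie(0x07, "base_type", "int"),                          # mod2.cpp
    0x0C9: TypeDie(0x38, "structure_type", "SS", refs=[0x0C2, 0x101]),
    0x101: TypeDie(0x07, "base_type", "float"),
    0x169: TypeDie(0x07, "base_type", "int"),                          # main.cpp
    0x170: TypeDie(0x38, "structure_type", "SS", refs=[0x169, 0x1A8]),
    0x1A8: TypeDie(0x07, "base_type", "float"),
}

patched = dedup_types(types)
```

The resulting offsets are relative to the start of .debug_info and can point into another compile unit, which is why unit-relative DW_FORM_ref4 references have to become DW_FORM_ref_addr.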

      Result (for gc-debuginfo-with-types.s):

            w/o -gc-debuginfo-types:                    |       with -gc-debuginfo-types:
0x00000000: Compile Unit: length = 0x0000005a           | 0x00000000: Compile Unit: length = 0x0000005a 
                                                        |
0x0000000b: DW_TAG_compile_unit                         | 0x0000000b: DW_TAG_compile_unit
  DW_AT_name        ("mod1.cpp")                        |   DW_AT_name        ("mod1.cpp")
                                                        |
0x0000002a:   DW_TAG_subprogram                         | 0x0000002a:   DW_TAG_subprogram 
  DW_AT_name      ("func_mod1_used")                    |   DW_AT_name      ("func_mod1_used")
  DW_AT_type      (0x00000056 "int")                    |   DW_AT_type      (0x0000000000000056 "int")
                                                        |
0x00000047:     DW_TAG_formal_parameter                 | 0x00000047:     DW_TAG_formal_parameter
  DW_AT_name    ("p1")                                  |   DW_AT_name    ("p1")
  DW_AT_type    (0x00000056 "int")                      |   DW_AT_type    (0x0000000000000056 "int")
                                                        |
0x00000055:     NULL                                    | 0x00000055:     NULL
                                                        |
0x00000056:   DW_TAG_base_type                          | 0x00000056:   DW_TAG_base_type
  DW_AT_name      ("int")                               |   DW_AT_name      ("int")
                                                        |
0x0000005d:   NULL                                      | 0x0000005d:   NULL
                                                        |
0x0000005e: Compile Unit: length = 0x000000a7           | 0x0000005e: Compile Unit: length = 0x000000a0
                                                        |
0x00000069: DW_TAG_compile_unit                         | 0x00000069: DW_TAG_compile_unit
  DW_AT_name        ("mod2.cpp")                        |   DW_AT_name        ("mod2.cpp")
                                                        |
0x00000088:   DW_TAG_subprogram                         | 0x00000088:   DW_TAG_subprogram
  DW_AT_name      ("func_mod2_used")                    |   DW_AT_name      ("func_mod2_used")
  DW_AT_type      (0x000000c2 "int")                    |   DW_AT_type      (0x0000000000000056 "int")          
                                                        |
0x000000a5:     DW_TAG_formal_parameter                 | 0x000000a5:     DW_TAG_formal_parameter
  DW_AT_name    ("p1")                                  |   DW_AT_name    ("p1")
  DW_AT_type    (0x000000c9 "SS")                       |   DW_AT_type    (0x00000000000000c2 "SS")
                                                        |
0x000000b3:     DW_TAG_variable                         | 0x000000b3:     DW_TAG_variable
  DW_AT_name    ("is")                                  |   DW_AT_name    ("is")
  DW_AT_type    (0x000000ea "inner_SS")                 |   DW_AT_type    (0x00000000000000e3 "inner_SS")
                                                        |
0x000000c1:     NULL                                    | 0x000000c1:     NULL
                                                        |
0x000000c2:   DW_TAG_base_type                          |
  DW_AT_name      ("int")                               |
                                                        |
0x000000c9:   DW_TAG_structure_type                     | 0x000000c2:   DW_TAG_structure_type
  DW_AT_name      ("SS")                                |   DW_AT_name      ("SS")
                                                        |
0x000000d2:     DW_TAG_member                           | 0x000000cb:     DW_TAG_member
  DW_AT_name    ("a1")                                  |   DW_AT_name    ("a1")
  DW_AT_type    (0x000000c2 "int")                      |   DW_AT_type    (0x0000000000000056 "int")
                                                        |
0x000000de:     DW_TAG_member                           | 0x000000d7:     DW_TAG_member
  DW_AT_name    ("a2")                                  |   DW_AT_name    ("a2")
  DW_AT_type    (0x00000101 "float")                    |   DW_AT_type    (0x00000000000000fa "float")
                                                        |
0x000000ea:     DW_TAG_structure_type                   | 0x000000e3:     DW_TAG_structure_type
  DW_AT_name    ("inner_SS")                            |   DW_AT_name    ("inner_SS")
                                                        |
0x000000f3:       DW_TAG_member                         | 0x000000ec:       DW_TAG_member
  DW_AT_name  ("a1")                                    |   DW_AT_name  ("a1")
  DW_AT_type  (0x000000c2 "int")                        |   DW_AT_type  (0x0000000000000056 "int")
                                                        |
0x000000ff:       NULL                                  | 0x000000f8:       NULL
                                                        |
0x00000100:     NULL                                    | 0x000000f9:     NULL
                                                        |
0x00000101:   DW_TAG_base_type                          | 0x000000fa:   DW_TAG_base_type
  DW_AT_name      ("float")                             |   DW_AT_name      ("float")
                                                        |
0x00000108:   NULL                                      | 0x00000101:   NULL
                                                        |
0x00000109: Compile Unit: length = 0x000000a3           | 0x00000102: Compile Unit: length = 0x0000005d
                                                        |
0x00000114: DW_TAG_compile_unit                         | 0x0000010d: DW_TAG_compile_unit
  DW_AT_name        ("main.cpp")                        |   DW_AT_name        ("main.cpp")
                                                        |
0x00000133:   DW_TAG_subprogram                         | 0x0000012c:   DW_TAG_subprogram
  DW_AT_name      ("main")                              |   DW_AT_name      ("main")
  DW_AT_type      (0x00000169 "int")                    |   DW_AT_type      (0x0000000000000056 "int")
                                                        |
0x0000014c:     DW_TAG_variable                         | 0x00000145:     DW_TAG_variable
  DW_AT_name    ("s")                                   |   DW_AT_name    ("s")
  DW_AT_type    (0x00000170 "SS")                       |   DW_AT_type    (0x00000000000000c2 "SS")
                                                        |
0x0000015a:     DW_TAG_variable                         | 0x00000153:     DW_TAG_variable
  DW_AT_name    ("is")                                  |   DW_AT_name    ("is")
  DW_AT_type    (0x00000191 "inner_SS")                 |   DW_AT_type    (0x00000000000000e3 "inner_SS")
                                                        |
0x00000168:     NULL                                    | 0x00000161:     NULL
                                                        |
0x00000169:   DW_TAG_base_type                          | 0x00000162:   NULL
                DW_AT_name      ("int")                 |
                                                        |
0x00000170:   DW_TAG_structure_type                     |
                DW_AT_name      ("SS")                  |
                                                        |
0x00000179:     DW_TAG_member                           |
                  DW_AT_name    ("a1")                  |
                  DW_AT_type    (0x00000169 "int")      |
                                                        |
0x00000185:     DW_TAG_member                           |
                  DW_AT_name    ("a2")                  |
                  DW_AT_type    (0x000001a8 "float")    |
                                                        |
0x00000191:     DW_TAG_structure_type                   |
                  DW_AT_name    ("inner_SS")            |
                                                        |
0x0000019a:       DW_TAG_member                         |
                    DW_AT_name  ("a1")                  |
                    DW_AT_type  (0x00000169 "int")      |
                                                        |
0x000001a6:       NULL                                  |
                                                        |
0x000001a7:     NULL                                    |
                                                        |
0x000001a8:   DW_TAG_base_type                          |
                DW_AT_name      ("float")               |
                                                        |
0x000001af:   NULL                                      |

Files description:

lld/ELF/MarkDebuginfo.h (20 lines)
lld/ELF/MarkDebuginfo.cpp (98 lines)

Main routine: markUsedDebuginfo().

lld/ELF/MarkDebugSectionInfo.h (197 lines)
lld/ELF/MarkDebugSectionInfo.cpp (639 lines)

  Parses .debug_info

lld/ELF/MarkDebugSectionLines.h (142 lines)

  Parses .debug_line

lld/ELF/MarkDebugSectionRanges.h (212 lines)

Parses .debug_ranges and .debug_rnglists

lld/ELF/MarkDebuginfoTypeHash.h (95 lines)
lld/ELF/MarkDebuginfoTypeHash.cpp (404 lines)

Implements the type hash used for type comparison.

lld/ELF/InputSection.h

Implements DebugInputSection.

Event Timeline

avl created this revision. Sep 11 2019, 2:44 PM
avl added a comment. Sep 11 2019, 3:34 PM

Please note that this patch is not intended for integration. It is for illustrative purposes, for the thread on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html

avl edited the summary of this revision. Sep 12 2019, 7:33 AM
avl added subscribers: ruiu, dblaikie, probinson and 3 others.
avl added a comment. May 20 2020, 5:25 AM

That patch was a prototype. There is now another implementation, which uses code extracted from dsymutil: https://reviews.llvm.org/D74169. I am abandoning this patch in favor of https://reviews.llvm.org/D74169.

avl abandoned this revision. May 20 2020, 5:26 AM