Remove obsolete debug info while garbage collecting.
This patch is an illustration for the llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html
It is a proof-of-concept implementation for the "removing obsolete debug info while garbage collecting" problem.
It implements two features, "Remove obsolete debug info" and "Alternative implementation for the types deduplication":
- Remove obsolete debug info.
Currently, when the linker does garbage collection, a lot of abandoned debug info is left behind. The patch skips the debug info (subprograms, address ranges, line sequences) that corresponds to the dead sections.
A command-line option is added to lld: --gc-debuginfo. It removes the pieces of debug information related to the discarded sections.
Short description of the implementation:
- Examine the compile unit's address ranges, check whether they resolve into a live section, and put the corresponding subprograms into the list of live subprograms.
For the .debug_info section:
- Parse abbreviations and note attributes of reference type. DIEs with references must be patched, since offsets can become incorrect when parts of the .debug_info section are removed.
- Go through the list and parse the live subprograms, scan through all references, and mark the reached DIEs as live.
- Scan all of the compile unit's DIEs in sequential order and mark live DIEs for copying into the output section.
- After all marking is finished, scan for .debug_info references and patch them according to the new section layout.
For the .debug_ranges/.debug_rnglists sections:
- Mark address entries related to live sections for copying into the output section.
For the .debug_line section:
- Mark line program pieces related to live sections for copying into the output section.
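The marking and patching steps for .debug_info can be sketched roughly as follows. This is a minimal illustration and not code from the patch: the `Die` record, `markLive()` and `computeNewOffsets()` are hypothetical names, each DIE is reduced to an offset, a size, and a list of the DIEs it links to (children and reference attributes), and real DWARF parsing is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified DIE record (not a type from the patch).
struct Die {
  uint64_t offset;           // offset in the input .debug_info section
  uint64_t size;             // encoded size of the DIE in bytes
  std::vector<size_t> links; // indices of children and referenced DIEs
  bool live = false;
};

// Mark every DIE reachable from the live roots (compile unit DIE and
// live subprograms) by following children and references.
void markLive(std::vector<Die> &dies, const std::vector<size_t> &roots) {
  std::vector<size_t> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    size_t i = worklist.back();
    worklist.pop_back();
    if (dies[i].live)
      continue;
    dies[i].live = true;
    for (size_t next : dies[i].links)
      worklist.push_back(next);
  }
}

// Walk the DIEs in section order and assign output offsets to the live
// ones. The result maps old offsets to new offsets and is what reference
// attributes would be patched with.
std::map<uint64_t, uint64_t> computeNewOffsets(const std::vector<Die> &dies,
                                               uint64_t firstOffset) {
  std::map<uint64_t, uint64_t> remap;
  uint64_t out = firstOffset;
  for (const Die &d : dies) {
    if (!d.live)
      continue;
    remap[d.offset] = out;
    out += d.size;
  }
  return remap;
}
```

If the DIEs of mod1.cpp from the result below are fed in (sizes inferred from consecutive offsets in the dump) with the compile unit and func_mod1_used as roots, the "int" base type at 0x00000082 maps to 0x00000056, which matches the patched DW_AT_type values.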
Result (for gc-debuginfo-wo-types.s):
A. .debug_info section:
w/o -gc-debuginfo:                            | with -gc-debuginfo:
                                              |
0x0000000b: DW_TAG_compile_unit               | 0x0000000b: DW_TAG_compile_unit
  DW_AT_name ("mod1.cpp")                     |   DW_AT_name ("mod1.cpp")
  DW_AT_low_pc (0x0000000000000000)           |   DW_AT_low_pc (0x0000000000000000)
  DW_AT_ranges (0x00000000                    |   DW_AT_ranges (0x00000000
    [0x0000000000201010, 0x0000000000201018)  |     [0x0000000000201010, 0x0000000000201018))
    [0x0000000000000000, 0x0000000000000008)) |
                                              |
0x0000002a: DW_TAG_subprogram                 | 0x0000002a: DW_TAG_subprogram
  DW_AT_low_pc (0x0000000000201010)           |   DW_AT_low_pc (0x0000000000201010)
  DW_AT_high_pc (0x0000000000201018)          |   DW_AT_high_pc (0x0000000000201018)
  DW_AT_name ("func_mod1_used")               |   DW_AT_name ("func_mod1_used")
  DW_AT_type (0x00000082 "int")               |   DW_AT_type (0x00000056 "int")
                                              |
0x00000047: DW_TAG_formal_parameter           | 0x00000047: DW_TAG_formal_parameter
  DW_AT_name ("p1")                           |   DW_AT_name ("p1")
  DW_AT_type (0x00000082 "int")               |   DW_AT_type (0x00000056 "int")
                                              |
0x00000055: NULL                              | 0x00000055: NULL
                                              |
0x00000056: DW_TAG_subprogram                 | 0x00000056: DW_TAG_base_type
  DW_AT_low_pc (0x0000000000000000)           |   DW_AT_name ("int")
  DW_AT_high_pc (0x0000000000000008)          |
  DW_AT_name ("func_mod1_not_used")           | 0x0000005d: NULL
  DW_AT_type (0x00000082 "int")               |
                                              |
0x00000073: DW_TAG_formal_parameter           |
  DW_AT_name ("p1")                           |
  DW_AT_type (0x00000082 "int")               |
                                              |
0x00000081: NULL                              |
                                              |
0x00000082: DW_TAG_base_type                  |
  DW_AT_name ("int")                          |
                                              |
0x00000089: NULL                              |
B. .debug_ranges section:
w/o -gc-debuginfo:                          | with -gc-debuginfo:
                                            |
00000000 0000000000201010 0000000000201018  | 00000000 0000000000201010 0000000000201018
00000000 0000000000000000 0000000000000008  | 00000000 <End of list>
00000000 <End of list>                      | 00000020 0000000000201020 0000000000201033
00000030 0000000000201020 0000000000201033  | 00000020 <End of list>
00000030 0000000000000000 000000000000000c  |
00000030 <End of list>                      |
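The effect on .debug_ranges can be sketched as below. This is only an illustration under stated assumptions: `Range`, `OutList` and `gcRangeLists()` are hypothetical names, the `live` flag stands in for whatever check the patch performs against the section the range resolves into, and entries are assumed to be 16 bytes (two 8-byte addresses) with a 16-byte end-of-list terminator.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical range entry; `live` is assumed to be precomputed by the caller
// from the liveness of the section the range points into.
struct Range {
  uint64_t low;
  uint64_t high;
  bool live;
};
using RangeList = std::vector<Range>;

// Output list: its new offset in .debug_ranges plus the surviving entries.
struct OutList {
  uint64_t offset;
  RangeList ranges;
};

// Drop dead entries from every list and recompute the list offsets, so that
// DW_AT_ranges values can be patched to the new layout.
std::vector<OutList> gcRangeLists(const std::vector<RangeList> &lists) {
  constexpr uint64_t entrySize = 16; // assumption: 64-bit .debug_ranges entries
  std::vector<OutList> out;
  uint64_t offset = 0;
  for (const RangeList &list : lists) {
    OutList kept{offset, {}};
    for (const Range &r : list)
      if (r.live)
        kept.ranges.push_back(r);
    // One extra slot for the end-of-list terminator.
    offset += (kept.ranges.size() + 1) * entrySize;
    out.push_back(std::move(kept));
  }
  return out;
}
```

With the two lists from the dump, the second list moves from offset 0x30 to 0x20 once the dead entry of the first list is dropped.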
C. .debug_line section:
w/o -gc-debuginfo:
debug_line[0x0000005c]
.....
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000201010      3      0      1   0             0  is_stmt
0x0000000000201014      4     15      1   0             0  is_stmt prologue_end
0x0000000000201017      4      5      1   0             0
0x0000000000201018      4      5      1   0             0  end_sequence
0x0000000000000000      7      0      1   0             0  is_stmt
0x0000000000000004      8     15      1   0             0  is_stmt prologue_end
0x0000000000000007      8      5      1   0             0
0x0000000000000008      8      5      1   0             0  end_sequence
with -gc-debuginfo:
debug_line[0x0000005c]
.....
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000201010      3      0      1   0             0  is_stmt
0x0000000000201014      4     15      1   0             0  is_stmt prologue_end
0x0000000000201017      4      5      1   0             0
0x0000000000201018      4      5      1   0             0  end_sequence
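The .debug_line filtering can be pictured at the level of decoded rows, as in the sketch below. The patch itself works on pieces of the line program, so this is a simplified model: `Row`, `keepLiveSequences()` and the `isLiveAddress` predicate are hypothetical, and liveness is decided per sequence from the address of its first row.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical decoded line-table row (only the fields needed here).
struct Row {
  uint64_t address;
  unsigned line;
  bool endSequence;
};

// Keep only the sequences whose first row lies in a live section. A sequence
// is a run of rows terminated by a row with end_sequence set. Trailing rows
// without a terminator are dropped in this sketch.
std::vector<Row>
keepLiveSequences(const std::vector<Row> &rows,
                  const std::function<bool(uint64_t)> &isLiveAddress) {
  std::vector<Row> out;
  size_t begin = 0;
  for (size_t i = 0; i < rows.size(); ++i) {
    if (!rows[i].endSequence)
      continue;
    if (isLiveAddress(rows[begin].address))
      out.insert(out.end(), rows.begin() + begin, rows.begin() + i + 1);
    begin = i + 1;
  }
  return out;
}
```

Applied to the eight rows listed without -gc-debuginfo, with a predicate that accepts only the addresses of the live section, this keeps the four rows of the first sequence.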
- Alternative implementation for the types deduplication.
An implementation of type deduplication already exists: -fdebug-types-section. This patch uses a different approach: parse the DWARF, cut out duplicated types, and patch type references to point to a single type definition.
A command-line option is added to lld: --gc-debuginfo-types. It performs the alternative type deduplication while doing --gc-debuginfo.
Short description of the implementation:
- Parse abbreviations and note attributes of reference type. References of the DW_FORM_ref4 kind that point to types are changed into DW_FORM_ref_addr.
- Scan all of the compile unit's DIEs in sequential order and calculate a type hash for each type. Mark new type descriptions as live and put their offsets into the type map. Skip type descriptions that already have a matching entry in the type map.
- After all marking is finished, scan for type references and patch them according to the created type offset map. Also scan for .debug_info references and patch them according to the new section layout.
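The type map step can be sketched as follows. This is a minimal illustration, not the patch's code: `TypeDie` and `dedupTypes()` are hypothetical names, and a plain string key stands in for the type hash that the patch computes from the type description.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type record: input offset plus a key standing in for the
// type hash (assumption: equal keys mean equal types).
struct TypeDie {
  uint64_t offset;
  std::string key;
};

// Scan types in section order. The first type seen for a key becomes the
// single definition; later duplicates are mapped to it and can be skipped.
// Returns: input offset of each type -> input offset of the kept definition.
std::unordered_map<uint64_t, uint64_t>
dedupTypes(const std::vector<TypeDie> &types) {
  std::unordered_map<std::string, uint64_t> typeMap;
  std::unordered_map<uint64_t, uint64_t> canonical;
  for (const TypeDie &t : types) {
    // emplace() leaves an existing entry untouched, so the first offset wins.
    uint64_t kept = typeMap.emplace(t.key, t.offset).first->second;
    canonical[t.offset] = kept;
  }
  return canonical;
}
```

The returned offsets are still input offsets. They would be translated once more when the section is laid out without the skipped duplicates. Because the kept definition may live in another compile unit, a CU-relative DW_FORM_ref4 reference is not sufficient, which is why such references are rewritten as DW_FORM_ref_addr.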
Result (for gc-debuginfo-with-types.s):
0x00000000: Compile Unit: length = 0x0000005a | 0x00000000: Compile Unit: length = 0x0000005a
                                              |
0x0000000b: DW_TAG_compile_unit               | 0x0000000b: DW_TAG_compile_unit
  DW_AT_name ("mod1.cpp")                     |   DW_AT_name ("mod1.cpp")
                                              |
0x0000002a: DW_TAG_subprogram                 | 0x0000002a: DW_TAG_subprogram
  DW_AT_name ("func_mod1_used")               |   DW_AT_name ("func_mod1_used")
  DW_AT_type (0x00000056 "int")               |   DW_AT_type (0x0000000000000056 "int")
                                              |
0x00000047: DW_TAG_formal_parameter           | 0x00000047: DW_TAG_formal_parameter
  DW_AT_name ("p1")                           |   DW_AT_name ("p1")
  DW_AT_type (0x00000056 "int")               |   DW_AT_type (0x0000000000000056 "int")
                                              |
0x00000055: NULL                              | 0x00000055: NULL
                                              |
0x00000056: DW_TAG_base_type                  | 0x00000056: DW_TAG_base_type
  DW_AT_name ("int")                          |   DW_AT_name ("int")
                                              |
0x0000005d: NULL                              | 0x0000005d: NULL
                                              |
0x0000005e: Compile Unit: length = 0x000000a7 | 0x0000005e: Compile Unit: length = 0x000000a0
                                              |
0x00000069: DW_TAG_compile_unit               | 0x00000069: DW_TAG_compile_unit
  DW_AT_name ("mod2.cpp")                     |   DW_AT_name ("mod2.cpp")
                                              |
0x00000088: DW_TAG_subprogram                 | 0x00000088: DW_TAG_subprogram
  DW_AT_name ("func_mod2_used")               |   DW_AT_name ("func_mod2_used")
  DW_AT_type (0x000000c2 "int")               |   DW_AT_type (0x0000000000000056 "int")
                                              |
0x000000a5: DW_TAG_formal_parameter           | 0x000000a5: DW_TAG_formal_parameter
  DW_AT_name ("p1")                           |   DW_AT_name ("p1")
  DW_AT_type (0x000000c9 "SS")                |   DW_AT_type (0x00000000000000c2 "SS")
                                              |
0x000000b3: DW_TAG_variable                   | 0x000000b3: DW_TAG_variable
  DW_AT_name ("is")                           |   DW_AT_name ("is")
  DW_AT_type (0x000000ea "inner_SS")          |   DW_AT_type (0x00000000000000e3 "inner_SS")
                                              |
0x000000c1: NULL                              | 0x000000c1: NULL
                                              |
0x000000c2: DW_TAG_base_type                  |
  DW_AT_name ("int")                          |
                                              |
0x000000c9: DW_TAG_structure_type             | 0x000000c2: DW_TAG_structure_type
  DW_AT_name ("SS")                           |   DW_AT_name ("SS")
                                              |
0x000000d2: DW_TAG_member                     | 0x000000cb: DW_TAG_member
  DW_AT_name ("a1")                           |   DW_AT_name ("a1")
  DW_AT_type (0x000000c2 "int")               |   DW_AT_type (0x0000000000000056 "int")
                                              |
0x000000de: DW_TAG_member                     | 0x000000d7: DW_TAG_member
  DW_AT_name ("a2")                           |   DW_AT_name ("a2")
  DW_AT_type (0x00000101 "float")             |   DW_AT_type (0x00000000000000fa "float")
                                              |
0x000000ea: DW_TAG_structure_type             | 0x000000e3: DW_TAG_structure_type
  DW_AT_name ("inner_SS")                     |   DW_AT_name ("inner_SS")
                                              |
0x000000f3: DW_TAG_member                     | 0x000000ec: DW_TAG_member
  DW_AT_name ("a1")                           |   DW_AT_name ("a1")
  DW_AT_type (0x000000c2 "int")               |   DW_AT_type (0x0000000000000056 "int")
                                              |
0x000000ff: NULL                              | 0x000000f8: NULL
                                              |
0x00000100: NULL                              | 0x000000f9: NULL
                                              |
0x00000101: DW_TAG_base_type                  | 0x000000fa: DW_TAG_base_type
  DW_AT_name ("float")                        |   DW_AT_name ("float")
                                              |
0x00000108: NULL                              | 0x00000101: NULL
                                              |
0x00000109: Compile Unit: length = 0x000000a3 | 0x00000102: Compile Unit: length = 0x0000005d
                                              |
0x00000114: DW_TAG_compile_unit               | 0x0000010d: DW_TAG_compile_unit
  DW_AT_name ("main.cpp")                     |   DW_AT_name ("main.cpp")
                                              |
0x00000133: DW_TAG_subprogram                 | 0x0000012c: DW_TAG_subprogram
  DW_AT_name ("main")                         |   DW_AT_name ("main")
  DW_AT_type (0x00000169 "int")               |   DW_AT_type (0x0000000000000056 "int")
                                              |
0x0000014c: DW_TAG_variable                   | 0x00000145: DW_TAG_variable
  DW_AT_name ("s")                            |   DW_AT_name ("s")
  DW_AT_type (0x00000170 "SS")                |   DW_AT_type (0x00000000000000c2 "SS")
                                              |
0x0000015a: DW_TAG_variable                   | 0x00000153: DW_TAG_variable
  DW_AT_name ("is")                           |   DW_AT_name ("is")
  DW_AT_type (0x00000191 "inner_SS")          |   DW_AT_type (0x00000000000000e3 "inner_SS")
                                              |
0x00000168: NULL                              | 0x00000161: NULL
                                              |
0x00000169: DW_TAG_base_type                  | 0x00000162: NULL
  DW_AT_name ("int")                          |
                                              |
0x00000170: DW_TAG_structure_type             |
  DW_AT_name ("SS")                           |
                                              |
0x00000179: DW_TAG_member                     |
  DW_AT_name ("a1")                           |
  DW_AT_type (0x00000169 "int")               |
                                              |
0x00000185: DW_TAG_member                     |
  DW_AT_name ("a2")                           |
  DW_AT_type (0x000001a8 "float")             |
                                              |
0x00000191: DW_TAG_structure_type             |
  DW_AT_name ("inner_SS")                     |
                                              |
0x0000019a: DW_TAG_member                     |
  DW_AT_name ("a1")                           |
  DW_AT_type (0x00000169 "int")               |
                                              |
0x000001a6: NULL                              |
                                              |
0x000001a7: NULL                              |
                                              |
0x000001a8: DW_TAG_base_type                  |
  DW_AT_name ("float")                        |
                                              |
0x000001af: NULL                              |
File descriptions:
lld/ELF/MarkDebuginfo.h (20 lines)
lld/ELF/MarkDebuginfo.cpp (98 lines)
  Main routine: markUsedDebuginfo().
lld/ELF/MarkDebugSectionInfo.h (197 lines)
lld/ELF/MarkDebugSectionInfo.cpp (639 lines)
  Parses .debug_info.
lld/ELF/MarkDebugSectionLines.h (142 lines)
  Parses .debug_line.
lld/ELF/MarkDebugSectionRanges.h (212 lines)
  Parses .debug_ranges and .debug_rnglists.
lld/ELF/MarkDebuginfoTypeHash.h (95 lines)
lld/ELF/MarkDebuginfoTypeHash.cpp (404 lines)
  Implements the type hash used for type comparison.
lld/ELF/InputSection.h
  Implements DebugInputSection.