This is an archive of the discontinued LLVM Phabricator instance.

RFC: [ELF] Add --dwarf32-before-dwarf64 to place DWARF32 input sections before DWARF64
Needs Review, Public

Authored by MaskRay on Nov 12 2020, 11:02 PM.

Details

Summary

See https://lists.llvm.org/pipermail/llvm-dev/2020-November/146522.html
"[LLD] Support DWARF64, debug_info "sorting""

If a .debug_* output section S can grow beyond 4 GiB (the range of a 32-bit offset) and its section
offset is referenced by a DWARF32 input section, either of S itself or of another .debug_* output section,
S may be subject to 32-bit relocation overflow. If we place such DWARF32
sections before DWARF64 sections, we can likely mitigate overflows.
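
To make the overflow argument concrete, here is a minimal sketch, not code from the patch. The `Contribution` type and `layoutOverflows` function are hypothetical, and the sketch assumes each DWARF32 contribution is only referenced through 32-bit offsets that point somewhere inside that same contribution.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one input file's contribution to an output section.
struct Contribution {
  uint64_t size; // bytes contributed to the output section
  bool dwarf64;  // true if references into it use 8-byte offsets
};

// Returns true if some DWARF32 contribution ends beyond the range of a
// 4-byte offset, i.e. a 32-bit absolute relocation into it could overflow.
inline bool layoutOverflows(const std::vector<Contribution> &order) {
  uint64_t start = 0;
  for (const Contribution &c : order) {
    uint64_t end = start + c.size;
    if (!c.dwarf64 && c.size != 0 && end - 1 > UINT32_MAX)
      return true;
    start = end;
  }
  return false;
}
```

With a 3 GiB DWARF64 contribution followed by a 2 GiB DWARF32 one, the DWARF32 part ends past 4 GiB and the check reports an overflow; with the order reversed it does not.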

Parsing DWARF is time-consuming, breaks the "smart format, dumb linker" design goal of ELF,
and can get in the way if we eagerly free uncompressed buffers after OutputSection::writeTo.
So we use the relocation-based idea proposed by Igor Kudrin:
https://lists.llvm.org/pipermail/llvm-dev/2020-November/146528.html

In practice, the first relocation of a .debug_* section is a good indicator of whether it is a
DWARF64 section. To list a few:

  • .debug_info: the first relocation is a .debug_abbrev offset
  • .debug_names references .debug_info: the first relocation is a .debug_info offset
  • .debug_aranges references .debug_info: the first relocation is a .debug_info offset
  • .debug_str_offsets references .debug_str: the first relocation is a .debug_str offset
  • ...
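
The heuristic in the list above can be sketched as follows. This is only an illustration: `RelocWidth`, `Reloc`, `DebugSection`, and `looksLikeDwarf64` are hypothetical stand-ins, not lld's real types or the code in this patch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical classification of a relocation by the width it patches.
enum class RelocWidth : uint8_t { Abs32, Abs64, Other };

struct Reloc {
  uint64_t offset;  // offset of the patched field within the section
  RelocWidth width; // width of the absolute relocation
};

struct DebugSection {
  std::vector<Reloc> relocs; // in the order they appear in the object file
};

// Guess the DWARF format from the first relocation only: a 64-bit absolute
// relocation suggests DWARF64. Anything else, including a section with no
// relocations at all, is treated as DWARF32.
inline bool looksLikeDwarf64(const DebugSection &sec) {
  return !sec.relocs.empty() && sec.relocs.front().width == RelocWidth::Abs64;
}
```

Only the first relocation is inspected, so the check is constant time per section and never has to parse the DWARF contents.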

This patch adds the partition code to finalizeInputSections so that it works
for both orphan sections and when an output section description for .debug_info
is used. In some sense --dwarf32-before-dwarf64 will behave like
--sort-section: it can affect input section order of a wildcard.
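
A minimal sketch of the partition step, assuming each input section already carries a DWARF64 guess. `InputSec` and `placeDwarf32First` are hypothetical names for illustration; the real change lives in finalizeInputSections and is not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an input section of one .debug_* output section.
struct InputSec {
  std::string file; // originating object file, for illustration only
  bool dwarf64;     // result of the first-relocation heuristic
};

// Move DWARF32 sections in front of DWARF64 ones while preserving the
// relative order inside each group. Returns the number of DWARF32 sections.
inline std::size_t placeDwarf32First(std::vector<InputSec> &secs) {
  auto mid = std::stable_partition(
      secs.begin(), secs.end(),
      [](const InputSec &s) { return !s.dwarf64; });
  return static_cast<std::size_t>(mid - secs.begin());
}
```

A stable partition is used in the sketch so that the command-line order is kept within the DWARF32 group and within the DWARF64 group.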

Event Timeline

MaskRay created this revision. Nov 12 2020, 11:02 PM
MaskRay requested review of this revision. Nov 12 2020, 11:02 PM
MaskRay added inline comments. Nov 12 2020, 11:04 PM
lld/ELF/OutputSections.cpp
170

This should be Rel. I'll add a mips64 test.

jhenderson added inline comments. Nov 13 2020, 1:13 AM
lld/ELF/Options.td
150–151

"place" sounds like the linker isn't allowed to place DWARF32 sections first, but this obviously isn't the behaviour, if users just so happen to place objects in the right order on the command-line. Perhaps "sort" makes it clearer?

lld/test/ELF/dwarf32-before-dwarf64.s
1

Is there a particular reason this is using ppc rather than the more common X86?

24

Here and below, maybe worth replacing the area patched by relocations with {{.*}} to make the test more robust?

Well, this is a bit different from my original idea, but it is a good overall heuristic for many of the debug sections. It works for .debug_info, which is one of the biggest sections. It does not work for .debug_line, though, which is not as big as .debug_info but might become problematic in the (distant) future. It also does not work for .debug_abbrev, .debug_addr, .debug_ranges, and some others, which are usually not that big. However, the patch should be extended to support .debug_str, which can be even larger than .debug_info.

avl added a subscriber: avl. Edited Nov 13 2020, 5:48 AM

Would it be useful if the decision made for the .debug_info section were also applied to all other debug sections from the same object file?

wenlei added a subscriber: wenlei. Nov 13 2020, 7:23 AM
MaskRay added a comment. Edited Nov 13 2020, 8:56 AM
In D91404#2393750, @avl wrote:

Would it be useful if the decision made for the .debug_info section were also applied to all other debug sections from the same object file?

We can mark either the InputFile or the InputSectionBase referenced by a 64-bit absolute relocation. I prefer InputSectionBase, which is more direct (we have spare bits in SectionBase after keepUnique).

DWARF v4 .debug_str is referenced by a non-first relocation of .debug_info, so the InputSectionBase-based referenced-bit approach does not work for it. So maybe we have to use InputFile.

In D91404#2393750, @avl wrote:

Would it be useful if the decision made for the .debug_info section were also applied to all other debug sections from the same object file?

We can mark either the InputFile or the InputSectionBase referenced by a 64-bit absolute relocation. I prefer InputSectionBase, which is more direct (we have spare bits in SectionBase after keepUnique).

DWARF v4 .debug_str is referenced by a non-first relocation of .debug_info, so the InputSectionBase-based referenced-bit approach does not work for it. So maybe we have to use InputFile.

If I understand this correctly, if we store some kind of DWARF64 flag in InputFile, we can still use the firstReloc approach.
The assumption is that all the input sections within a file have the same DWARF pointer width.
Otherwise, we have to scan all the relocations and mark the sections that their symbols point to as DWARF64.
Correct?
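
For discussion, a rough sketch of the file-level flag variant, under the assumption stated above that every debug section in a file uses the same DWARF format. `Section`, `ObjectFile`, and `markFile` are hypothetical and much simpler than lld's InputFile and InputSectionBase.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; only the fields needed for the sketch.
struct Section {
  std::string name;
  bool firstRelocIs64; // first relocation is a 64-bit absolute relocation
};

struct ObjectFile {
  std::vector<Section> sections;
  bool dwarf64 = false; // file-level flag shared by all its debug sections
};

// Decide once per file from .debug_info's first relocation. Sections such
// as .debug_str or .debug_line then inherit the flag from their file.
inline void markFile(ObjectFile &f) {
  for (const Section &s : f.sections) {
    if (s.name == ".debug_info" && s.firstRelocIs64) {
      f.dwarf64 = true;
      return;
    }
  }
}
```

If files can mix DWARF32 and DWARF64 units, this shortcut is not enough and all relocations would have to be scanned instead.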