This is an archive of the discontinued LLVM Phabricator instance.

Prototype fix for lld DWARF parsing of base address selection entries in range lists
AbandonedPublic

Authored by dblaikie on Jul 31 2017, 11:53 AM.

Details

Reviewers
grimar
rafael
Summary

A recent improvement to LLVM produces DWARF compliant, but (apparently)
infrequently used range lists - using base address selection entries to
reduce the number of relocations required (& reduce file size &
hopefully link time)

This breaks lld's use of LLVM's libDebugInfo which isn't correctly
tracking the section index in the presence of this kind of debug info.

Here's my first blush at a fix for libDebugInfo - though it does rais
some questions and needs testing (is there any testing in LLVM (not lld)
for the section index API in libDebugInfo? it'd be good if there was,
maybe API level testing).

How should/does this differentiate between the case where the unit's
base address is actually zero? compared to when it's not present? Does
that matter? I'm not sure.

(aside: does anyone have an idea of how well LLD scales with the number of
relocations versus the number of bytes of data in a section? I'd love to know
how much this reduction in relocations is worth (like is runtime of the linker
roughly N*relocs + M*raw bytes? Could be interesting to know - obviously
varying on different configurations (cores, disk speed, etc))

Event Timeline

dblaikie created this revision.Jul 31 2017, 11:53 AM
grimar edited edge metadata.Aug 1 2017, 3:16 AM

A recent improvement to LLVM produces DWARF compliant, but (apparently)
infrequently used range lists - using base address selection entries to
reduce the number of relocations required (& reduce file size &
hopefully link time)

This breaks lld's use of LLVM's libDebugInfo which isn't correctly
tracking the section index in the presence of this kind of debug info.

Do you have any sample code/testcase/anything that shows/can help to reproduce the issue ?
(looking at implementation I think I know how to generate DWARF section content manually, but I
am not sure what should I do to reproduce issue using cpp+clang for example).

Here's my first blush at a fix for libDebugInfo - though it does rais
some questions and needs testing (is there any testing in LLVM (not lld)
for the section index API in libDebugInfo? it'd be good if there was,
maybe API level testing).

I do not think we have. That looks to be my fault, I don't think I added
LLVM testcase when implemented section index API.
I'll check what can I do for it.

lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp
75

How should/does this differentiate between the case where the unit's
base address is actually zero? compared to when it's not present? Does
that matter? I'm not sure.

So DWARF spec says:
The applicable base address of a range list entry is determined by the closest preceding base
address selection entry (see below) in the same range list. If there is no such selection entry, then
the applicable base address defaults to the base address of the compilation unit (see Section
3.1.1).

Should BaseAddress and SectionIndex be optional then ? Then code can be like following probably:

DWARFAddressRangesVector
DWARFDebugRangeList::getAbsoluteRanges(Optional<uint64_t> BaseAddress,
                                        Optional<uint64_t> BaseSectionIndex) const {
  DWARFAddressRangesVector Res;
  for (const RangeListEntry &RLE : Entries) {
    if (RLE.isBaseAddressSelectionEntry(AddressSize)) {
      BaseAddress = RLE.EndAddress;
      BaseSectionIndex = RLE.SectionIndex;
      continue;
    } 

    uint64_t Start = RLE.StartAddress;
    uint64_t End = RLE.EndAddress;
    uint64_t SectionIndex = RLE.SectionIndex;
    if (BaseAddress) {
      Start  += *BaseAddress;
      End += *BaseAddress;
      SectionIndex = *BaseSectionIndex;
    }

    Res.push_back({Start, End, SectionIndex });
  }
}
grimar added a comment.Aug 3 2017, 7:02 AM

After debugging I found place which looks will not allow this patch to work properly.
See. Output from testcase you provided is:

	.section	.debug_ranges,"",@progbits
.Ldebug_ranges0:
	.quad	-1
	.quad	.Lfunc_begin0
	.quad	.Lfunc_begin0-.Lfunc_begin0
	.quad	.Lfunc_end0-.Lfunc_begin0
	.quad	.Lfunc_begin2-.Lfunc_begin0
	.quad	.Lfunc_end2-.Lfunc_begin0
	.quad	0
	.quad	0

And here is relocations produced:

Relocation section '.rela.debug_ranges' at offset 0x4c8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000008  000200000001 R_X86_64_64       0000000000000000 .text + 0

Noticable thing here is that single relocation produced has offset 8. That points on a second quad
of base address entry of .debug_ranges. But problem is that code that takes SectionIndex assumed
that relocation pointing on a StartAddress (LowPC, first quad) will exist, so code takes section index from it
and ignores section index from second quad:

bool DWARFDebugRangeList::extract(const DWARFDataExtractor &data,
                                  uint32_t *offset_ptr) {
...
    entry.StartAddress =
        data.getRelocatedAddress(offset_ptr, &entry.SectionIndex);
    entry.EndAddress = data.getRelocatedAddress(offset_ptr);

What I think we should do here is either take section index from relocation pointing on EndAddress(HighPC), or
what is probably more preferable, take it from both HighPC/LowPC, and check that section index is equal for both
or not exist for both or exist for only one of addresses.

D36270 addresses issue from my last comment and adds testcases for section index API.

A testcase would be great.

grimar added inline comments.Aug 28 2017, 8:15 AM
lib/DebugInfo/DWARF/DWARFUnit.cpp
247

I debugged this patch today and tried to use Optional<BaseAddr> here for setBaseAddress,
where BaseAddr was a struct for address and section index pair:

struct BaseAddress {
  uint64_t Address;
  uint64_t SectionIndex;
};

But I faced following issues when
run tests from llvm\test\DebugInfo\X86 and I am not sure how to resolve them correctly.

  1. I had to implement setBaseAddress in next way:
void setBaseAddress(BaseAddress BaseAddr) {
  if (!BaseAddr.Address && BaseAddr.SectionIndex == -1ULL)
    return;
  assert(BaseAddr.SectionIndex != -1ULL);
  this->BaseAddr = BaseAddr;
}

I can not justify first 2 lines for myself though. it looks that is is common that CU has DW_AT_low_pc set to 0
without corresponding relocation.

For example I can take following code and invocation:

# clang test.cpp -S -o test.s -gmlt -ffunction-sections
# test.cpp:
#   void foo1() { }  
#   void foo2() { }

And .debug_info section will be:

	.section	.debug_info,"",@progbits
.Lcu_begin0:
......
	.long	.Linfo_string2          # DW_AT_comp_dir
	.quad	0                       # DW_AT_low_pc
	.long	.Ldebug_ranges0         # DW_AT_ranges

So DW_AT_low_pc will present in CU, but be zero and have no relocation.

Can we safely filter such cases out, like I did in my version of setBaseAddress above ?
Not sure why DW_AT_low_pc is ever emited then, it looks useless.

  1. Even with above change, there is still testcase that fails my new assertion assert(BaseAddr.SectionIndex != -1ULL);.

It is DebugInfo/X86/stmt-list-multiple-compile-units.ll.
It has DW_AT_low_pc == 0x10 but still no corresponding relocation and so it can not find the section index.

That happens because DWARFObjInMemory has following code that skips scanning relocations:

// In Mach-o files, the relocations do not need to be applied if
 // there is no load offset to apply. The value read at the
 // relocation point already factors in the section address
 // (actually applying the relocations will produce wrong results
 // as the section address will be added twice).
 if (!L && isa<MachOObjectFile>(&Obj))
   continue;

So relocations are in the file, but we do not read them at all for MachO. And unable to get section index so far.
I guess correct way would be to enable scanning and storing them, but still skip applying somehow.
Does it make sence ?

FWIW, I uploaded what I have here: D37214. It passes all tests now (assertion I mentioned is removed).

dblaikie abandoned this revision.Aug 30 2017, 10:47 AM