This is an archive of the discontinued LLVM Phabricator instance.

Differential D36097

Prototype fix for lld DWARF parsing of base address selection entries in range lists
Abandoned · Public

Authored by dblaikie on Jul 31 2017, 11:53 AM.

Details

Reviewers

grimar
• rafael

Summary

A recent improvement to LLVM produces DWARF compliant, but (apparently)
infrequently used range lists - using base address selection entries to
reduce the number of relocations required (& reduce file size &
hopefully link time)

This breaks lld's use of LLVM's libDebugInfo which isn't correctly
tracking the section index in the presence of this kind of debug info.

Here's my first blush at a fix for libDebugInfo - though it does raise
some questions and needs testing (is there any testing in LLVM (not lld)
for the section index API in libDebugInfo? it'd be good if there was,
maybe API level testing).

How should/does this differentiate between the case where the unit's
base address is actually zero? compared to when it's not present? Does
that matter? I'm not sure.

(Aside: does anyone have an idea of how well LLD scales with the number of
relocations versus the number of bytes of data in a section? I'd love to know
how much this reduction in relocations is worth. For example, is the runtime of
the linker roughly N*relocs + M*raw bytes? It could be interesting to know,
obviously varying with the configuration: cores, disk speed, etc.)

Diff Detail

Build Status

Buildable 8785
Build 8785: arc lint + arc unit

Event Timeline

dblaikie created this revision. Jul 31 2017, 11:53 AM

Herald added a subscriber: aprantl. Jul 31 2017, 11:53 AM

> A recent improvement to LLVM produces DWARF compliant, but (apparently)
> infrequently used range lists - using base address selection entries to
> reduce the number of relocations required (& reduce file size &
> hopefully link time)
>
> This breaks lld's use of LLVM's libDebugInfo which isn't correctly
> tracking the section index in the presence of this kind of debug info.

Do you have any sample code or testcase that reproduces the issue?
(Looking at the implementation, I think I know how to generate the DWARF section content manually, but I
am not sure how to reproduce the issue starting from C++ source and clang, for example.)

> Here's my first blush at a fix for libDebugInfo - though it does raise
> some questions and needs testing (is there any testing in LLVM (not lld)
> for the section index API in libDebugInfo? it'd be good if there was,
> maybe API level testing).

I do not think we have any. That looks to be my fault: I don't think I added an
LLVM testcase when I implemented the section index API.
I'll check what I can do about it.

lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp
Line 75:

> How should/does this differentiate between the case where the unit's
> base address is actually zero? compared to when it's not present? Does
> that matter? I'm not sure.

The DWARF spec says:

> The applicable base address of a range list entry is determined by the closest
> preceding base address selection entry (see below) in the same range list. If
> there is no such selection entry, then the applicable base address defaults to
> the base address of the compilation unit (see Section 3.1.1).

Should `BaseAddress` and `SectionIndex` be optional, then? The code could then look something like this:

DWARFAddressRangesVector
DWARFDebugRangeList::getAbsoluteRanges(Optional<uint64_t> BaseAddress,
                                       Optional<uint64_t> BaseSectionIndex) const {
  DWARFAddressRangesVector Res;
  for (const RangeListEntry &RLE : Entries) {
    if (RLE.isBaseAddressSelectionEntry(AddressSize)) {
      BaseAddress = RLE.EndAddress;
      BaseSectionIndex = RLE.SectionIndex;
      continue;
    }

    uint64_t Start = RLE.StartAddress;
    uint64_t End = RLE.EndAddress;
    uint64_t SectionIndex = RLE.SectionIndex;
    if (BaseAddress) {
      // Dereference the optionals before using their values.
      Start += *BaseAddress;
      End += *BaseAddress;
      SectionIndex = *BaseSectionIndex;
    }
    Res.push_back({Start, End, SectionIndex});
  }
  return Res;
}
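The rule quoted from the spec can be tried outside of LLVM with a small standalone sketch. Everything below is hypothetical and only mirrors the shape of the suggestion: `RangeEntry`, `Base`, `NoSection`, and `absoluteRanges` are illustrative names, `std::optional` stands in for `llvm::Optional`, addresses are assumed to be 8 bytes wide, and the end-of-list terminator is assumed to have been dropped already.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the LLVM types; not the real API.
constexpr uint64_t NoSection = UINT64_MAX;

struct RangeEntry {
  uint64_t Start;
  uint64_t End;
  uint64_t SectionIndex; // section named by the relocation, or NoSection
};

struct Base {
  uint64_t Address;
  uint64_t SectionIndex;
};

// With 8-byte addresses, a base address selection entry has an all-ones
// first word; its second word carries the new base address.
inline bool isBaseSelection(const RangeEntry &E) {
  return E.Start == UINT64_MAX;
}

// An empty optional means "no base known", which is distinct from a base
// whose address happens to be zero.
std::vector<RangeEntry> absoluteRanges(const std::vector<RangeEntry> &Entries,
                                       std::optional<Base> CurrentBase) {
  std::vector<RangeEntry> Res;
  for (const RangeEntry &E : Entries) {
    if (isBaseSelection(E)) {
      CurrentBase = Base{E.End, E.SectionIndex};
      continue;
    }
    RangeEntry R = E;
    if (CurrentBase) {
      R.Start += CurrentBase->Address;
      R.End += CurrentBase->Address;
      R.SectionIndex = CurrentBase->SectionIndex;
    }
    Res.push_back(R);
  }
  return Res;
}
```

Passing `std::nullopt` models a unit with no base address at all, while `Base{0, 1}` models a base of zero that is relocated against section 1; the two cases produce different section indices for the same entries.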

After debugging, I found a place that looks like it will prevent this patch from working properly.
The output from the testcase you provided is:

	.section	.debug_ranges,"",@progbits
.Ldebug_ranges0:
	.quad	-1
	.quad	.Lfunc_begin0
	.quad	.Lfunc_begin0-.Lfunc_begin0
	.quad	.Lfunc_end0-.Lfunc_begin0
	.quad	.Lfunc_begin2-.Lfunc_begin0
	.quad	.Lfunc_end2-.Lfunc_begin0
	.quad	0
	.quad	0

And here are the relocations produced:

Relocation section '.rela.debug_ranges' at offset 0x4c8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000008  000200000001 R_X86_64_64       0000000000000000 .text + 0

The notable thing here is that the single relocation produced has offset 8. That points at the second quad
of the base address selection entry in .debug_ranges. The problem is that the code that takes SectionIndex assumes
that a relocation pointing at StartAddress (LowPC, the first quad) will exist, so it takes the section index from there
and ignores the section index from the second quad:

bool DWARFDebugRangeList::extract(const DWARFDataExtractor &data,
                                  uint32_t *offset_ptr) {
...
    entry.StartAddress =
        data.getRelocatedAddress(offset_ptr, &entry.SectionIndex);
    entry.EndAddress = data.getRelocatedAddress(offset_ptr);
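The offset arithmetic behind that observation can be sketched as follows. This is a hypothetical helper, not LLVM code: `Slot`, `Field`, and `slotForOffset` are illustrative names, and the offset is assumed to be relative to the start of the range list (which coincides with the start of the section in the listing above).

```cpp
#include <cassert>
#include <cstdint>

// Which half of a (start, end) range list entry a byte offset falls in.
enum class Field { Start, End };

struct Slot {
  uint64_t EntryIndex; // zero-based entry within the range list
  Field Which;
};

// Each entry is two address-sized words, so the word index determines both
// the entry and the half of the entry that a relocation touches.
Slot slotForOffset(uint64_t OffsetInList, uint64_t AddressSize) {
  uint64_t Word = OffsetInList / AddressSize;
  return {Word / 2, (Word % 2 == 0) ? Field::Start : Field::End};
}
```

With 8-byte addresses, offset 8 maps to the second word of entry 0, which is the word that `extract` reads without asking for a section index.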

What I think we should do here is either take the section index from the relocation pointing at EndAddress (HighPC) or,
probably preferably, read it from both LowPC and HighPC and check that the two are consistent: the section index is
either the same for both addresses, absent for both, or present for only one of them.
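One way to express that consistency check is sketched below. It is a hypothetical helper rather than anything in LLVM: `SectionIndex` and `mergeSectionIndex` are illustrative names, and an empty `std::optional` stands for "no relocation at this address".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using SectionIndex = std::optional<uint64_t>;

// Combines the section indices found for the two words of an entry.
// Returns false only when both words carry a relocation and the two
// relocations name different sections; otherwise Out receives whichever
// index is present (or stays empty when neither word is relocated).
bool mergeSectionIndex(SectionIndex Low, SectionIndex High, SectionIndex &Out) {
  if (Low && High && *Low != *High)
    return false;
  Out = Low ? Low : High;
  return true;
}
```

For the base address selection entry in the listing above, `Low` would be empty and `High` would hold the index of `.text`, so the merged result is the `.text` index.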

D36270 addresses the issue from my last comment and adds testcases for the section index API.

A testcase would be great.

grimar mentioned this in D37214: Another prototype fix for lld DWARF parsing of base address selection entries in range lists. Aug 28 2017, 8:12 AM

grimar added inline comments. Aug 28 2017, 8:15 AM

lib/DebugInfo/DWARF/DWARFUnit.cpp
Line 247:

I debugged this patch today and tried to use `Optional<BaseAddr>` here for `setBaseAddress`, where `BaseAddr` was a struct holding the address and section index pair:

struct BaseAddress {
  uint64_t Address;
  uint64_t SectionIndex;
};

But I ran into the following issues when running the tests from `llvm\test\DebugInfo\X86`, and I am not sure how to resolve them correctly. I had to implement `setBaseAddress` this way:

void setBaseAddress(BaseAddress BaseAddr) {
  if (!BaseAddr.Address && BaseAddr.SectionIndex == -1ULL)
    return;
  assert(BaseAddr.SectionIndex != -1ULL);
  this->BaseAddr = BaseAddr;
}

I cannot justify the first two lines to myself, though. It looks like it is common for a CU to have `DW_AT_low_pc` set to 0 without a corresponding relocation. For example, take the following code and invocation:

# clang test.cpp -S -o test.s -gmlt -ffunction-sections
# test.cpp:
# void foo1() { }
# void foo2() { }

The .debug_info section will then be:

	.section	.debug_info,"",@progbits
.Lcu_begin0:
......
	.long	.Linfo_string2          # DW_AT_comp_dir
	.quad	0                       # DW_AT_low_pc
	.long	.Ldebug_ranges0         # DW_AT_ranges

So DW_AT_low_pc is present in the CU, but it is zero and has no relocation. Can we safely filter such cases out, as I did in my version of `setBaseAddress` above? I am not sure why DW_AT_low_pc is emitted at all in that case; it looks useless.

Even with the above change, there is still a testcase that fails my new assertion `assert(BaseAddr.SectionIndex != -1ULL);`. It is `DebugInfo/X86/stmt-list-multiple-compile-units.ll`. It has `DW_AT_low_pc == 0x10` but still no corresponding relocation, so the section index cannot be found. That happens because `DWARFObjInMemory` has the following code, which skips scanning relocations:

// In Mach-o files, the relocations do not need to be applied if
// there is no load offset to apply. The value read at the
// relocation point already factors in the section address
// (actually applying the relocations will produce wrong results
// as the section address will be added twice).
if (!L && isa<MachOObjectFile>(&Obj))
  continue;

So the relocations are in the file, but we do not read them at all for MachO, and so far we are unable to get the section index. I guess the correct way would be to enable scanning and storing them, but still skip applying them somehow. Does that make sense?

FWIW, I uploaded what I have here: D37214. It passes all tests now (the assertion I mentioned is removed).
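The idea of scanning and storing relocations while skipping their application could look roughly like the sketch below. It is only an illustration under stated assumptions: `RecordedReloc`, `RelocMap`, `NoSection`, and `relocatedAddress` are hypothetical names, not the `DWARFObjInMemory` or `DWARFDataExtractor` API, and the `Apply` flag stands in for whatever mechanism would mark the Mach-O case.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

constexpr uint64_t NoSection = UINT64_MAX;

// Hypothetical record kept for every relocation that was scanned.
struct RecordedReloc {
  uint64_t SectionIndex; // section the relocation refers to
  uint64_t Value;        // amount the relocation would add
  bool Apply;            // false when the bytes already hold the final address
};

// Keyed by the offset of the relocated word within the debug section.
using RelocMap = std::map<uint64_t, RecordedReloc>;

// Returns the address read at Offset and reports the section index of the
// relocation found there, even when the relocation value itself is not applied.
uint64_t relocatedAddress(uint64_t RawValue, uint64_t Offset,
                          const RelocMap &Relocs,
                          uint64_t *SectionIndex = nullptr) {
  if (SectionIndex)
    *SectionIndex = NoSection;
  auto It = Relocs.find(Offset);
  if (It == Relocs.end())
    return RawValue;
  if (SectionIndex)
    *SectionIndex = It->second.SectionIndex;
  return It->second.Apply ? RawValue + It->second.Value : RawValue;
}
```

With a record stored as `{1, 0x400, false}`, a lookup returns the raw value unchanged but still reports section 1, which is the behaviour the comment asks for in the Mach-O case.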

grimar mentioned this in D37297: [ELF] - Add testcase testing .gdb_index generation when base address of CU is used. Aug 30 2017, 7:50 AM

dblaikie abandoned this revision. Aug 30 2017, 10:47 AM

grimar mentioned this in rL312477: [DebugInfo] - Fix for lld DWARF parsing of base address selection entries in…. Sep 4 2017, 3:31 AM

Revision Contents

Path                                                  Size
include/llvm/DebugInfo/DWARF/DWARFDebugRangeList.h    3 lines
include/llvm/DebugInfo/DWARF/DWARFUnit.h              7 lines
lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp           7 lines
lib/DebugInfo/DWARF/DWARFDie.cpp                      3 lines
lib/DebugInfo/DWARF/DWARFUnit.cpp                     6 lines

Diff 108965

include/llvm/DebugInfo/DWARF/DWARFDebugRangeList.h

@@ 79 lines hidden @@ public:
   void clear();
   void dump(raw_ostream &OS) const;
   bool extract(const DWARFDataExtractor &data, uint32_t *offset_ptr);
   const std::vector<RangeListEntry> &getEntries() { return Entries; }

   /// getAbsoluteRanges - Returns absolute address ranges defined by this range
   /// list. Has to be passed base address of the compile unit referencing this
   /// range list.
-  DWARFAddressRangesVector getAbsoluteRanges(uint64_t BaseAddress) const;
+  DWARFAddressRangesVector getAbsoluteRanges(uint64_t BaseAddress,
+                                             uint64_t SectionIndex) const;
 };

 } // end namespace llvm

 #endif // LLVM_DEBUGINFO_DWARF_DWARFDEBUGRANGELIST_H

include/llvm/DebugInfo/DWARF/DWARFUnit.h

@@ 130 lines hidden @@ class DWARFUnit {
   // Version, address size, and DWARF format.
   DWARFFormParams FormParams;

   uint32_t Offset;
   uint32_t Length;
   const DWARFAbbreviationDeclarationSet *Abbrevs;
   uint8_t UnitType;
   uint64_t BaseAddr;
+  uint64_t BaseAddrSectionIndex;
   /// The compile unit debug information entry items.
   std::vector<DWARFDebugInfoEntry> DieArray;

   /// Map from range's start address to end address and corresponding DIE.
   /// IntervalMap does not support range removal, as a result, we use the
   /// std::map::upper_bound for address range lookup.
   std::map<uint64_t, std::pair<uint64_t, DWARFDie>> AddrDieMap;

@@ 108 lines hidden @@ static uint32_t getDWARF5HeaderSize(uint8_t UnitType) {
     case dwarf::DW_UT_type:
     case dwarf::DW_UT_split_type:
       return 24;
     }
     llvm_unreachable("Invalid UnitType.");
   }

   uint64_t getBaseAddress() const { return BaseAddr; }
+  uint64_t getBaseAddressSectionIndex() const { return BaseAddrSectionIndex; }

-  void setBaseAddress(uint64_t base_addr) {
-    BaseAddr = base_addr;
+  void setBaseAddress(uint64_t BaseAddr, uint64_t SectionIndex) {
+    this->BaseAddr = BaseAddr;
+    this->BaseAddrSectionIndex = SectionIndex;
   }

   DWARFDie getUnitDIE(bool ExtractUnitDIEOnly = true) {
     extractDIEsIfNeeded(ExtractUnitDIEOnly);
     if (DieArray.empty())
       return DWARFDie();
     return DWARFDie(this, &DieArray[0]);
   }
@@ 98 lines hidden @@

lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp

@@ 56 lines hidden @@ const char *format_str = (AddressSize == 4
                               ? "%08x %08" PRIx64 " %08" PRIx64 "\n"
                               : "%08x %016" PRIx64 " %016" PRIx64 "\n");
     OS << format(format_str, Offset, RLE.StartAddress, RLE.EndAddress);
   }
   OS << format("%08x <End of list>\n", Offset);
 }

 DWARFAddressRangesVector
-DWARFDebugRangeList::getAbsoluteRanges(uint64_t BaseAddress) const {
+DWARFDebugRangeList::getAbsoluteRanges(uint64_t BaseAddress,
+                                       uint64_t SectionIndex) const {
   DWARFAddressRangesVector Res;
   for (const RangeListEntry &RLE : Entries) {
     if (RLE.isBaseAddressSelectionEntry(AddressSize)) {
       BaseAddress = RLE.EndAddress;
+      SectionIndex = RLE.SectionIndex;
     } else {
       Res.push_back({BaseAddress + RLE.StartAddress,
-                     BaseAddress + RLE.EndAddress, RLE.SectionIndex});
+                     BaseAddress + RLE.EndAddress,
+                     BaseAddress ? SectionIndex : RLE.SectionIndex});
     }
   }
   return Res;
 }

lib/DebugInfo/DWARF/DWARFDie.cpp

@@ 237 lines hidden @@ DWARFDie::getAddressRanges() const {
   if (getLowAndHighPC(LowPC, HighPC, Index))
     return {{LowPC, HighPC, Index}};

   // Multiple ranges from .debug_ranges section.
   auto RangesOffset = toSectionOffset(find(DW_AT_ranges));
   if (RangesOffset) {
     DWARFDebugRangeList RangeList;
     if (U->extractRangeList(*RangesOffset, RangeList))
-      return RangeList.getAbsoluteRanges(U->getBaseAddress());
+      return RangeList.getAbsoluteRanges(U->getBaseAddress(),
+                                         U->getBaseAddressSectionIndex());
   }
   return DWARFAddressRangesVector();
 }

 void
 DWARFDie::collectChildrenAddressRanges(DWARFAddressRangesVector& Ranges) const {
   if (isNULL())
     return;
@@ 169 lines hidden @@

lib/DebugInfo/DWARF/DWARFUnit.cpp

@@ 236 lines hidden @@ size_t DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
   extractDIEsToVector(!HasCUDie, !CUDieOnly, DieArray);

   if (DieArray.empty())
     return 0;

   // If CU DIE was just parsed, copy several attribute values from it.
   if (!HasCUDie) {
     DWARFDie UnitDie = getUnitDIE();
-    auto BaseAddr = toAddress(UnitDie.find({DW_AT_low_pc, DW_AT_entry_pc}));
-    if (BaseAddr)
-      setBaseAddress(*BaseAddr);
+    const auto &LowPCDie = UnitDie.find({DW_AT_low_pc, DW_AT_entry_pc});
+    if (auto BaseAddr = toAddress(LowPCDie))
+      setBaseAddress(*BaseAddr, LowPCDie->getSectionIndex());
     AddrOffsetSectionBase = toSectionOffset(UnitDie.find(DW_AT_GNU_addr_base), 0);
     RangeSectionBase = toSectionOffset(UnitDie.find(DW_AT_rnglists_base), 0);

     // In general, we derive the offset of the unit's contibution to the
     // debug_str_offsets{.dwo} section from the unit DIE's
     // DW_AT_str_offsets_base attribute. In dwp files we add to it the offset
     // we get from the index table.
     StringOffsetSectionBase =
@@ 203 lines hidden @@