This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Introduce GdbIndexBuilderDWARFContent
Abandoned, Public

Authored by grimar on Mar 23 2017, 10:54 AM.


Details

Reviewers

dblaikie
friss
echristo

Summary

This patch continues the work on speeding up LLD's --gdb-index option that was started in D31136.

In this patch I suggest creating a separate class, GdbIndexBuilderDWARFContent, for parsing the content required
to build the .gdb_index section.

This has the following benefits:

Applying relocations is slow. Building .gdb_index seems to require resolving relocations only in the .debug_info section (for the later collectAddressRanges() call). I don't think we have to process any other relocations. It is easy to filter and optimize the code separately for this case without overcomplicating the logic.

This patch implements the following optimization: only relocations against symbols in allocatable sections are handled. That allows us to collect address ranges properly and skip all useless relocations in a simple way. At first I thought about adding an option to LoadedObjectInfo for that, but it looks like a very specific feature, and it is easier to implement it in GdbIndexBuilderDWARFContent instead.

It also opens the road to passing already decompressed sections to the parser. Since LLD decompresses sections on its side, it will be convenient to adapt the implementation of GdbIndexBuilderDWARFContent to take decompressed sections instead of an object::ObjectFile. That does not look necessary for the generic parser and would overcomplicate it.

Creating GdbIndexBuilderDWARFContent allows moving the implementation of the .gdb_index builder code from LLD to the DWARF parser side, which allows further reuse.
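
The filtering idea described above (handle only relocations against symbols in allocatable sections) can be sketched roughly as follows. This is a simplified illustration, not the patch's code: the Section and Relocation structs and the filterAllocRelocs() helper are hypothetical stand-ins for the LLVM object-file types, and only the SHF_ALLOC flag value (0x2) comes from the ELF specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ELF section flag: the section occupies memory during execution.
constexpr uint64_t ShfAlloc = 0x2;

// Hypothetical, simplified stand-ins for the object-file types.
struct Section {
  uint64_t Flags;
};

struct Relocation {
  uint64_t Offset;           // where in .debug_info the value is patched
  std::size_t TargetSection; // index of the section holding the target symbol
};

// Keep only relocations whose target symbol lives in an allocatable section.
// Relocations against non-alloc sections (for example other debug sections)
// are dropped, since they do not contribute to address ranges.
std::vector<Relocation>
filterAllocRelocs(const std::vector<Section> &Sections,
                  const std::vector<Relocation> &Relocs) {
  std::vector<Relocation> Kept;
  for (const Relocation &R : Relocs)
    if (Sections[R.TargetSection].Flags & ShfAlloc)
      Kept.push_back(R);
  return Kept;
}
```

In this sketch the check is a single flag test per relocation, which is why it is cheap compared with resolving every relocation first.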

The patch also implements an optimization of relocation processing, like D31136 did.
It is simple now and does not touch the main parser's logic.
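
In the diff of this revision, the relocation-processing optimization appears to reuse the previously computed symbol address when consecutive relocations refer to the same symbol. A minimal sketch of that caching pattern, using hypothetical types (Reloc, Resolved) and a plain address table in place of the real symbol lookup, might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical relocation record: patch offset plus index of the symbol.
struct Reloc {
  uint64_t Offset;
  std::size_t Symbol;
};

struct Resolved {
  std::vector<uint64_t> Addrs; // one symbol address per relocation
  unsigned Lookups = 0;        // how many table lookups were actually made
};

// Resolve symbol addresses, repeating the lookup only when the symbol
// differs from the one used by the previous relocation.
Resolved resolveWithCache(const std::vector<Reloc> &Relocs,
                          const std::vector<uint64_t> &SymbolAddrs) {
  Resolved Out;
  bool HaveCached = false;
  std::size_t CachedSym = 0;
  uint64_t CachedAddr = 0;
  for (const Reloc &R : Relocs) {
    if (!HaveCached || R.Symbol != CachedSym) {
      CachedAddr = SymbolAddrs[R.Symbol]; // stands in for the costly lookup
      CachedSym = R.Symbol;
      HaveCached = true;
      ++Out.Lookups;
    }
    Out.Addrs.push_back(CachedAddr);
  }
  return Out;
}
```

The cache only helps when relocations against the same symbol are adjacent; that this is the common case for .debug_info is an assumption of the sketch.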

The numbers I got when linking llc using LLD with --gdb-index are:

Without this patch (average, 5 runs):

      25365,272594      task-clock (msec)         #    1,312 CPUs utilized            ( +-  0,82% )
            16 827      context-switches          #    0,663 K/sec                    ( +-  2,26% )
               893      cpu-migrations            #    0,035 K/sec                    ( +-  8,21% )
           695 748      page-faults               #    0,027 M/sec                    ( +-  0,03% )
      19,330233261 seconds time elapsed

With this patch (average, 5 runs):

      15865,250347      task-clock (msec)         #    1,554 CPUs utilized            ( +-  0,06% )
            15 830      context-switches          #    0,998 K/sec                    ( +-  0,93% )
               963      cpu-migrations            #    0,061 K/sec                    ( +-  4,79% )
           696 143      page-faults               #    0,044 M/sec                    ( +-  0,05% )
      10,209859902 seconds time elapsed                                          ( +-  0,21% )

That makes --gdb-index almost 2x faster in my case!
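
As a quick check of the "almost 2x" figure, the ratio of the two measurements can be computed directly. The helper below is a hypothetical illustration; the inputs are the numbers reported above, with the decimal commas of the perf output written as decimal points.

```cpp
#include <cassert>

// Speedup factor: how many times faster the "after" run is than the "before" run.
double speedup(double Before, double After) { return Before / After; }
```

With the elapsed times, speedup(19.330233261, 10.209859902) is roughly 1.89, and with the task-clock values, speedup(25365.272594, 15865.250347) is roughly 1.60, so the wall-clock improvement is larger than the reduction in CPU time.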

Diff Detail

Event Timeline

grimar created this revision. Mar 23 2017, 10:54 AM

Herald added subscribers: aprantl, mehdi_amini. Mar 23 2017, 10:54 AM

Added comment.

I had to move and split some code into getSymbolAddress().
I can split this out as an NFC change in a separate patch to make reviewing this one simpler, if the whole idea is OK.
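
For reference, the load-address adjustment that getSymbolAddress() performs in this diff (symbol address in the file, minus the section's address in the file, plus the section's load address) can be shown with a small standalone function. The name adjustForLoadAddress() is hypothetical; the logic mirrors the adjustment in the diff, including leaving the address unchanged when the load address is 0.

```cpp
#include <cassert>
#include <cstdint>

// Rebase a symbol address from its in-file section address to the address
// the section is loaded at. A load address of 0 is treated as "not loaded"
// and leaves the symbol address unchanged.
uint64_t adjustForLoadAddress(uint64_t SymFileAddr, uint64_t SecFileAddr,
                              uint64_t SecLoadAddr) {
  if (SecLoadAddr == 0)
    return SymFileAddr;
  return SymFileAddr + (SecLoadAddr - SecFileAddr);
}
```

For example, a symbol at 0x1040 in a section that starts at 0x1000 in the file and is loaded at 0x400000 ends up at 0x400040.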

As a thought experiment, if you wanted to make this 20x faster, would you use the existing DWARF parser? I'm wondering if we need a ground-up approach. 2x is an awesome speedup, but we are probably 2x slower than gold, so that is not a very high bar.

In D31296#709045, @ruiu wrote:

As a thought experiment, if you wanted to make this 20x faster, would you use the existing DWARF parser? I'm wondering if we need a ground-up approach. 2x is an awesome speedup, but we are probably 2x slower than gold, so that is not a very high bar.

I am still looking at it. My first impression is that it is very generic and does too much error checking, even in hot places like relocation handling, where it is strange to see.
I hope it will soon be clearer whether we can speed it up 20x, and how.

I am also going to try to parallelize our .gdb_index building code (I am not sure whether gold does that for .gdb_index).
That should probably give more speedup. But I want to spend more time on the single-threaded case first to understand its weaknesses.

How much code do you expect you would have to write if you pulled only the information needed for creating .gdb_index out of the DWARF debug sections?

(By the way incremental changes are fine and welcome as long as they don't mess up the existing design of the DWARF parser. But if you are going to eventually decide not to use this class, you want to roll this back as no one else will be using this.)

In D31296#709099, @ruiu wrote:

How much code do you expect you would have to write if you pulled only the information needed for creating .gdb_index out of the DWARF debug sections?

I would answer "not much". If you remember, my first version of the .gdb_index implementation for LLD already used a hand-written parser
(https://reviews.llvm.org/D24267). But there was a discussion on the LLVM mailing list with a strong suggestion to reuse the existing parsers.

(By the way incremental changes are fine and welcome as long as they don't mess up the existing design of the DWARF parser. But if you are going to eventually decide not to use this class, you want to roll this back as no one else will be using this.)

I'm not sure what's the best way of doing this, but we shouldn't exclude the choice to write a special-purpose parser for .gdb_index.

By the way, I think we had a correctness issue with .gdb_index. Did you address that already? If not, you probably want to do that first, as correctness is more important than speed.

In D31296#709143, @ruiu wrote:

I'm not sure what's the best way of doing this, but we shouldn't exclude the choice to write a special-purpose parser for .gdb_index.

I can probably try to resurrect my old patch, for example, and check how much faster or slower it is, to have some numbers for reference.
(Though having a generic, fast LLVM DWARF parser is still better, right?)

By the way, I think we had a correctness issue with .gdb_index. Did you address that already?

Not yet, will try to look tomorrow.

Regarding your old patch to parse DWARF, it seems you are reading and applying relocations in your code. Why do you need to do that? LLD applies relocations when creating debug sections. Do you compute the same values twice?

In D31296#709196, @ruiu wrote:

Regarding your old patch to parse DWARF, it seems you are reading and applying relocations in your code. Why do you need to do that? LLD applies relocations when creating debug sections. Do you compute the same values twice?

Yes, I thought about that today too. But the problem is that we apply relocations very late, when doing WriteTo().
I need to calculate just some of the relocations to get the address area entries, and I need to do that early to find the final .gdb_index size. I thought about the following approach:

Do not apply relocations and gather the broken address area (just to find the number of entries).
Write .gdb_index after all other debug sections (after LLD writes them and applies relocations).
Use LoadedObjectInfo::getLoadedSectionContents() to give the parsers relocated content and re-read the address area (with correct entries).
Update the .gdb_index content and write it.

That should work in theory, but I doubt that it is faster than resolving some alloc relocations, as I do now. And it is definitely not cleaner.
Without applying relocations at all (I just commented them out), I had about 9 seconds of link time, I think (I need to recheck that), which is close to the 10.2 I have with the latest patch.
So I would probably try to optimize somewhere else first.
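
The first step of the approach listed above relies on the size of the address area depending only on the number of entries, not on their values. A rough sketch of that size computation follows. The entry sizes are an assumption based on GDB's documented .gdb_index layout (version 7) and are not taken from this patch; sizeThroughAddressArea() is a hypothetical helper that ignores type units, the symbol table, and the constant pool.

```cpp
#include <cassert>
#include <cstdint>

// Sizes assumed from GDB's documented .gdb_index format (version 7).
constexpr uint64_t HeaderSize = 6 * 4;           // six 32-bit header fields
constexpr uint64_t CuListEntrySize = 2 * 8;      // CU offset + CU length
constexpr uint64_t AddressEntrySize = 8 + 8 + 4; // low, high, CU index

// Bytes occupied by the header, the CU list, and the address area,
// assuming there are no type units.
uint64_t sizeThroughAddressArea(uint64_t NumCUs, uint64_t NumAddressEntries) {
  return HeaderSize + NumCUs * CuListEntrySize +
         NumAddressEntries * AddressEntrySize;
}
```

Under these assumptions, two compile units with three address entries occupy 24 + 32 + 60 = 116 bytes up to the end of the address area, regardless of whether the addresses have been relocated yet.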

aprantl added a subscriber: beanz. Mar 23 2017, 1:31 PM

You can always put .gdb_index at the end of the file even if some layout is enforced by linker scripts. That shouldn't cause any trouble because .gdb_index is not an SHF_ALLOC section. So the fact that we don't know the size of .gdb_index during linking is not a problem.

My intuition is that doing everything after applying relocations is easier and faster than doing it in the current way. The .gdb_index is after all designed with gdb in mind so that gdb can create .gdb_index sections for existing executables and libraries by reading their debug sections. So, post-processing (i.e. doing everything after applying relocations) is definitely doable and probably straightforward.

In D31296#709243, @ruiu wrote:

You can always put .gdb_index at the end of the file even if some layout is enforced by linker scripts. That shouldn't cause any trouble because .gdb_index is not an SHF_ALLOC section. So the fact that we don't know the size of .gdb_index during linking is not a problem.

My intuition is that doing everything after applying relocations is easier and faster than doing it in the current way. The .gdb_index is after all designed with gdb in mind so that gdb can create .gdb_index sections for existing executables and libraries by reading their debug sections. So, post-processing (i.e. doing everything after applying relocations) is definitely doable and probably straightforward.

That is probably worth trying, but again, it will not give the 10x/20x speedup you mentioned earlier.
I need to recheck the numbers here, but I think there was about a 10-15% speedup (compared with the latest patch) when relocations were simply disabled.

D31424 makes this useless.

Revision Contents

Path                                           Size
include/llvm/DebugInfo/DWARF/DWARFContext.h    10 lines
lib/DebugInfo/DWARF/DWARFContext.cpp           243 lines

Diff 92835

include/llvm/DebugInfo/DWARF/DWARFContext.h

[... 252 lines not shown ...]	private:
/// address.		/// address.
DWARFCompileUnit *getCompileUnitForAddress(uint64_t Address);		DWARFCompileUnit *getCompileUnitForAddress(uint64_t Address);
};		};

/// DWARFContextInMemory is the simplest possible implementation of a		/// DWARFContextInMemory is the simplest possible implementation of a
/// DWARFContext. It assumes all content is available in memory and stores		/// DWARFContext. It assumes all content is available in memory and stores
/// pointers to it.		/// pointers to it.
class DWARFContextInMemory : public DWARFContext {		class DWARFContextInMemory : public DWARFContext {
		protected:
virtual void anchor();		virtual void anchor();

bool IsLittleEndian;		bool IsLittleEndian;
uint8_t AddressSize;		uint8_t AddressSize;
DWARFSection InfoSection;		DWARFSection InfoSection;
TypeSectionMap TypesSections;		TypeSectionMap TypesSections;
StringRef AbbrevSection;		StringRef AbbrevSection;
DWARFSection LocSection;		DWARFSection LocSection;
[... 83 lines not shown ...]	StringRef getAddrSection() override {
return AddrSection;		return AddrSection;
}		}

StringRef getCUIndexSection() override { return CUIndexSection; }		StringRef getCUIndexSection() override { return CUIndexSection; }
StringRef getGdbIndexSection() override { return GdbIndexSection; }		StringRef getGdbIndexSection() override { return GdbIndexSection; }
StringRef getTUIndexSection() override { return TUIndexSection; }		StringRef getTUIndexSection() override { return TUIndexSection; }
};		};

		/// GdbIndexBuilderDWARFContent is a version of DWARFContext used
		/// for building .gdb_index sections.
		class GdbIndexBuilderDWARFContent final : public DWARFContextInMemory {
		public:
		GdbIndexBuilderDWARFContent(const object::ObjectFile &Obj,
		const LoadedObjectInfo *L);

		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_DEBUGINFO_DWARF_DWARFCONTEXT_H		#endif // LLVM_DEBUGINFO_DWARF_DWARFCONTEXT_H

lib/DebugInfo/DWARF/DWARFContext.cpp

[... 610 lines not shown ...]	if (Spec.FLIKind != FileLineInfoKind::None) {
FunctionDIE.getCallerFrame(CallFile, CallLine, CallColumn);		FunctionDIE.getCallerFrame(CallFile, CallLine, CallColumn);
}		}
}		}
InliningInfo.addFrame(Frame);		InliningInfo.addFrame(Frame);
}		}
return InliningInfo;		return InliningInfo;
}		}

		static Error createError(const Twine &Reason, llvm::Error E) {
		return make_error<StringError>(Reason + toString(std::move(E)),
		inconvertibleErrorCode());
		}

		// Returns the address of the symbol that the relocation refers to. Used for
		// further relocation computation. The symbol's section load address is taken
		// into account if a LoadedObjectInfo interface is provided.
		static Expected<uint64_t> getSymbolAddress(const object::ObjectFile &Obj,
		const RelocationRef &Reloc,
		const LoadedObjectInfo *L) {
		uint64_t Ret = 0;
		object::section_iterator RSec = Obj.section_end();
		object::symbol_iterator Sym = Reloc.getSymbol();

		// First calculate the address of the symbol or section as it appears
		// in the object file
		if (Sym != Obj.symbol_end()) {
		Expected<uint64_t> SymAddrOrErr = Sym->getAddress();
		if (!SymAddrOrErr)
		return createError("error: failed to compute symbol address: ",
		SymAddrOrErr.takeError());

		// Also remember what section this symbol is in for later
		auto SectOrErr = Sym->getSection();
		if (!SectOrErr)
		return createError("error: failed to get symbol section: ",
		SectOrErr.takeError());

		RSec = *SectOrErr;
		Ret = *SymAddrOrErr;
		} else if (auto *MObj = dyn_cast<MachOObjectFile>(&Obj)) {
		RSec = MObj->getRelocationSection(Reloc.getRawDataRefImpl());
		Ret = RSec->getAddress();
		}

		// If we are given load addresses for the sections, we need to adjust:
		// SymAddr = (Address of Symbol Or Section in File) -
		// (Address of Section in File) +
		// (Load Address of Section)
		if (L != nullptr && RSec != Obj.section_end()) {
		// RSec is now either the section being targeted or the section
		// containing the symbol being targeted. In either case,
		// we need to perform the same computation.
		uint64_t SectionLoadAddress = L->getSectionLoadAddress(*RSec);
		if (SectionLoadAddress != 0)
		Ret += SectionLoadAddress - RSec->getAddress();
		}

		return Ret;
		}

		// Resolves a single relocation and adds the result to Map.
		static void applyRelocation(RelocAddrMap &Map, const SectionRef &Section,
		const RelocationRef &Reloc, uint64_t Addr) {
		uint64_t Address = Reloc.getOffset();
		uint64_t Type = Reloc.getType();

		object::RelocVisitor V(*Section.getObject());
		object::RelocToApply R(V.visit(Type, Reloc, Addr));
		if (V.error()) {
		SmallString<32> Name;
		Reloc.getTypeName(Name);
		errs() << "error: failed to compute relocation: " << Name << "\n";
		return;
		}

		section_iterator RelocatedSection = Section.getRelocatedSection();
		uint64_t SectionSize = RelocatedSection->getSize();
		if (Address + R.Width > SectionSize) {
		StringRef Name;
		Section.getName(Name);
		errs() << "error: " << R.Width << "-byte relocation starting " << Address
		<< " bytes into section " << Name << " which is " << SectionSize
		<< " bytes long.\n";
		return;
		}
		if (R.Width > 8) {
		errs() << "error: can't handle a relocation of more than 8 bytes at "
		"a time.\n";
		return;
		}
		DEBUG(dbgs() << "Writing " << format("%p", R.Value) << " at "
		<< format("%p", Address) << " with width "
		<< format("%d", R.Width) << "\n");
		Map.insert(std::make_pair(Address, std::make_pair(R.Width, R.Value)));
		return;
		}

DWARFContextInMemory::DWARFContextInMemory(const object::ObjectFile &Obj,		DWARFContextInMemory::DWARFContextInMemory(const object::ObjectFile &Obj,
const LoadedObjectInfo *L)		const LoadedObjectInfo *L)
: IsLittleEndian(Obj.isLittleEndian()),		: IsLittleEndian(Obj.isLittleEndian()),
AddressSize(Obj.getBytesInAddress()) {		AddressSize(Obj.getBytesInAddress()) {
for (const SectionRef &Section : Obj.sections()) {		for (const SectionRef &Section : Obj.sections()) {
StringRef name;		StringRef name;
Section.getName(name);		Section.getName(name);
// Skip BSS and Virtual sections, they aren't interesting.		// Skip BSS and Virtual sections, they aren't interesting.
[... 87 lines not shown ...]	if (!Map) {
Map = &TypesSections[*RelocatedSection].Relocs;		Map = &TypesSections[*RelocatedSection].Relocs;
else if (RelSecName == "debug_types.dwo")		else if (RelSecName == "debug_types.dwo")
Map = &TypesDWOSections[*RelocatedSection].Relocs;		Map = &TypesDWOSections[*RelocatedSection].Relocs;
else		else
continue;		continue;
}		}

if (Section.relocation_begin() != Section.relocation_end()) {		if (Section.relocation_begin() != Section.relocation_end()) {
uint64_t SectionSize = RelocatedSection->getSize();
for (const RelocationRef &Reloc : Section.relocations()) {		for (const RelocationRef &Reloc : Section.relocations()) {
uint64_t Address = Reloc.getOffset();		Expected<uint64_t> SymAddrOrErr = getSymbolAddress(Obj, Reloc, L);
uint64_t Type = Reloc.getType();
uint64_t SymAddr = 0;
uint64_t SectionLoadAddress = 0;
object::symbol_iterator Sym = Reloc.getSymbol();
object::section_iterator RSec = Obj.section_end();

// First calculate the address of the symbol or section as it appears
// in the objct file
if (Sym != Obj.symbol_end()) {
Expected<uint64_t> SymAddrOrErr = Sym->getAddress();
if (!SymAddrOrErr) {		if (!SymAddrOrErr) {
std::string Buf;		errs() << toString(std::move(SymAddrOrErr.takeError())) << '\n';
raw_string_ostream OS(Buf);
logAllUnhandledErrors(SymAddrOrErr.takeError(), OS, "");
OS.flush();
errs() << "error: failed to compute symbol address: "
<< Buf << '\n';
continue;
}
SymAddr = *SymAddrOrErr;
// Also remember what section this symbol is in for later
auto SectOrErr = Sym->getSection();
if (!SectOrErr) {
std::string Buf;
raw_string_ostream OS(Buf);
logAllUnhandledErrors(SectOrErr.takeError(), OS, "");
OS.flush();
errs() << "error: failed to get symbol section: "
<< Buf << '\n';
continue;		continue;
}		}
RSec = *SectOrErr;		applyRelocation(*Map, Section, Reloc, *SymAddrOrErr);
} else if (auto *MObj = dyn_cast<MachOObjectFile>(&Obj)) {
// MachO also has relocations that point to sections and
// scattered relocations.
auto RelocInfo = MObj->getRelocation(Reloc.getRawDataRefImpl());
if (MObj->isRelocationScattered(RelocInfo)) {
// FIXME: it's not clear how to correctly handle scattered
// relocations.
continue;
} else {
RSec = MObj->getRelocationSection(Reloc.getRawDataRefImpl());
SymAddr = RSec->getAddress();
}
}

// If we are given load addresses for the sections, we need to adjust:
// SymAddr = (Address of Symbol Or Section in File) -
// (Address of Section in File) +
// (Load Address of Section)
if (L != nullptr && RSec != Obj.section_end()) {
// RSec is now either the section being targeted or the section
// containing the symbol being targeted. In either case,
// we need to perform the same computation.
StringRef SecName;
RSec->getName(SecName);
// dbgs() << "Name: '" << SecName
// << "', RSec: " << RSec->getRawDataRefImpl()
// << ", Section: " << Section.getRawDataRefImpl() << "\n";
SectionLoadAddress = L->getSectionLoadAddress(*RSec);
if (SectionLoadAddress != 0)
SymAddr += SectionLoadAddress - RSec->getAddress();
}

object::RelocVisitor V(Obj);
object::RelocToApply R(V.visit(Type, Reloc, SymAddr));
if (V.error()) {
SmallString<32> Name;
Reloc.getTypeName(Name);
errs() << "error: failed to compute relocation: "
<< Name << "\n";
continue;
}

if (Address + R.Width > SectionSize) {
errs() << "error: " << R.Width << "-byte relocation starting "
<< Address << " bytes into section " << name << " which is "
<< SectionSize << " bytes long.\n";
continue;
}
if (R.Width > 8) {
errs() << "error: can't handle a relocation of more than 8 bytes at "
"a time.\n";
continue;
}
DEBUG(dbgs() << "Writing " << format("%p", R.Value)
<< " at " << format("%p", Address)
<< " with width " << format("%d", R.Width)
<< "\n");
Map->insert(std::make_pair(Address, std::make_pair(R.Width, R.Value)));
}		}
}		}
}		}
}		}

DWARFContextInMemory::DWARFContextInMemory(		DWARFContextInMemory::DWARFContextInMemory(
const StringMap<std::unique_ptr<MemoryBuffer>> &Sections, uint8_t AddrSize,		const StringMap<std::unique_ptr<MemoryBuffer>> &Sections, uint8_t AddrSize,
bool isLittleEndian)		bool isLittleEndian)
[... 35 lines not shown ...]	return StringSwitch<StringRef *>(Name)
.Case("debug_cu_index", &CUIndexSection)		.Case("debug_cu_index", &CUIndexSection)
.Case("debug_tu_index", &TUIndexSection)		.Case("debug_tu_index", &TUIndexSection)
.Case("gdb_index", &GdbIndexSection)		.Case("gdb_index", &GdbIndexSection)
// Any more debug info sections go here.		// Any more debug info sections go here.
.Default(nullptr);		.Default(nullptr);
}		}

void DWARFContextInMemory::anchor() {}		void DWARFContextInMemory::anchor() {}

		GdbIndexBuilderDWARFContent::GdbIndexBuilderDWARFContent(
		const object::ObjectFile &Obj, const LoadedObjectInfo *L)
		: DWARFContextInMemory({}, Obj.getBytesInAddress(), Obj.isLittleEndian()) {

		for (const SectionRef &Section : Obj.sections()) {
		StringRef Name;
		Section.getName(Name);

		// Compressed section names in GNU style start with ".z".
		// Skip ".", "z" and "_" prefixes.
		Name = Name.substr(Name.find_first_not_of("._z"));
		StringRef *SectionData = StringSwitch<StringRef *>(Name)
		.Case("debug_info", &InfoSection.Data)
		.Case("debug_abbrev", &AbbrevSection)
		.Case("debug_gnu_pubnames", &GnuPubNamesSection)
		.Case("debug_gnu_pubtypes", &GnuPubTypesSection)
		.Default(nullptr);

		if (SectionData)
		Section.getContents(*SectionData);

		section_iterator RelocatedSection = Section.getRelocatedSection();
		if (RelocatedSection == Obj.section_end())
		continue;

		StringRef RelSecName;
		RelocatedSection->getName(RelSecName);
		// Skip . and _ prefixes.
		RelSecName = RelSecName.substr(RelSecName.find_first_not_of("._"));
		if (RelSecName != "debug_info")
		continue;

		if (Section.relocation_begin() != Section.relocation_end()) {
		uint64_t SymAddr = 0;
		object::symbol_iterator Sym = Obj.symbol_end();

		for (const RelocationRef &Reloc : Section.relocations()) {
		auto &Sec = static_cast<const object::ELFSectionRef &>(
		**Reloc.getSymbol()->getSection());
		if ((Sec.getFlags() & ELF::SHF_ALLOC) == 0)
		continue;

		object::symbol_iterator RelSym = Reloc.getSymbol();
		if (Sym == Obj.symbol_end() || Sym != RelSym) {
		Expected<uint64_t> SymAddrOrErr = getSymbolAddress(Obj, Reloc, L);
		if (!SymAddrOrErr) {
		errs() << toString(std::move(SymAddrOrErr.takeError())) << '\n';
		continue;
		}
		SymAddr = *SymAddrOrErr;
		Sym = RelSym;
		}
		applyRelocation(InfoSection.Relocs, Section, Reloc, SymAddr);
		}
		}
		}
		}