This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Use multithreading to build .gdb_index
AbandonedPublic

Authored by grimar on May 12 2017, 3:06 AM.

Details

Reviewers

ruiu
• rafael

Summary

Currently LLD uses the LLVM DWARF parsers to scan objects
and find the information required for building .gdb_index.

There are 2 known ways to speed up building the index.
One of them is to use relocated output to produce .gdb_index (D31424),
which saves time because the DWARF parsers don't need to apply
some of the relocations on their side.

But that approach does not seem to allow parallel generation. This
patch suggests using a parallel for loop when we scan input objects.
That alone gives a significant speedup. And if we also land D32853 and D31136,
the .gdb_index feature will be much faster.
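The shape of the change described above can be sketched roughly as follows. This is a simplified illustration, not the patch itself: the names ObjectScanSketch, AddressEntrySketch, scanAll and fixupCuIndices are hypothetical, and the first loop is a plain sequential loop standing in for the parallel loop the patch uses. The point is that each iteration writes only to its own slot, so a serial pass afterwards is needed to turn object-local CU indices into global ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the per-object data the patch collects.
struct AddressEntrySketch {
  uint32_t CuIndex; // index of the owning CU; local to one object at first
};

struct ObjectScanSketch {
  size_t NumCUs;                           // CUs found in this object
  std::vector<AddressEntrySketch> Entries; // address entries of this object
};

// Phase 1: iteration I writes only Out[I], so iterations are independent.
// This plain loop stands in for a parallel loop; no locking would be needed.
inline void scanAll(const std::vector<ObjectScanSketch> &Inputs,
                    std::vector<ObjectScanSketch> &Out) {
  Out.resize(Inputs.size());
  for (size_t I = 0; I < Inputs.size(); ++I)
    Out[I] = Inputs[I]; // placeholder for the real DWARF scan of object I
}

// Phase 2 (serial): rebase object-local CU indices to global ones, using the
// running count of CUs seen in all preceding objects.
inline void fixupCuIndices(std::vector<ObjectScanSketch> &Objs) {
  uint32_t CuId = 0;
  for (ObjectScanSketch &O : Objs) {
    for (AddressEntrySketch &E : O.Entries)
      E.CuIndex += CuId;
    CuId += static_cast<uint32_t>(O.NumCUs);
  }
}
```

The serial second phase is cheap compared to DWARF parsing, which is why only the scan is a candidate for running in parallel.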

Numbers for this patch are below. I am using an i7 4790K (4 physical cores @ 4.0 GHz), SSD, 10 runs for each test.
I linked a debug build of the llc binary using LLD.

no --gdb-index option:
4,044980833 seconds time elapsed ( +- 0,45% )

with --gdb-index, without patch
18,688194986 seconds time elapsed ( +- 0,50% )

with --gdb-index, with patch
8,578650449 seconds time elapsed ( +- 0,32% )

That means the .gdb_index building time without the patch is 18.688s - 4.044s = 14.644s.
The time after the patch is 8.578s - 4.044s = 4.534s.
After the patch we are about 3.23x faster for my configuration.
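The arithmetic above can be reproduced with a couple of helper functions. This is only a sketch for checking the numbers; indexTime and speedup are hypothetical names, not part of LLD.

```cpp
// Time attributable to building .gdb_index: total link time with --gdb-index
// minus the baseline link time without it (both in seconds).
inline double indexTime(double WithIndex, double Baseline) {
  return WithIndex - Baseline;
}

// Ratio of the index-building time before the patch to the time after it.
inline double speedup(double Before, double After) { return Before / After; }
```

Feeding in the three measured totals from the runs above gives a speedup of roughly 3.2x for the index-building portion alone.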

Diff Detail

Event Timeline

grimar created this revision. May 12 2017, 3:06 AM

Herald added a subscriber: aprantl. May 12 2017, 3:06 AM

grimar edited the summary of this revision. May 12 2017, 3:07 AM

grimar mentioned this in D32853: [ELF] - Speedup readAddressArea() implementation.. May 12 2017, 3:49 AM

dblaikie added inline comments.

I'll leave most of the review to Rui et al. Just a drive-by glance.

ELF/SyntheticSections.cpp
1772–1774	Might make more sense to have one container containing a struct with 3 members, than 3 parallel containers like this?

grimar added inline comments. May 12 2017, 10:46 AM

ELF/SyntheticSections.cpp
1772–1774	Maybe. I just tried to make as few changes as possible to the current structure, to make the initial review process easier, also assuming that if the whole idea is OK we will come to something more ideal during reviews. So I would be happy to start from a general question for all reviewers: does this whole idea look fine? And then we can polish it together if so :)
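For illustration, the single-container alternative suggested in the inline comment might look roughly like the sketch below. PerObjectIndexData, AddressEntrySketch, makeSlots and totalCUs are hypothetical names, and the address entry is simplified (the real AddressEntry also refers to an input section).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical address entry.
struct AddressEntrySketch {
  uint64_t LowAddress;
  uint64_t HighAddress;
  uint32_t CuIndex;
};

// Everything collected from one object's .debug_info, kept together instead
// of being spread over three parallel vectors indexed by object number.
struct PerObjectIndexData {
  std::vector<std::pair<uint64_t, uint64_t>> CompilationUnits; // [offset, length]
  std::vector<AddressEntrySketch> AddressArea;
  std::vector<std::pair<std::string, uint8_t>> NamesAndTypes;
};

// One allocation creates a slot per object; no three separate resize() calls.
inline std::vector<PerObjectIndexData> makeSlots(size_t NumObjects) {
  return std::vector<PerObjectIndexData>(NumObjects);
}

// Total number of CUs across all objects, as needed for the header offsets.
inline size_t totalCUs(const std::vector<PerObjectIndexData> &Objs) {
  size_t N = 0;
  for (const PerObjectIndexData &O : Objs)
    N += O.CompilationUnits.size();
  return N;
}
```

With this layout the three containers cannot get out of sync in length, at the cost of touching more of the existing code than the minimal change in this diff.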

This looks mostly fine, but I wouldn't submit this patch at the moment, as I think the code to generate .gdb_index is not optimized enough for single-thread yet. Our general strategy is to focus on single-thread performance and then use multi-threading as a final shot. I believe this strategy is working well, because (a) using multiple threads too early may hide real problems that are obvious when run with only one thread (which would leave the linker with so-so performance), and (b) thinking about multi-threading is distracting when optimizing code.

Do you think you can't make it any faster without using multi-threading? My instinct is that the performance we can reach without multi-threading is higher than it is now.

In D33122#753528, @ruiu wrote:

This looks mostly fine, but I wouldn't submit this patch at the moment, as I think the code to generate .gdb_index is not optimized enough for single-thread yet. Our general strategy is to focus on single-thread performance and then use multi-threading as a final shot. I believe this strategy is working well, because (a) using multiple threads too early may hide real problems that are obvious when run with only one thread (which would leave the linker with so-so performance), and (b) thinking about multi-threading is distracting when optimizing code.

Do you think you can't make it any faster without using multi-threading? My instinct is that the performance we can reach without multi-threading is higher than it is now.

Sure, I think I can. I am pretty sure that, for example, what I implemented in D32853 was the right general direction, and if you agree with the whole idea of this patch then I'll abandon D32853 and suggest a patch for the DWARF parsers instead.
Another patch, D31136, got an LGTM today, and it opens the road to speeding up how relocations are handled on the DWARF parsers side too. I am pretty sure that this whole way is more promising than using relocated content for generating the index.

And at the same time I would really like to ask you to land this patch too, because it is useful for comparison purposes and does not interfere with the idea of optimizing the single-threaded solution, but raises the overall target performance to a new level and gives us a point we can reference.

What about landing this one with a single-threaded loop instead of parallelForEachN? That lets us keep a multithreaded design "in mind" and still focus on a single thread.

And at the same time I would really like to ask you to land this patch too, because it is useful for comparison purposes and does not interfere with the idea of optimizing the single-threaded solution, but raises the overall target performance to a new level and gives us a point we can reference.

What do you compare with what? If you compare a multi-threaded version against a single-threaded version, it's likely that the former is faster, but it's no surprise and not a fair comparison. That is exactly what I meant when I said enabling multi-threading too early is distracting.

In D33122#753706, @ruiu wrote:

And at the same time I would really like to ask you to land this patch too, because it is useful for comparison purposes and does not interfere with the idea of optimizing the single-threaded solution, but raises the overall target performance to a new level and gives us a point we can reference.

What do you compare with what? If you compare a multi-threaded version against a single-threaded version, it's likely that the former is faster, but it's no surprise and not a fair comparison. That is exactly what I meant when I said enabling multi-threading too early is distracting.

It is interesting to compare LLD with gold, for example, for the same configuration to see the whole link time difference, and it is probably also useful to have a multithreaded version at hand to check numbers from time to time and see how changes in the code affect scalability. Working on single-thread performance is, globally, an orthogonal task, I think. This patch neither helps nor interferes with that work (probably).

My main reason for wanting to land it is that it will be easier to work on other patches, since this one changes internal structures that other patches may depend on.
As I mentioned, we can land a single-threaded loop and keep the builder single-threaded. It should not be distracting for optimizations, because the whole structure of the index building code is straightforward and most of the job is done in separate static methods or inside the DWARF parsers.

At the same time I think I understand your position well enough. I am happy that this patch seems to be in the right direction and that it opens the road to implementing the optimizations I have in mind.
So if you feel strongly that it is too early for it, let's put it on hold. I am fine with that too.

grimar mentioned this in D31424: [ELF] - Use relocated content when generating .gdb_index. May 15 2017, 3:47 AM

I think I'm not convinced. Using multi-threading when you are optimizing it for single-thread is distracting and increases deviation when you measure its performance. Please keep it as your local patch if you need it.

If you just replace parallelForEachN with std::for_each, the resulting code would look awkward, as this code was written with thread-safety in mind. If you can make this patch more natural, it may be ok to land it, though. So if you wish to land something like this, please update this without parallelForEachN and upload a new patch.

grimar mentioned this in D33552: [ELF] - Make implementation of .gdb index to be more natural for futher paralleling.. May 25 2017, 9:12 AM

In favor of D33552

Revision Contents

Path                        Size
ELF/SyntheticSections.h     16 lines
ELF/SyntheticSections.cpp   125 lines

Diff 98744

ELF/SyntheticSections.h

	class GdbIndexSection final : public SyntheticSection {			class GdbIndexSection final : public SyntheticSection {
	const unsigned OffsetTypeSize = 4;			const unsigned OffsetTypeSize = 4;
	const unsigned CuListOffset = 6 * OffsetTypeSize;			const unsigned CuListOffset = 6 * OffsetTypeSize;
	const unsigned CompilationUnitSize = 16;			const unsigned CompilationUnitSize = 16;
	const unsigned AddressEntrySize = 16 + OffsetTypeSize;			const unsigned AddressEntrySize = 16 + OffsetTypeSize;
	const unsigned SymTabEntrySize = 2 * OffsetTypeSize;			const unsigned SymTabEntrySize = 2 * OffsetTypeSize;

	public:			public:
				typedef std::pair<uint64_t, uint64_t> CuEntry;
				typedef std::pair<StringRef, uint8_t> NameEntry;

	GdbIndexSection();			GdbIndexSection();
	void finalizeContents() override;			void finalizeContents() override;
	void writeTo(uint8_t *Buf) override;			void writeTo(uint8_t *Buf) override;
	size_t getSize() const override;			size_t getSize() const override;
	bool empty() const override;			bool empty() const override;

	// Pairs of [CU Offset, CU length].
	std::vector<std::pair<uint64_t, uint64_t>> CompilationUnits;

	llvm::StringTableBuilder StringPool;			llvm::StringTableBuilder StringPool;

	GdbHashTab SymbolTable;			GdbHashTab SymbolTable;

	// The CU vector portion of the constant pool.			// The CU vector portion of the constant pool.
	std::vector<std::vector<std::pair<uint32_t, uint8_t>>> CuVectors;			std::vector<std::vector<std::pair<uint32_t, uint8_t>>> CuVectors;

	std::vector<AddressEntry> AddressArea;			std::vector<std::vector<AddressEntry>> AddressArea;

				std::vector<std::vector<NameEntry>> NamesAndTypes;

				// Pairs of [CU Offset, CU length] for each object.
				std::vector<std::vector<CuEntry>> CompilationUnits;

	private:			private:
	void readDwarf(InputSection *Sec);			void readDwarf(InputSection *Sec, size_t ObjNdx);
				void buildIndex();

	uint32_t CuTypesOffset;			uint32_t CuTypesOffset;
	uint32_t SymTabOffset;			uint32_t SymTabOffset;
	uint32_t ConstantPoolOffset;			uint32_t ConstantPoolOffset;
	uint32_t StringPoolOffset;			uint32_t StringPoolOffset;

	size_t CuVectorsSize = 0;			size_t CuVectorsSize = 0;
	std::vector<size_t> CuVectorsOffset;			std::vector<size_t> CuVectorsOffset;

ELF/SyntheticSections.cpp

// for version 4.		// for version 4.
static uint32_t hash(StringRef Str) {		static uint32_t hash(StringRef Str) {
uint32_t R = 0;		uint32_t R = 0;
for (uint8_t C : Str)		for (uint8_t C : Str)
R = R * 67 + tolower(C) - 113;		R = R * 67 + tolower(C) - 113;
return R;		return R;
}		}

static std::vector<std::pair<uint64_t, uint64_t>>		static std::vector<GdbIndexSection::CuEntry> readCuList(DWARFContext &Dwarf,
readCuList(DWARFContext &Dwarf, InputSection *Sec) {		InputSection *Sec) {
std::vector<std::pair<uint64_t, uint64_t>> Ret;		std::vector<std::pair<uint64_t, uint64_t>> Ret;
for (std::unique_ptr<DWARFCompileUnit> &CU : Dwarf.compile_units())		for (std::unique_ptr<DWARFCompileUnit> &CU : Dwarf.compile_units())
Ret.push_back({Sec->OutSecOff + CU->getOffset(), CU->getLength() + 4});		Ret.push_back({Sec->OutSecOff + CU->getOffset(), CU->getLength() + 4});
return Ret;		return Ret;
}		}

static InputSectionBase *findSection(ArrayRef<InputSectionBase *> Arr,		static InputSectionBase *findSection(ArrayRef<InputSectionBase *> Arr,
uint64_t Offset) {		uint64_t Offset) {
for (InputSectionBase *S : Arr)		for (InputSectionBase *S : Arr)
if (S && S != &InputSection::Discarded)		if (S && S != &InputSection::Discarded)
if (Offset >= S->getOffsetInFile() &&		if (Offset >= S->getOffsetInFile() &&
Offset < S->getOffsetInFile() + S->getSize())		Offset < S->getOffsetInFile() + S->getSize())
return S;		return S;
return nullptr;		return nullptr;
}		}

static std::vector<AddressEntry>		static std::vector<AddressEntry> readAddressArea(DWARFContext &Dwarf,
readAddressArea(DWARFContext &Dwarf, InputSection *Sec, size_t CurrentCU) {		InputSection *Sec) {
std::vector<AddressEntry> Ret;		std::vector<AddressEntry> Ret;

		size_t CurrentCU = 0;
for (std::unique_ptr<DWARFCompileUnit> &CU : Dwarf.compile_units()) {		for (std::unique_ptr<DWARFCompileUnit> &CU : Dwarf.compile_units()) {
DWARFAddressRangesVector Ranges;		DWARFAddressRangesVector Ranges;
CU->collectAddressRanges(Ranges);		CU->collectAddressRanges(Ranges);

ArrayRef<InputSectionBase *> Sections = Sec->File->getSections();		ArrayRef<InputSectionBase *> Sections = Sec->File->getSections();
for (std::pair<uint64_t, uint64_t> &R : Ranges)		for (std::pair<uint64_t, uint64_t> &R : Ranges)
if (InputSectionBase *S = findSection(Sections, R.first))		if (InputSectionBase *S = findSection(Sections, R.first))
Ret.push_back({S, R.first - S->getOffsetInFile(),		Ret.push_back({S, R.first - S->getOffsetInFile(),
[... 24 lines not shown; resuming inside getSectionLoadAddress(const object::SectionRef &Sec) ...]
if (S.getFlags() & ELF::SHF_ALLOC)		if (S.getFlags() & ELF::SHF_ALLOC)
return S.getOffset();		return S.getOffset();
return 0;		return 0;
}		}

std::unique_ptr<llvm::LoadedObjectInfo> clone() const override { return {}; }		std::unique_ptr<llvm::LoadedObjectInfo> clone() const override { return {}; }
};		};

void GdbIndexSection::readDwarf(InputSection *Sec) {		void GdbIndexSection::buildIndex() {
Expected<std::unique_ptr<object::ObjectFile>> Obj =		std::vector<InputSection*> InfoV;
object::ObjectFile::createObjectFile(Sec->File->MB);		for (InputSectionBase *S : InputSections)
if (!Obj) {		if (InputSection *IS = dyn_cast<InputSection>(S))
error(toString(Sec->File) + ": error creating DWARF context");		if (IS->OutSec && IS->Name == ".debug_info")
return;		InfoV.push_back(IS);
}

ObjInfoTy ObjInfo;		if (InfoV.empty())
DWARFContextInMemory Dwarf(*Obj.get(), &ObjInfo);		return;

size_t CuId = CompilationUnits.size();		size_t Size = InfoV.size();
for (std::pair<uint64_t, uint64_t> &P : readCuList(Dwarf, Sec))		CompilationUnits.resize(Size);
CompilationUnits.push_back(P);		AddressArea.resize(Size);
		NamesAndTypes.resize(Size);

for (AddressEntry &Ent : readAddressArea(Dwarf, Sec, CuId))		parallelForEachN(0, Size, [&](size_t I) {
AddressArea.push_back(Ent);		readDwarf(InfoV[I], I);
		});

std::vector<std::pair<StringRef, uint8_t>> NamesAndTypes =		size_t CuId = 0;
readPubNamesAndTypes(Dwarf, Config->IsLE);		for (size_t I = 0; I < Size; ++I) {
		// Now when we know amount of compilation units for each object, we fixup
		// the compilation units ID's in address area section.
		for (AddressEntry &E : AddressArea[I])
		E.CuIndex += CuId;

for (std::pair<StringRef, uint8_t> &Pair : NamesAndTypes) {		// Populate constant pool area.
		for (std::pair<StringRef, uint8_t> &Pair : NamesAndTypes[I]) {
uint32_t Hash = hash(Pair.first);		uint32_t Hash = hash(Pair.first);
size_t Offset = StringPool.add(Pair.first);		size_t Offset = StringPool.add(Pair.first);

bool IsNew;		bool IsNew;
GdbSymbol *Sym;		GdbSymbol *Sym;
std::tie(IsNew, Sym) = SymbolTable.add(Hash, Offset);		std::tie(IsNew, Sym) = SymbolTable.add(Hash, Offset);
if (IsNew) {		if (IsNew) {
Sym->CuVectorIndex = CuVectors.size();		Sym->CuVectorIndex = CuVectors.size();
CuVectors.push_back({{CuId, Pair.second}});		CuVectors.push_back({{CuId, Pair.second}});
continue;		continue;
}		}

CuVectors[Sym->CuVectorIndex].push_back({CuId, Pair.second});		CuVectors[Sym->CuVectorIndex].push_back({CuId, Pair.second});
}		}

		CuId += CompilationUnits[I].size();
		}
		}

		void GdbIndexSection::readDwarf(InputSection *Sec, size_t ObjNdx) {
		Expected<std::unique_ptr<object::ObjectFile>> Obj =
		object::ObjectFile::createObjectFile(Sec->File->MB);
		if (!Obj) {
		error(toString(Sec->File) + ": error creating DWARF context");
		return;
		}

		ObjInfoTy ObjInfo;
		DWARFContextInMemory Dwarf(*Obj.get(), &ObjInfo);
		CompilationUnits[ObjNdx] = readCuList(Dwarf, Sec);
		AddressArea[ObjNdx] = readAddressArea(Dwarf, Sec);
		NamesAndTypes[ObjNdx] = readPubNamesAndTypes(Dwarf, Config->IsLE);
		}

		template <class T> static size_t findSize(std::vector<std::vector<T>> &C) {
		size_t Ret = 0;
		for (std::vector<T> &V : C)
		Ret += V.size();
		return Ret;
}		}

void GdbIndexSection::finalizeContents() {		void GdbIndexSection::finalizeContents() {
if (Finalized)		if (Finalized)
return;		return;
Finalized = true;		Finalized = true;

for (InputSectionBase *S : InputSections)		buildIndex();
if (InputSection *IS = dyn_cast<InputSection>(S))
if (IS->OutSec && IS->Name == ".debug_info")
readDwarf(IS);

SymbolTable.finalizeContents();		SymbolTable.finalizeContents();

// GdbIndex header consist from version fields		// GdbIndex header consist from version fields
// and 5 more fields with different kinds of offsets.		// and 5 more fields with different kinds of offsets.
CuTypesOffset = CuListOffset + CompilationUnits.size() * CompilationUnitSize;		CuTypesOffset = CuListOffset + findSize(CompilationUnits) * CompilationUnitSize;
SymTabOffset = CuTypesOffset + AddressArea.size() * AddressEntrySize;		SymTabOffset = CuTypesOffset + findSize(AddressArea) * AddressEntrySize;

ConstantPoolOffset =		ConstantPoolOffset =
SymTabOffset + SymbolTable.getCapacity() * SymTabEntrySize;		SymTabOffset + SymbolTable.getCapacity() * SymTabEntrySize;

for (std::vector<std::pair<uint32_t, uint8_t>> &CuVec : CuVectors) {		for (std::vector<std::pair<uint32_t, uint8_t>> &CuVec : CuVectors) {
CuVectorsOffset.push_back(CuVectorsSize);		CuVectorsOffset.push_back(CuVectorsSize);
CuVectorsSize += OffsetTypeSize * (CuVec.size() + 1);		CuVectorsSize += OffsetTypeSize * (CuVec.size() + 1);
}		}
[... 12 lines not shown; resuming inside GdbIndexSection::writeTo(uint8_t *Buf) ...]
write32le(Buf + 4, CuListOffset); // CU list offset.		write32le(Buf + 4, CuListOffset); // CU list offset.
write32le(Buf + 8, CuTypesOffset); // Types CU list offset.		write32le(Buf + 8, CuTypesOffset); // Types CU list offset.
write32le(Buf + 12, CuTypesOffset); // Address area offset.		write32le(Buf + 12, CuTypesOffset); // Address area offset.
write32le(Buf + 16, SymTabOffset); // Symbol table offset.		write32le(Buf + 16, SymTabOffset); // Symbol table offset.
write32le(Buf + 20, ConstantPoolOffset); // Constant pool offset.		write32le(Buf + 20, ConstantPoolOffset); // Constant pool offset.
Buf += 24;		Buf += 24;

// Write the CU list.		// Write the CU list.
for (std::pair<uint64_t, uint64_t> CU : CompilationUnits) {		for (std::vector<CuEntry> &V : CompilationUnits) {
		for (std::pair<uint64_t, uint64_t> CU : V) {
write64le(Buf, CU.first);		write64le(Buf, CU.first);
write64le(Buf + 8, CU.second);		write64le(Buf + 8, CU.second);
Buf += 16;		Buf += 16;
}		}
		}

// Write the address area.		// Write the address area.
for (AddressEntry &E : AddressArea) {		for (std::vector<AddressEntry> &V : AddressArea) {
		for (AddressEntry &E : V) {
uint64_t BaseAddr = E.Section->OutSec->Addr + E.Section->getOffset(0);		uint64_t BaseAddr = E.Section->OutSec->Addr + E.Section->getOffset(0);
write64le(Buf, BaseAddr + E.LowAddress);		write64le(Buf, BaseAddr + E.LowAddress);
write64le(Buf + 8, BaseAddr + E.HighAddress);		write64le(Buf + 8, BaseAddr + E.HighAddress);
write32le(Buf + 16, E.CuIndex);		write32le(Buf + 16, E.CuIndex);
Buf += 20;		Buf += 20;
}		}
		}

// Write the symbol table.		// Write the symbol table.
for (size_t I = 0; I < SymbolTable.getCapacity(); ++I) {		for (size_t I = 0; I < SymbolTable.getCapacity(); ++I) {
GdbSymbol *Sym = SymbolTable.getSymbol(I);		GdbSymbol *Sym = SymbolTable.getSymbol(I);
if (Sym) {		if (Sym) {
size_t NameOffset =		size_t NameOffset =
Sym->NameOffset + StringPoolOffset - ConstantPoolOffset;		Sym->NameOffset + StringPoolOffset - ConstantPoolOffset;
size_t CuVectorOffset = CuVectorsOffset[Sym->CuVectorIndex];		size_t CuVectorOffset = CuVectorsOffset[Sym->CuVectorIndex];