This is an archive of the discontinued LLVM Phabricator instance.

Differential D58271

[ELF] --gdb-index: split off GdbSymbol::CuVector and add a separate CuVectors
Abandoned · Public

Authored by MaskRay on Feb 14 2019, 8:53 PM.

Details

Reviewers

ruiu
echristo
espindola

Summary

These GdbSymbol::CuVector members cause memory allocations and cost a
lot of memory. This patch splits off the field and adds a separate
CuVectors to be more memory efficient.
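
To illustrate the kind of layout the patch moves to, here is a minimal standalone sketch. It is not the patch itself: the names `FlatCuVectors` and `buildFlat` are hypothetical, and the input is simplified to (symbol index, value) pairs. All CU vectors are stored back to back in one array, each prefixed by its element count, and the offsets come from a counting pass followed by a prefix sum.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical result type: all CU vectors concatenated into one array.
// For symbol S, Data[Off[S]] holds the element count and the values follow.
struct FlatCuVectors {
  std::vector<uint32_t> Off;  // start of each symbol's CU vector in Data
  std::vector<uint32_t> Data; // [count, v0, v1, ...] repeated per symbol
};

// Entries are (symbol index, value) pairs; NumSymbols is the symbol count.
FlatCuVectors
buildFlat(const std::vector<std::pair<uint32_t, uint32_t>> &Entries,
          uint32_t NumSymbols) {
  FlatCuVectors R;

  // Pass 1: count how many values each symbol has.
  std::vector<uint32_t> Count(NumSymbols, 0);
  for (const auto &E : Entries)
    ++Count[E.first];

  // Prefix sum: each symbol takes one slot for the count plus Count[S] slots.
  R.Off.resize(NumSymbols);
  uint32_t Off = 0;
  for (uint32_t S = 0; S != NumSymbols; ++S) {
    R.Off[S] = Off;
    Off += 1 + Count[S];
  }
  R.Data.assign(Off, 0);

  // Pass 2: write the counts, then scatter values via a per-symbol cursor.
  std::vector<uint32_t> Cursor(R.Off);
  for (uint32_t S = 0; S != NumSymbols; ++S)
    R.Data[R.Off[S]] = Count[S];
  for (const auto &E : Entries)
    R.Data[++Cursor[E.first]] = E.second;
  return R;
}
```

With this layout there is one allocation for all CU vectors instead of one per symbol, at the price of scanning the input twice.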

For one of our large internal targets, there are 4791276 symbols and the
sum of the sizes (capacities) of GdbSymbol::CuVector is 19740000 (26185902).
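
As a rough back-of-the-envelope check of what those figures imply, the sketch below computes the unused capacity and the cost of the vector objects themselves. The function names are hypothetical, and per-allocation bookkeeping in the allocator is not modeled, so this is only a lower bound on the overhead.

```cpp
#include <cstdint>
#include <vector>

// Figures quoted in the summary above.
constexpr std::uint64_t NumSymbols = 4791276;
constexpr std::uint64_t SumSize = 19740000;
constexpr std::uint64_t SumCapacity = 26185902;

// Bytes reserved by the per-symbol vectors but never used.
constexpr std::uint64_t slackBytes() {
  return (SumCapacity - SumSize) * sizeof(std::uint32_t);
}

// Bytes spent on the std::vector objects themselves. The size of the
// object is implementation-defined (commonly three pointers).
constexpr std::uint64_t headerBytes() {
  return NumSymbols * sizeof(std::vector<std::uint32_t>);
}
```

The slack comes to 25783608 bytes (about 24.6 MiB). On implementations where a `std::vector` object is 24 bytes, which is typical for 64-bit libstdc++ and libc++, the objects themselves add about 110 MiB.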

Before: 24.820 seconds, 13.74GiB
After: 24.175 seconds, 13.21GiB

As a comparison,
/usr/bin/gold (Debian): 134.29 seconds, 12.04GiB
lld --no-gdb-index: 20.619 seconds, 9.12GiB

Diff Detail

Repository

rLLD LLVM Linker

Build Status

Buildable 28172
Build 28171: arc lint + arc unit

Event Timeline

MaskRay created this revision. Feb 14 2019, 8:53 PM

Herald added a reviewer: espindola. Feb 14 2019, 8:53 PM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, jdoerfert, arphaman and 2 others.

Harbormaster completed remote builds in B28172: Diff 186959. Feb 14 2019, 8:53 PM

The current parallelism scheme may have reached a local optimum, and it is hard to improve further. I have tried another approach, D58276, which may have greater potential for improvement. For D58276, the loss of parallelism is my concern.

ruiu added inline comments. Feb 15 2019, 3:31 PM

ELF/SyntheticSections.cpp
2484	Doesn't this make a copy of a vector? Is this vector always small?
2543	Perhaps I'm missing something, but why do you have to create both GdbIndex vector and CuVectors in this function? I wonder if you can split it up into two functions.
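
On the first question: in the diff, the parameter type changes from `ArrayRef<...>` to a by-value `std::vector<std::vector<...>>`, and the call site shown passes `NameAttrs` as an lvalue. As far as the shown lines indicate, that does copy the outer vector and every inner vector. The following minimal sketch (the helper name `sharesBuffer` is hypothetical) shows the general C++ behavior: an lvalue argument to a by-value parameter is copied, while `std::move` hands over the existing buffer.

```cpp
#include <utility>
#include <vector>

// Returns true if the by-value parameter reuses the caller's heap buffer,
// i.e. no element-wise copy was made when the argument was passed.
bool sharesBuffer(std::vector<int> V, const int *CallerData) {
  return V.data() == CallerData;
}

// Example: passing an lvalue copies, passing an rvalue moves.
inline bool demoCopyVersusMove() {
  std::vector<int> V(1000, 1);
  const int *P = V.data();
  bool Copied = !sharesBuffer(V, P);           // lvalue: new buffer
  bool Moved = sharesBuffer(std::move(V), P);  // rvalue: same buffer
  return Copied && Moved;
}
```

If the callee needs a mutable copy only once, calling it with `std::move(NameAttrs)` would avoid the duplication, assuming the caller does not use the vector afterwards.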

I'll abandon this revision. This does decrease the memory footprint (~3.85% in two of our large internal executables) for glibc-allocator-based lld without a performance hit, but it unexpectedly increases the memory footprint for our internal tcmalloc-based lld. I also don't like the additional complexity.

I have a strong belief (I actually wanted to call it impossible, but I didn't want to make an absolute assertion :) ): we can't decrease the memory usage of .gdb_index without a performance hit. I sort of blame the fact that the function has been optimized too well for performance :( If we can put less emphasis on performance, https://reviews.llvm.org/D58276 (and https://reviews.llvm.org/differential/diff/187004) is a more feasible direction for decreasing memory usage. Internally, we care a lot about memory usage. The performance is already very good (2x ~ 4x faster than gold for a wide range of applications), so some sacrifice there is totally acceptable. On the other hand, we have some hard memory usage limits, and the current memory footprint characteristics make some huge targets unable to link.

That is my perception too; it seems nearly impossible to reduce the memory usage without sacrificing speed.

(I'm sorry, I sent it prematurely.)

That is my perception too; it seems nearly impossible to reduce the memory usage without sacrificing speed. If memory consumption were a problem for most users, we probably should choose memory reduction over speed, but I don't think our use case within Google is strong enough to change that design choice. In lld, we parallelize things if we are handling a massive number of objects of the same kind, and this perfectly matches that pattern. It would be pretty odd if this were the only place where we didn't do that.
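
The pattern referred to here can be sketched as follows. This is a simplified standalone illustration, not lld code: it uses `std::thread` instead of lld's `parallelForEachN`, a hypothetical FNV-1a `hash32` instead of lld's cached hash, and it merely counts names. The top bits of the hash select a shard, and each shard is owned by exactly one thread, so the per-shard maps need no locking.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

using ShardMap = std::unordered_map<std::string, std::size_t>;

// Hypothetical 32-bit hash (FNV-1a), standing in for lld's cached hash.
inline std::uint32_t hash32(const std::string &S) {
  std::uint32_t H = 2166136261u;
  for (unsigned char C : S)
    H = (H ^ C) * 16777619u;
  return H;
}

// Work done by one thread: scan every name, but only touch the shards this
// thread owns. Shards.size() and Concurrency must be powers of two, with
// Concurrency <= Shards.size().
inline void processOwnedShards(const std::vector<std::string> &Names,
                               std::vector<ShardMap> &Shards,
                               std::size_t Concurrency, std::size_t ThreadId) {
  unsigned Bits = 0;
  while ((std::size_t(1) << Bits) < Shards.size())
    ++Bits;
  unsigned Shift = 32 - Bits; // the top Bits bits of the hash pick the shard
  for (const std::string &Name : Names) {
    // Widen to 64 bits so that a shift by 32 (single shard) is well defined.
    std::size_t ShardId = std::size_t(std::uint64_t(hash32(Name)) >> Shift);
    if ((ShardId & (Concurrency - 1)) != ThreadId)
      continue; // owned by another thread
    ++Shards[ShardId][Name];
  }
}

// Driver: one thread per ThreadId; no locks because shard ownership is
// disjoint across threads.
inline std::vector<ShardMap> countSharded(const std::vector<std::string> &Names,
                                          std::size_t NumShards,
                                          std::size_t Concurrency) {
  std::vector<ShardMap> Shards(NumShards);
  std::vector<std::thread> Threads;
  for (std::size_t Id = 0; Id != Concurrency; ++Id)
    Threads.emplace_back(
        [&, Id] { processOwnedShards(Names, Shards, Concurrency, Id); });
  for (std::thread &T : Threads)
    T.join();
  return Shards;
}
```

Every thread reads the whole input, so the scheme trades redundant scanning for lock-free insertion, which is also why all intermediate per-shard data has to be kept in memory at the same time.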

Fortunately there is a workaround: we could build a binary without --gdb-index and then add the index using gdb as a post-processing step. Maybe we should live with that.

dblaikie added a subscriber: dblaikie. Mar 25 2019, 7:16 PM

Revision Contents

Path	Size
ELF/SyntheticSections.h	4 lines
ELF/SyntheticSections.cpp	117 lines

Diff 186959

ELF/SyntheticSections.h

[... 691 lines not shown ...]	public:
struct GdbChunk {		struct GdbChunk {
InputSection *Sec;		InputSection *Sec;
std::vector<AddressEntry> AddressAreas;		std::vector<AddressEntry> AddressAreas;
std::vector<CuEntry> CompilationUnits;		std::vector<CuEntry> CompilationUnits;
};		};

struct GdbSymbol {		struct GdbSymbol {
llvm::CachedHashStringRef Name;		llvm::CachedHashStringRef Name;
std::vector<uint32_t> CuVector;
uint32_t NameOff;		uint32_t NameOff;
uint32_t CuVectorOff;		uint32_t CuVectorOff;
};		};

GdbIndexSection();		GdbIndexSection();
template <typename ELFT> static GdbIndexSection *create();		template <typename ELFT> static GdbIndexSection *create();
void writeTo(uint8_t *Buf) override;		void writeTo(uint8_t *Buf) override;
size_t getSize() const override { return Size; }		size_t getSize() const override { return Size; }
[... 14 lines not shown ...]	private:

// Each chunk contains information gathered from debug sections of a		// Each chunk contains information gathered from debug sections of a
// single object file.		// single object file.
std::vector<GdbChunk> Chunks;		std::vector<GdbChunk> Chunks;

// A symbol table for this .gdb_index section.		// A symbol table for this .gdb_index section.
std::vector<GdbSymbol> Symbols;		std::vector<GdbSymbol> Symbols;

		// CU vectors in the constant pool.
		std::vector<uint32_t> CuVectors;

size_t Size;		size_t Size;
};		};

// --eh-frame-hdr option tells linker to construct a header for all the		// --eh-frame-hdr option tells linker to construct a header for all the
// .eh_frame sections. This header is placed to a section named .eh_frame_hdr		// .eh_frame sections. This header is placed to a section named .eh_frame_hdr
// and also to a PT_GNU_EH_FRAME segment.		// and also to a PT_GNU_EH_FRAME segment.
// At runtime the unwinder then can find all the PT_GNU_EH_FRAME segments by		// At runtime the unwinder then can find all the PT_GNU_EH_FRAME segments by
// calling dl_iterate_phdr.		// calling dl_iterate_phdr.
[... 305 lines not shown ...]

ELF/SyntheticSections.cpp

[... 2,473 lines not shown ...]	for (const DWARFDebugPubTable::Set &Set : Table.getData()) {
(Ent.Descriptor.toBits() << 24) | I});		(Ent.Descriptor.toBits() << 24) | I});
}		}
}		}
return Ret;		return Ret;
}		}

// Create a list of symbols from a given list of symbol names and types		// Create a list of symbols from a given list of symbol names and types
// by uniquifying them by name.		// by uniquifying them by name.
static std::vector<GdbIndexSection::GdbSymbol>		static std::pair<std::vector<GdbIndexSection::GdbSymbol>, std::vector<uint32_t>>
createSymbols(ArrayRef<std::vector<GdbIndexSection::NameAttrEntry>> NameAttrs,		createSymbols(
		std::vector<std::vector<GdbIndexSection::NameAttrEntry>> NameAttrs,
		[inline comment by ruiu] Doesn't this make a copy of a vector? Is this vector always small?
const std::vector<GdbIndexSection::GdbChunk> &Chunks) {		const std::vector<GdbIndexSection::GdbChunk> &Chunks) {
typedef GdbIndexSection::GdbSymbol GdbSymbol;		typedef GdbIndexSection::GdbSymbol GdbSymbol;
typedef GdbIndexSection::NameAttrEntry NameAttrEntry;		typedef GdbIndexSection::NameAttrEntry NameAttrEntry;

// For each chunk, compute the number of compilation units preceding it.		// For each chunk, compute the number of compilation units preceding it.
uint32_t CuIdx = 0;		uint32_t CuIdx = 0;
std::vector<uint32_t> CuIdxs(Chunks.size());		std::vector<uint32_t> CuIdxs(Chunks.size());
for (uint32_t I = 0, E = Chunks.size(); I != E; ++I) {		for (uint32_t I = 0, E = Chunks.size(); I != E; ++I) {
CuIdxs[I] = CuIdx;		CuIdxs[I] = CuIdx;
CuIdx += Chunks[I].CompilationUnits.size();		CuIdx += Chunks[I].CompilationUnits.size();
}		}

// The number of symbols we will handle in this function is of the order		// The number of symbols we will handle in this function is of the order
// of millions for very large executables, so we use multi-threading to		// of millions for very large executables, so we use multi-threading to
// speed it up.		// speed it up.
size_t NumShards = 32;		size_t NumShards = 32;
size_t Concurrency = 1;		size_t Concurrency = 1;
if (ThreadsEnabled)		if (ThreadsEnabled)
Concurrency =		Concurrency =
std::min<size_t>(PowerOf2Floor(hardware_concurrency()), NumShards);		std::min<size_t>(PowerOf2Floor(hardware_concurrency()), NumShards);

// A sharded map to uniquify symbols by name.
std::vector<DenseMap<CachedHashStringRef, size_t>> Map(NumShards);
size_t Shift = 32 - countTrailingZeros(NumShards);		size_t Shift = 32 - countTrailingZeros(NumShards);

// Instantiate GdbSymbols while uniquifying them by name.		// Instantiate GdbSymbols while uniquifying them by name.
std::vector<std::vector<GdbSymbol>> Symbols(NumShards);		std::vector<std::vector<GdbSymbol>> Symbols(NumShards);

		// For each symbol get the size of its CU vector and save to CuVectorOff.
		{
		std::vector<DenseMap<CachedHashStringRef, size_t>> Map(NumShards);
		parallelForEachN(0, Concurrency, [&](size_t ThreadId) {
		for (std::vector<NameAttrEntry> &Entries : NameAttrs) {
		for (NameAttrEntry &Ent : Entries) {
		size_t ShardId = Ent.Name.hash() >> Shift;
		if ((ShardId & (Concurrency - 1)) != ThreadId)
		continue;

		std::vector<GdbSymbol> &Vec = Symbols[ShardId];
		auto R = Map[ShardId].try_emplace(Ent.Name, Vec.size());
		if (R.second)
		Vec.push_back({Ent.Name, 0, 0});
		++Vec[R.first->second].CuVectorOff;
		// Repurpose Ent.Name.size() to be the index into Symbols[ShardId].
		Ent.Name = {{Ent.Name.data(), R.first->second}, Ent.Name.hash()};
		}
		}
		});
		}

		// CuVectors is the concatenated CU vectors of all symbols. The first value of
		// each CU vector describes the size, other values are attributes. Accumulate
		// CuVectorOff to get offsets in CuVectors.
		uint32_t Off = 0;
		for (std::vector<GdbSymbol> &Vec : Symbols)
		for (GdbSymbol &Sym : Vec) {
		uint32_t Next = Off + 1 + Sym.CuVectorOff;
		Sym.CuVectorOff = Off;
		Off = Next;
		}
		const uint32_t CuVectorsSize = Off;
		std::vector<uint32_t> CuVectors(CuVectorsSize, 0);
		[inline comment by ruiu] Perhaps I'm missing something, but why do you have to create both GdbIndex vector and CuVectors in this function? I wonder if you can split it up into two functions.

		// Fill in the first values now. Also compute NameOff.
		uint32_t I = 0, Last;
		Off *= 4;
		for (std::vector<GdbSymbol> &Vec : Symbols)
		for (GdbSymbol &Sym : Vec) {
		Sym.NameOff = Off;
		Off += Sym.Name.size() + 1;
		if (I)
		CuVectors[Last] = Sym.CuVectorOff - Last - 1;
		Last = Sym.CuVectorOff;
		++I;
		}
		if (I)
		CuVectors[Last] = CuVectorsSize - Last - 1;

		// Fill in other elements of CuVectors. We use CuVectorOff in the counting
		// sort manner thus we need to recompute them below.
parallelForEachN(0, Concurrency, [&](size_t ThreadId) {		parallelForEachN(0, Concurrency, [&](size_t ThreadId) {
uint32_t I = 0;		uint32_t I = 0;
for (ArrayRef<NameAttrEntry> Entries : NameAttrs) {		for (ArrayRef<NameAttrEntry> Entries : NameAttrs) {
for (const NameAttrEntry &Ent : Entries) {		for (const NameAttrEntry &Ent : Entries) {
size_t ShardId = Ent.Name.hash() >> Shift;		size_t ShardId = Ent.Name.hash() >> Shift;
if ((ShardId & (Concurrency - 1)) != ThreadId)		if ((ShardId & (Concurrency - 1)) != ThreadId)
continue;		continue;

uint32_t V = Ent.CuIndexAndAttrs + CuIdxs[I];		// Ent.Name.size() is the index into Symbols[ShardId].
size_t &Idx = Map[ShardId][Ent.Name];		CuVectors[++Symbols[ShardId][Ent.Name.size()].CuVectorOff] =
if (Idx) {		Ent.CuIndexAndAttrs + CuIdxs[I];
Symbols[ShardId][Idx - 1].CuVector.push_back(V);
continue;
}

Idx = Symbols[ShardId].size() + 1;
Symbols[ShardId].push_back({Ent.Name, {V}, 0, 0});
}		}
++I;		++I;
}		}
});		});

size_t NumSymbols = 0;		size_t NumSymbols = 0;
for (ArrayRef<GdbSymbol> V : Symbols)		Off = 0;
NumSymbols += V.size();		for (std::vector<GdbSymbol> &Vec : Symbols) {
		for (GdbSymbol &Sym : Vec) {
		uint32_t Next = Off + 1 + Sym.CuVectorOff;
		Sym.CuVectorOff = Off;
		Off = Next;
		}
		NumSymbols += Vec.size();
		}

// The return type is a flattened vector, so we'll copy each vector		// The return type is a flattened vector, so we'll copy each vector
// contents to Ret.		// contents to Ret.
std::vector<GdbSymbol> Ret;		std::vector<GdbSymbol> Ret;
Ret.reserve(NumSymbols);		Ret.reserve(NumSymbols);
for (std::vector<GdbSymbol> &Vec : Symbols)		for (const std::vector<GdbSymbol> &Vec : Symbols)
for (GdbSymbol &Sym : Vec)		Ret.insert(Ret.end(), Vec.begin(), Vec.end());
Ret.push_back(std::move(Sym));

// CU vectors and symbol names are adjacent in the output file.		return {std::move(Ret), std::move(CuVectors)};
// We can compute their offsets in the output file now.
size_t Off = 0;
for (GdbSymbol &Sym : Ret) {
Sym.CuVectorOff = Off;
Off += (Sym.CuVector.size() + 1) * 4;
}
for (GdbSymbol &Sym : Ret) {
Sym.NameOff = Off;
Off += Sym.Name.size() + 1;
}

return Ret;
}		}

// Returns a newly-created .gdb_index section.		// Returns a newly-created .gdb_index section.
template <class ELFT> GdbIndexSection *GdbIndexSection::create() {		template <class ELFT> GdbIndexSection *GdbIndexSection::create() {
std::vector<InputSection *> Sections = getDebugInfoSections();		std::vector<InputSection *> Sections = getDebugInfoSections();

// .debug_gnu_pub{names,types} are useless in executables.		// .debug_gnu_pub{names,types} are useless in executables.
// They are present in input object files solely for creating		// They are present in input object files solely for creating
[... 14 lines not shown ...]	parallelForEachN(0, Sections.size(), [&](size_t I) {
Chunks[I].AddressAreas = readAddressAreas(Dwarf, Sections[I]);		Chunks[I].AddressAreas = readAddressAreas(Dwarf, Sections[I]);
NameAttrs[I] = readPubNamesAndTypes<ELFT>(		NameAttrs[I] = readPubNamesAndTypes<ELFT>(
static_cast<const LLDDwarfObj<ELFT> &>(Dwarf.getDWARFObj()),		static_cast<const LLDDwarfObj<ELFT> &>(Dwarf.getDWARFObj()),
Chunks[I].CompilationUnits);		Chunks[I].CompilationUnits);
});		});

auto *Ret = make<GdbIndexSection>();		auto *Ret = make<GdbIndexSection>();
Ret->Chunks = std::move(Chunks);		Ret->Chunks = std::move(Chunks);
Ret->Symbols = createSymbols(NameAttrs, Ret->Chunks);		std::tie(Ret->Symbols, Ret->CuVectors) = createSymbols(NameAttrs, Ret->Chunks);
Ret->initOutputSize();		Ret->initOutputSize();
return Ret;		return Ret;
}		}

void GdbIndexSection::writeTo(uint8_t *Buf) {		void GdbIndexSection::writeTo(uint8_t *Buf) {
// Write the header.		// Write the header.
auto *Hdr = reinterpret_cast<GdbIndexHeader *>(Buf);		auto *Hdr = reinterpret_cast<GdbIndexHeader *>(Buf);
uint8_t *Start = Buf;		uint8_t *Start = Buf;
[... 34 lines not shown ...]	for (GdbSymbol &Sym : Symbols) {
uint32_t H = Sym.Name.hash();		uint32_t H = Sym.Name.hash();
uint32_t I = H & Mask;		uint32_t I = H & Mask;
uint32_t Step = ((H * 17) & Mask) | 1;		uint32_t Step = ((H * 17) & Mask) | 1;

while (read32le(Buf + I * 8))		while (read32le(Buf + I * 8))
I = (I + Step) & Mask;		I = (I + Step) & Mask;

write32le(Buf + I * 8, Sym.NameOff);		write32le(Buf + I * 8, Sym.NameOff);
write32le(Buf + I * 8 + 4, Sym.CuVectorOff);		write32le(Buf + I * 8 + 4, Sym.CuVectorOff * 4);
}		}

Buf += SymtabSize * 8;		Buf += SymtabSize * 8;

// Write the string pool.		// Write the string pool.
Hdr->ConstantPoolOff = Buf - Start;		Hdr->ConstantPoolOff = Buf - Start;
parallelForEach(Symbols, [&](GdbSymbol &Sym) {		parallelForEach(Symbols, [&](GdbSymbol &Sym) {
memcpy(Buf + Sym.NameOff, Sym.Name.data(), Sym.Name.size());		memcpy(Buf + Sym.NameOff, Sym.Name.data(), Sym.Name.size());
});		});

// Write the CU vectors.		// Write the CU vectors.
for (GdbSymbol &Sym : Symbols) {		for (uint32_t V : CuVectors) {
write32le(Buf, Sym.CuVector.size());		write32le(Buf, V);
Buf += 4;		Buf += 4;
for (uint32_t Val : Sym.CuVector) {
write32le(Buf, Val);
Buf += 4;
}
}		}
}		}

bool GdbIndexSection::empty() const { return Chunks.empty(); }		bool GdbIndexSection::empty() const { return Chunks.empty(); }

EhFrameHeader::EhFrameHeader()		EhFrameHeader::EhFrameHeader()
: SyntheticSection(SHF_ALLOC, SHT_PROGBITS, 4, ".eh_frame_hdr") {}		: SyntheticSection(SHF_ALLOC, SHT_PROGBITS, 4, ".eh_frame_hdr") {}

[... 569 lines not shown ...]