This is an archive of the discontinued LLVM Phabricator instance.

Differential D152162

DWP multithreading
Needs Review · Public

Authored by zhuna8616 on Jun 5 2023, 7:07 AM.

Details

Reviewers

ayermolo
dblaikie

Summary

This patch adds multithreading to dwp packaging. Most of the parallel work happens while input files are parsed and while the str and str_offsets sections are merged. New command-line options set the number of threads used in each of those two stages.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zhuna8616 created this revision. · Jun 5 2023, 7:07 AM

Herald added a project: Restricted Project. · Jun 5 2023, 7:07 AM

Herald added a subscriber: hiraditya.

zhuna8616 requested review of this revision. · Jun 5 2023, 7:07 AM

Herald added a project: Restricted Project. · Jun 5 2023, 7:07 AM

Herald added a subscriber: llvm-commits.

zhuna8616 retitled this revision from This patch is intended to add multithreading into dwp packaging. Most of the multithreading happens when input files are parsed and str & str_offset sections are merged. Command line options are introduced to specify the number of thread in those... to DWP multithreading. · Jun 5 2023, 7:10 AM

zhuna8616 edited the summary of this revision.

zhuna8616 added reviewers: ayermolo, dblaikie.

Harbormaster completed remote builds in B236622: Diff 528426. · Jun 5 2023, 8:26 AM

Do you have detailed data showing how much this patch speeds up dwp? Also, I think we should reuse existing LLVM code as much as possible instead of adding new code (e.g. ConsumerQueue should be easy to implement with LLVM ADT).

@dblaikie We are trying to speed up dwp with multithreading because it takes too much time. Do you think this is the right direction to go in?

Enna1 added a subscriber: Enna1. · Jun 5 2023, 11:39 PM

Do you have a profile for this? I think there's probably more low-hanging fruit that'd be good to address before parallelism's the tool I'd reach for. Like currently using MC to write the output adds a lot of overhead of copying all the to-be-written bytes into buffers before writing them out, and the inputs are probably decompressed and cached...

Comparing a profile between llvm-dwp and gold's dwp might give some hints at places to improve/where time's being wasted.

Though I'm not averse to adding multithreading.

If there were ways to share some of lld's multithreading & more efficient reading/writing that'd be great too, though I think last time that was mentioned @MaskRay suggested it'd be better to reimplement things - maybe at least inspired by lld's implementation.

+1 for achieving low-hanging fruits and switching to MCStreamer before adding parallelism. (@dblaikie seems to have a plan)
After these optimizations, the time proportion of different phases may be different enough that warrants a different design of parallelism.
I think this ordering will take less total engineering effort once both the MCStreamer migration and the parallelism work are done.

I have some reservations that adding parallelism now may increase the long-term engineering effort.

We investigated the performance of both LLVM's and gold's dwp and found that gold's performance is not as good: about 0.6 of llvm-dwp's when packaging clang's dwp. They share a bottleneck in merging the debug_str.dwo section, which in llvm-dwp is the function writeStringsAndOffsets. Maintaining a hash set of strings is slow, and each string requires at least one lookup. gold uses std::unordered_map while LLVM uses DenseMap, which is faster than std::unordered_map, and that makes gold slower than LLVM.
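To illustrate why this step is lookup-bound, here is a minimal sketch of string-table merging in which every incoming string costs one hash-map lookup. The class name StringPoolSketch and its members are hypothetical, and std::unordered_map stands in for whichever map the real tools use; this is not the code of llvm-dwp or gold.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical merged string table: each distinct string is stored once,
// NUL-terminated, and callers get back its byte offset in the table.
class StringPoolSketch {
public:
  // One hash lookup per call; the string is appended only on first sight.
  uint64_t getOffset(const std::string &Str) {
    auto [It, Inserted] = Offsets.try_emplace(Str, Data.size());
    if (Inserted) {
      Data.append(Str);
      Data.push_back('\0');
    }
    return It->second;
  }

  const std::string &data() const { return Data; }

private:
  std::string Data;                                  // merged section bytes
  std::unordered_map<std::string, uint64_t> Offsets; // string -> offset
};
```

Because every string in every input passes through getOffset, the cost of the map's hashing and probing dominates, which is consistent with the observation that swapping the map type changes the overall run time.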

We changed LLVM's DenseMap to StringMap and measured an improvement of at least 5%, and 17%~20% when packaging clang. This is low-hanging fruit, as @dblaikie put it. I believe we can at least make this improvement.

With all of the above said, adding multithreading only to the merging of the debug_str.dwo section would give an improvement of 17%~22% on average for our projects, with 16 threads. That would be less messy than the present patch, which also adds multithreading in other places. If only the code concerning debug_str.dwo is modified, I believe it would be friendly to long-term engineering efforts.

Furthermore, if we do not deduplicate the string tables produced by the worker threads after they finish their assigned files, the performance improvement can be 59%~190%, as observed on our projects. The produced DWP file grows by less than 9% relative to the file produced by LLVM's original implementation.

So we propose 3 options for changes:

1. Change DenseMap to StringMap.
2. Add multithreading for the merging of debug_str.dwo.
3. Add multithreading for the merging of debug_str.dwo and a command line option controlling whether the threads deduplicate the string table.
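The third option can be sketched roughly as follows, assuming each worker owns a fixed slice of the inputs and builds its own table, and the tables are then concatenated in worker order without cross-worker deduplication. All names here (LocalPoolSketch, MergedPoolsSketch, mergeWithoutGlobalDedup) are hypothetical and the layout is an assumption, not the patch's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical per-worker table; deduplicates only within one worker.
struct LocalPoolSketch {
  std::string Data;
  std::unordered_map<std::string, uint64_t> Offsets;

  uint64_t add(const std::string &Str) {
    auto [It, Inserted] = Offsets.try_emplace(Str, Data.size());
    if (Inserted) {
      Data.append(Str);
      Data.push_back('\0');
    }
    return It->second;
  }
};

struct MergedPoolsSketch {
  std::string Data;                 // concatenation of all worker tables
  std::vector<uint64_t> BaseOffset; // start of each worker's table in Data
};

// StringsPerWorker[W] holds the strings from the files assigned to worker W.
MergedPoolsSketch mergeWithoutGlobalDedup(
    const std::vector<std::vector<std::string>> &StringsPerWorker) {
  std::vector<LocalPoolSketch> Pools(StringsPerWorker.size());
  std::vector<std::thread> Threads;
  Threads.reserve(Pools.size());
  // Each thread touches only its own pool, so no locking is needed.
  for (std::size_t W = 0; W < Pools.size(); ++W)
    Threads.emplace_back([&Pools, &StringsPerWorker, W] {
      for (const std::string &S : StringsPerWorker[W])
        Pools[W].add(S);
    });
  for (std::thread &T : Threads)
    T.join();

  // Concatenate in worker order so the output does not depend on timing.
  MergedPoolsSketch Result;
  for (const LocalPoolSketch &P : Pools) {
    Result.BaseOffset.push_back(Result.Data.size());
    Result.Data += P.Data;
  }
  return Result;
}
```

A string's final offset is its worker's base offset plus its local offset. A string seen by two workers is stored twice, which is the source of the size increase mentioned above.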

In D152162#4463413, @zhuna8616 wrote:

We investigated the performance of both LLVM's and gold's dwp and found that gold's performance is not as good: about 0.6 of llvm-dwp's when packaging clang's dwp. They share a bottleneck in merging the debug_str.dwo section, which in llvm-dwp is the function writeStringsAndOffsets. Maintaining a hash set of strings is slow, and each string requires at least one lookup. gold uses std::unordered_map while LLVM uses DenseMap, which is faster than std::unordered_map, and that makes gold slower than LLVM.

Huh, fascinating. How's the memory usage? (I'd expect the memory usage would be way higher, and the extra copying into/out of buffers would add up to cost more than the savings in string map lookups - but I haven't done detailed profiles, admittedly)

We changed LLVM's DenseMap to StringMap and measured an improvement of at least 5%, and 17%~20% when packaging clang. This is low-hanging fruit, as @dblaikie put it. I believe we can at least make this improvement.

Yep, that sounds like a freebie - please send that as a separate review?

With all of the above said, adding multithreading only to the merging of the debug_str.dwo section would give an improvement of 17%~22% on average for our projects, with 16 threads. That would be less messy than the present patch, which also adds multithreading in other places. If only the code concerning debug_str.dwo is modified, I believe it would be friendly to long-term engineering efforts.

Furthermore, if we do not deduplicate the string tables produced by the worker threads after they finish their assigned files, the performance improvement can be 59%~190%, as observed on our projects. The produced DWP file grows by less than 9% relative to the file produced by LLVM's original implementation.

So we propose 3 options of changes:

Change DenseMap to StringMap.

Add multithreading for the merging of debug_str.dwo.

Add multithreading for the merging of debug_str.dwo and a command line option controlling whether the threads deduplicate the string table.

My concern with multithreaded string merging is determinism, I think? It's important that we produce the same output bits given the same inputs - are the multithreaded string merging approaches you have in mind still deterministic?

It'd be great if we could reuse some of lld's string merging support since they've already thought about these sort of issues & I believe figured out ways to do it deterministically and fast/multithreaded.

I posted a new review for using StringMap. https://reviews.llvm.org/D154341

We use a priority queue to maintain the same order in multithreading as in original implementation. I will look into lld presently.
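One way to read the priority-queue approach is as a reorder buffer: workers finish inputs in arbitrary order, and results are released to the writer only when the next expected input index is available. The sketch below is an assumption about the idea, not the contents of OrderedConsumerQueue.h (which is not shown on this page). The name ReorderBufferSketch is hypothetical, and locking is omitted to keep the ordering logic visible.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reorder buffer keyed by input index (single-threaded sketch;
// a real queue shared between threads would also need a mutex).
class ReorderBufferSketch {
public:
  using Item = std::pair<uint32_t, std::string>; // (input index, payload)

  void push(uint32_t Index, std::string Payload) {
    Heap.emplace(Index, std::move(Payload));
  }

  // Returns, in input order, every payload whose turn has come.
  std::vector<std::string> drainReady() {
    std::vector<std::string> Ready;
    while (!Heap.empty() && Heap.top().first == Next) {
      Ready.push_back(Heap.top().second);
      Heap.pop();
      ++Next;
    }
    return Ready;
  }

private:
  // Min-heap: the smallest pending input index is always on top.
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> Heap;
  uint32_t Next = 0; // next input index the writer is allowed to consume
};
```

If the writer only ever consumes what drainReady returns, the emitted bytes follow the input order regardless of which worker finishes first, which is the determinism property asked about above.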

MTC added a subscriber: MTC. · Jul 31 2023, 10:55 PM

Revision Contents

Path                                    Size
llvm/include/llvm/DWP/DWP.h             1 line
llvm/lib/DWP/ConsumerQueue.h            70 lines
llvm/lib/DWP/DWP.cpp                    597 lines
llvm/lib/DWP/DWPStringPool.h            233 lines
llvm/lib/DWP/OrderedConsumerQueue.h     96 lines
llvm/lib/DWP/WorkerState.h              135 lines
llvm/tools/llvm-dwp/llvm-dwp.cpp        13 lines

Diff 528426

llvm/include/llvm/DWP/DWP.h

...

struct CompileUnitIdentifiers {
  uint64_t Signature = 0;
  const char *Name = "";
  const char *DWOName = "";
};

Error write(MCStreamer &Out, ArrayRef<std::string> Inputs);
				Error writeMultithread(MCStreamer &Out, ArrayRef<std::string> Inputs, uint32_t InputWorkerCount, uint32_t StrWorkerCount);

unsigned getContributionIndex(DWARFSectionKind Kind, uint32_t IndexVersion);

Error handleSection(
    const StringMap<std::pair<MCSection *, DWARFSectionKind>> &KnownSections,
    const MCSection *StrSection, const MCSection *StrOffsetSection,
    const MCSection *TypesSection, const MCSection *CUIndexSection,
    const MCSection *TUIndexSection, const MCSection *InfoSection,
...

llvm/lib/DWP/ConsumerQueue.h

This file was added.

				#ifndef TOOLS_LLVM_DWP_CONQUEUE
				#define TOOLS_LLVM_DWP_CONQUEUE

				#include <list>
				#include <memory>
				#include <thread>
				#include <mutex>
				#include <condition_variable>
				#include <cassert>
				#include <cstdint>
				#include <limits>
				#include "llvm/ADT/PriorityQueue.h"


				namespace llvm {

				template <typename T>
				class ConsumerQueue {
				public:

				ConsumerQueue() = default;
				ConsumerQueue(uint64_t MaxSize) : MaxSize(MaxSize) {}
				ConsumerQueue(const ConsumerQueue &) = delete;
				~ConsumerQueue() = default;

				ConsumerQueue(ConsumerQueue &&rhs) noexcept
				: Queue(std::move(rhs.Queue)), MaxSize(rhs.MaxSize) {}

				ConsumerQueue &operator=(const ConsumerQueue &) = delete;
				ConsumerQueue &operator=(ConsumerQueue &&rhs) = delete;

				uint64_t Size() {
				std::lock_guard<std::mutex> Lock(Mutex);
				return Queue.size();
				}

				void Reset() {
				std::lock_guard<std::mutex> Lock(Mutex);
				Queue.clear();
				}

				void Emplace(std::shared_ptr<T> ObjPtr) {
				std::unique_lock<std::mutex> Lock(Mutex);
				FullCV.wait(Lock,
				[&]() { return Queue.size() < MaxSize; });
				Queue.emplace_back(std::move(ObjPtr));
				Lock.unlock();
				EmptyCV.notify_one();
				}

				std::shared_ptr<T> Pop() {
				std::unique_lock<std::mutex> Lock(Mutex);
				EmptyCV.wait(Lock, [&]() -> bool {
				return !Queue.empty();
				});
				// fetch worker state
				std::shared_ptr<T> ObjPtr = Queue.front();
				Queue.pop_front();
				Lock.unlock();
				FullCV.notify_one();
				return ObjPtr;
				}

				private:
				std::list<std::shared_ptr<T>> Queue;
				uint64_t MaxSize = std::numeric_limits<decltype(MaxSize)>::max();
				std::condition_variable EmptyCV, FullCV;
				std::mutex Mutex;
				};

				}

				#endif

llvm/lib/DWP/DWP.cpp

//===-- llvm-dwp.cpp - Split DWARF merging tool for llvm ------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// A utility for merging DWARF 5 Split DWARF .dwo files into .dwp (DWARF
// package files).
//
//===----------------------------------------------------------------------===//
		#include "ConsumerQueue.h"
		#include "DWPStringPool.h"
		#include "OrderedConsumerQueue.h"
		#include "WorkerState.h"
#include "llvm/DWP/DWP.h"
#include "llvm/DWP/DWPError.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCTargetOptionsCommandFlags.h"
#include "llvm/Object/Decompressor.h"
#include "llvm/Object/ELFObjectFile.h"
#include "llvm/Support/MemoryBuffer.h"
#include <limits>

using namespace llvm;
using namespace llvm::object;

static mc::RegisterMCTargetOptionsFlags MCTargetOptionsFlags;

// Returns the size of debug_str_offsets section headers in bytes.
static uint64_t debugStrOffsetsHeaderSize(DataExtractor StrOffsetsData,
uint16_t DwarfVersion) {
if (DwarfVersion <= 4)
return 0; // There is no header before dwarf 5.
uint64_t Offset = 0;
uint64_t Length = StrOffsetsData.getU32(&Offset);
if (Length == llvm::dwarf::DW_LENGTH_DWARF64)
return 16; // unit length: 12 bytes, version: 2 bytes, padding: 2 bytes.
return 8; // unit length: 4 bytes, version: 2 bytes, padding: 2 bytes.
}

static uint64_t getCUAbbrev(StringRef Abbrev, uint64_t AbbrCode) {
  uint64_t Offset = 0;
  DataExtractor AbbrevData(Abbrev, true, 0);
  while (AbbrevData.getULEB128(&Offset) != AbbrCode) {
    // Tag
    AbbrevData.getULEB128(&Offset);
    // DW_CHILDREN
    AbbrevData.getU8(&Offset);
...
  else if (OutSection == InfoSection)
    CurInfoSection.push_back(Contents);
  else {
    Out.switchSection(OutSection);
    Out.emitBytes(Contents);
  }
  return Error::success();
}

		Error handleSectionMultithread(
		const StringMap<std::pair<MCSection *, DWARFSectionKind>> &KnownSections,
		const MCSection *StrSection, const MCSection *StrOffsetSection,
		const MCSection *TypesSection, const MCSection *CUIndexSection,
		const MCSection *TUIndexSection, const MCSection *InfoSection,
		const SectionRef &Section,
		std::deque<SmallString<32>> &UncompressedSections,
		uint32_t (&ContributionOffsets)[8],
		StringRef &CurStrSection, StringRef &CurStrOffsetSection,
		std::vector<StringRef> &CurTypesSection,
		std::vector<StringRef> &CurInfoSection, StringRef &AbbrevSection,
		StringRef &CurCUIndexSection, StringRef &CurTUIndexSection,
		std::vector<std::pair<DWARFSectionKind, uint32_t>> &SectionLength,
		std::vector<std::pair<MCSection *, StringRef>> &HandleSectionOut) {
		if (Section.isBSS())
		return Error::success();

		if (Section.isVirtual())
		return Error::success();


		Expected<StringRef> NameOrErr = Section.getName();
		if (!NameOrErr)
		return NameOrErr.takeError();
		StringRef Name = *NameOrErr;

		Expected<StringRef> ContentsOrErr = Section.getContents();
		if (!ContentsOrErr)
		return ContentsOrErr.takeError();
		StringRef Contents = *ContentsOrErr;

		if (auto Err = handleCompressedSection(UncompressedSections, Section, Name,
		Contents))
		return Err;

		Name = Name.substr(Name.find_first_not_of("._"));

		auto SectionPair = KnownSections.find(Name);
		if (SectionPair == KnownSections.end())
		return Error::success();

		if (DWARFSectionKind Kind = SectionPair->second.second) {
		if (Kind != DW_SECT_EXT_TYPES && Kind != DW_SECT_INFO) {
		SectionLength.push_back(std::make_pair(Kind, Contents.size()));
		}

		if (Kind == DW_SECT_ABBREV) {
		AbbrevSection = Contents;
		}
		}

		MCSection *OutSection = SectionPair->second.first;
		if (OutSection == StrOffsetSection)
		CurStrOffsetSection = Contents;
		else if (OutSection == StrSection)
		CurStrSection = Contents;
		else if (OutSection == TypesSection)
		CurTypesSection.push_back(Contents);
		else if (OutSection == CUIndexSection)
		CurCUIndexSection = Contents;
		else if (OutSection == TUIndexSection)
		CurTUIndexSection = Contents;
		else if (OutSection == InfoSection)
		CurInfoSection.push_back(Contents);
		else {
		// Out.switchSection(OutSection);
		// Out.emitBytes(Contents);
		HandleSectionOut.push_back(std::make_pair(OutSection, Contents));
		}
		return Error::success();
		}

Error write(MCStreamer &Out, ArrayRef<std::string> Inputs) {
  const auto &MCOFI = *Out.getContext().getObjectFileInfo();
  MCSection *const StrSection = MCOFI.getDwarfStrDWOSection();
  MCSection *const StrOffsetSection = MCOFI.getDwarfStrOffDWOSection();
  MCSection *const TypesSection = MCOFI.getDwarfTypesDWOSection();
  MCSection *const CUIndexSection = MCOFI.getDwarfCUIndexSection();
  MCSection *const TUIndexSection = MCOFI.getDwarfTUIndexSection();
  MCSection *const InfoSection = MCOFI.getDwarfInfoDWOSection();
...
  if (Version < 5) {
    ContributionOffsets[0] = 1;
  }

  writeIndex(Out, MCOFI.getDwarfCUIndexSection(), ContributionOffsets,
             IndexEntries, IndexVersion);

  return Error::success();
}

		Error writeMultithread(MCStreamer &Out, ArrayRef<std::string> Inputs, uint32_t InputWorkerCount, uint32_t StrWorkerCount) {
		// Shared State locks
		std::mutex UncompressedSectionsMutex;
		// Producer-consumer Queue
		OrderedConsumerQueue<DwpWorkerState> WorkerQueue;
		// Input process state
		ConsumerQueue<FileIndexEntry> InputIndexQueue;

		const auto &MCOFI = *Out.getContext().getObjectFileInfo();
		MCSection *const StrSection = MCOFI.getDwarfStrDWOSection();
		MCSection *const StrOffsetSection = MCOFI.getDwarfStrOffDWOSection();
		MCSection *const TypesSection = MCOFI.getDwarfTypesDWOSection();
		MCSection *const CUIndexSection = MCOFI.getDwarfCUIndexSection();
		MCSection *const TUIndexSection = MCOFI.getDwarfTUIndexSection();
		MCSection *const InfoSection = MCOFI.getDwarfInfoDWOSection();
		const StringMap<std::pair<MCSection *, DWARFSectionKind>> KnownSections = {
		{"debug_info.dwo", {InfoSection, DW_SECT_INFO}},
		{"debug_types.dwo", {MCOFI.getDwarfTypesDWOSection(), DW_SECT_EXT_TYPES}},
		{"debug_str_offsets.dwo", {StrOffsetSection, DW_SECT_STR_OFFSETS}},
		{"debug_str.dwo", {StrSection, static_cast<DWARFSectionKind>(0)}},
		{"debug_loc.dwo", {MCOFI.getDwarfLocDWOSection(), DW_SECT_EXT_LOC}},
		{"debug_line.dwo", {MCOFI.getDwarfLineDWOSection(), DW_SECT_LINE}},
		{"debug_macro.dwo", {MCOFI.getDwarfMacroDWOSection(), DW_SECT_MACRO}},
		{"debug_abbrev.dwo", {MCOFI.getDwarfAbbrevDWOSection(), DW_SECT_ABBREV}},
		{"debug_loclists.dwo",
		{MCOFI.getDwarfLoclistsDWOSection(), DW_SECT_LOCLISTS}},
		{"debug_rnglists.dwo",
		{MCOFI.getDwarfRnglistsDWOSection(), DW_SECT_RNGLISTS}},
		{"debug_cu_index", {CUIndexSection, static_cast<DWARFSectionKind>(0)}},
		{"debug_tu_index", {TUIndexSection, static_cast<DWARFSectionKind>(0)}}};

		MapVector<uint64_t, UnitIndexEntry> IndexEntries;
		MapVector<uint64_t, UnitIndexEntry> TypeIndexEntries;

		uint32_t ContributionOffsets[8] = {};
		uint16_t Version = 0;
		uint32_t IndexVersion = 0;

		DWPStringPool Strings(Out, StrSection);

		SmallVector<OwningBinary<object::ObjectFile>, 128> Objects;
		Objects.resize(Inputs.size());

		std::deque<SmallString<32>> UncompressedSections;


		// Str workers
		std::vector<OrderedConsumerQueue<StrWorkerState>> StrWorkerQueues;
		std::vector<StrWorker> StrWorkers;
		StrWorkerQueues.reserve(StrWorkerCount);
		StrWorkers.reserve(StrWorkerCount);
		uint64_t FilePerStrWorker;
		if (Inputs.size() <= StrWorkerCount) {
		StrWorkerCount = Inputs.size();
		FilePerStrWorker = 1;
		} else {
		FilePerStrWorker = Inputs.size() / StrWorkerCount;
		}
		uint64_t RollingInputBegin = 0;
		for (uint32_t I = 0; I < StrWorkerCount - 1; ++I) {
		StrWorkerQueues.emplace_back(OrderedConsumerQueue<StrWorkerState>(RollingInputBegin));
		StrWorkers.emplace_back(StrWorker(I, FilePerStrWorker, StrWorkerQueues.at(I)));
		RollingInputBegin += FilePerStrWorker;
		}
		const uint64_t FileLastStrWorker = Inputs.size() - RollingInputBegin;
		StrWorkerQueues.emplace_back(OrderedConsumerQueue<StrWorkerState>(RollingInputBegin));
		StrWorkers.emplace_back(StrWorker(StrWorkerCount - 1, FileLastStrWorker, StrWorkerQueues.back()));

		std::vector<std::thread> StrWorkerThreads;
		StrWorkerThreads.reserve(StrWorkerCount);
		for (auto &Worker : StrWorkers) {
		StrWorkerThreads.emplace_back(std::thread(std::ref(Worker)));
		}

		auto inputWorker = [&](uint32_t WorkerId) {
		auto emplaceWorkerQueue = [&] (std::unique_ptr<DwpWorkerState> &&State) {
		WorkerQueue.Emplace(std::forward<std::unique_ptr<DwpWorkerState>>(State));
		};
		auto submitComplete = [&]() {
		std::unique_ptr<DwpWorkerState> Result = std::make_unique<DwpWorkerState>(true);
		Result->Id = WorkerId;
		Result->LocalInputIndex = static_cast<uint32_t>(-1);
		WorkerQueue.Emplace(std::move(Result));
		};
		while (true) {
		auto InputIndexEntry = InputIndexQueue.Pop();
		if (InputIndexEntry->Done()) {
		// maintain complete state in the queue
		InputIndexQueue.Emplace(FileIndexEntry::MakeDone());
		submitComplete();
		return;
		}

		const uint32_t LocalInputIndex = InputIndexEntry->Index;
		auto &Input = Inputs[LocalInputIndex];


		std::unique_ptr<DwpWorkerState> WorkerState = std::make_unique<DwpWorkerState>();
		WorkerState->Id = WorkerId;
		WorkerState->LocalInputIndex = LocalInputIndex;

		auto submitState = [&]() {
		emplaceWorkerQueue(std::move(WorkerState));
		};
		auto submitSuccess = [&]() {
		assert(!WorkerState->ErrorState && "worker in error state in SubmitSuccess");
		submitState();
		};
		auto submitError = [&](Error &&e) {
		assert(!WorkerState->ErrorState && "worker in error state before SubmitError");
		WorkerState->ErrorState = std::forward<Error>(e);
		submitState();
		};
		auto ErrOrObj = object::ObjectFile::createObjectFile(Input);
		if (!ErrOrObj) {
		submitError(handleErrors(ErrOrObj.takeError(),
		[&](std::unique_ptr<ECError> EC) -> Error {
		return createFileError(Input, Error(std::move(EC)));
		}));
		return;
		}
		auto &Obj = *ErrOrObj->getBinary();
		Objects[LocalInputIndex] = std::move(*ErrOrObj);

		UnitIndexEntry CurEntry = {};

		StringRef CurTUIndexSection;

		// This maps each section contained in this file to its length.
		// This information is later on used to calculate the contributions,
		// i.e. offset and length, of each compile/type unit to a section.
		// std::vector<std::pair<DWARFSectionKind, uint32_t>> SectionLength;

		for (const auto &Section : Obj.sections())
		if (auto Err = handleSectionMultithread(
		KnownSections, StrSection, StrOffsetSection, TypesSection,
		CUIndexSection, TUIndexSection, InfoSection, Section,
		UncompressedSections, ContributionOffsets,
		WorkerState->CurStrSection, WorkerState->CurStrOffsetSection, WorkerState->CurTypesSection,
		WorkerState->CurInfoSection, WorkerState->AbbrevSection, WorkerState->CurCUIndexSection,
		CurTUIndexSection, WorkerState->SectionLength, WorkerState->HandleSectionOut)) {
		submitError(std::move(Err));
		return;
		}

		if (WorkerState->CurInfoSection.empty()) {
		assert(!WorkerState->ErrorState);
		continue;
		}

		Expected<InfoSectionUnitHeader> HeaderOrErr =
		parseInfoSectionUnitHeader(WorkerState->CurInfoSection.front());
		if (!HeaderOrErr) {
		submitError(HeaderOrErr.takeError());
		return;
		}
		WorkerState->Header = *HeaderOrErr;

		if (Version == 0) {
		Version = WorkerState->Header.Version;
		IndexVersion = Version < 5 ? 2 : 5;
		} else if (Version != WorkerState->Header.Version) {
		submitError(make_error<DWPError>("incompatible DWARF compile unit versions."));
		return;
		}

		// send off str section to StrWorkers
		uint64_t StrWorkerIndex = LocalInputIndex / FilePerStrWorker;
		StrWorkerIndex = StrWorkerIndex < (StrWorkerCount - 1) ? StrWorkerIndex : (StrWorkerCount - 1);
		if (!WorkerState->CurStrSection.empty() && !WorkerState->CurStrOffsetSection.empty()) {
		StrWorkerQueues.at(StrWorkerIndex).Emplace(std::make_shared<StrWorkerState>(
		LocalInputIndex, WorkerState->CurStrSection, WorkerState->CurStrOffsetSection, WorkerState->Header.Version
		));
		}

		if (WorkerState->CurCUIndexSection.empty()) {
		submitSuccess();
		continue;
		}

		if (WorkerState->CurInfoSection.size() != 1) {
		submitError(make_error<DWPError>("expected exactly one occurrence of a debug "
		"info section in a .dwp file"));
		return;
		}
		StringRef DwpSingleInfoSection = WorkerState->CurInfoSection.front();

		DataExtractor CUIndexData(WorkerState->CurCUIndexSection, Obj.isLittleEndian(), 0);
		if (!WorkerState->CUIndex.parse(CUIndexData)) {
		submitError(make_error<DWPError>("failed to parse cu_index"));
		return;
		}
		if (WorkerState->CUIndex.getVersion() != IndexVersion) {
		submitError(make_error<DWPError>("incompatible cu_index versions, found " +
		utostr(WorkerState->CUIndex.getVersion()) +
		" and expecting " + utostr(IndexVersion)));
		return;
		}

		for (uint64_t II = 0; II < WorkerState->CUIndex.getRows().size(); ++II) {
		const DWARFUnitIndex::Entry &E = WorkerState->CUIndex.getRows()[II];
		auto *I = E.getContributions();
		if (!I)
		continue;
		StringRef CUInfoSection =
		getSubsection(DwpSingleInfoSection, E, DW_SECT_INFO);
		Expected<InfoSectionUnitHeader> HeaderOrError =
		parseInfoSectionUnitHeader(CUInfoSection);
		if (!HeaderOrError) {
		submitError(HeaderOrError.takeError());
		return;
		}
		InfoSectionUnitHeader &Header = *HeaderOrError;

		Expected<CompileUnitIdentifiers> EID = getCUIdentifiers(
		Header, getSubsection(WorkerState->AbbrevSection, E, DW_SECT_ABBREV),
		CUInfoSection,
		getSubsection(WorkerState->CurStrOffsetSection, E, DW_SECT_STR_OFFSETS),
		WorkerState->CurStrSection);
		if (!EID) {
		submitError(createFileError(Input, EID.takeError()));
		return;
		}
		const auto &ID = *EID;
		WorkerState->CUIndexIDs.insert({II, ID});
		WorkerState->CUInfoSectionMap.insert({II, std::move(CUInfoSection)});
		}

		if (!CurTUIndexSection.empty()) {
		// Write type units into debug info section for DWARFv5.
		if (Version >= 5) {
		WorkerState->TUSectionKind = DW_SECT_INFO;
		WorkerState->OutSection = InfoSection;
		WorkerState->TypeInputSection = DwpSingleInfoSection;
		} else {
		// Write type units into debug types section for DWARF < 5.
		if (WorkerState->CurTypesSection.size() != 1) {
		submitError(make_error<DWPError>(
		"multiple type unit sections in .dwp file"));
		return;
		}

		WorkerState->TUSectionKind = DW_SECT_EXT_TYPES;
		WorkerState->OutSection = TypesSection;
		WorkerState->TypeInputSection = WorkerState->CurTypesSection.front();
		}
		// DWARFUnitIndex TUIndex(TUSectionKind);
		WorkerState->TUIndex = std::make_unique<DWARFUnitIndex>(WorkerState->TUSectionKind);
		DataExtractor TUIndexData(CurTUIndexSection, Obj.isLittleEndian(), 0);
		if (!WorkerState->TUIndex->parse(TUIndexData)) {
		submitError(make_error<DWPError>("failed to parse tu_index"));
		return;
		}
		if (WorkerState->TUIndex->getVersion() != IndexVersion) {
		submitError(make_error<DWPError>("incompatible tu_index versions, found " +
		utostr(WorkerState->TUIndex->getVersion()) +
		" and expecting " + utostr(IndexVersion)));
		return;
		}
		}
		submitSuccess();
		}
		};

		auto commitWorkerState = [&](DwpWorkerState &WorkerState) -> llvm::Error {
		auto &Input = Inputs[WorkerState.LocalInputIndex];
		UnitIndexEntry CurEntry = {};

		// handleSection
		for (auto &Pair : WorkerState.HandleSectionOut) {
		MCSection *Section = Pair.first;
		StringRef &Contents = Pair.second;
		Out.switchSection(Section);
		Out.emitBytes(Contents);
		}

		for (auto Pair : WorkerState.SectionLength) {
		auto Index = getContributionIndex(Pair.first, IndexVersion);
		CurEntry.Contributions[Index].setOffset(ContributionOffsets[Index]);
		CurEntry.Contributions[Index].setLength(Pair.second);
		ContributionOffsets[Index] += CurEntry.Contributions[Index].getLength32();
		}

		// writeStringsAndOffsets(Out, Strings, StrOffsetSection, WorkerState.CurStrSection,
		// WorkerState.CurStrOffsetSection, WorkerState.Header.Version);

		uint32_t &InfoSectionOffset =
		ContributionOffsets[getContributionIndex(DW_SECT_INFO, IndexVersion)];
		if (WorkerState.CurCUIndexSection.empty()) {
		bool FoundCUUnit = false;
		Out.switchSection(InfoSection);
		for (StringRef Info : WorkerState.CurInfoSection) {
		uint64_t UnitOffset = 0;
		while (Info.size() > UnitOffset) {
		Expected<InfoSectionUnitHeader> HeaderOrError =
		parseInfoSectionUnitHeader(Info.substr(UnitOffset, Info.size()));
		if (!HeaderOrError)
		return HeaderOrError.takeError();
		InfoSectionUnitHeader &Header = *HeaderOrError;

		UnitIndexEntry Entry = CurEntry;
		auto &C = Entry.Contributions[getContributionIndex(DW_SECT_INFO,
		IndexVersion)];
		C.setOffset(InfoSectionOffset);
		C.setLength(Header.Length + 4);

		if (std::numeric_limits<uint32_t>::max() - InfoSectionOffset <
		C.getLength32())
		return make_error<DWPError>(
		"debug information section offset is greater than 4GB");

		UnitOffset += C.getLength32();
		if (Header.Version < 5 ||
		Header.UnitType == dwarf::DW_UT_split_compile) {
		Expected<CompileUnitIdentifiers> EID = getCUIdentifiers(
		Header, WorkerState.AbbrevSection,
		Info.substr(UnitOffset - C.getLength32(), C.getLength32()),
		WorkerState.CurStrOffsetSection, WorkerState.CurStrSection);

		if (!EID)
		return createFileError(Input, EID.takeError());
		const auto &ID = *EID;
		auto P = IndexEntries.insert(std::make_pair(ID.Signature, Entry));
		if (!P.second)
		return buildDuplicateError(*P.first, ID, "");
		P.first->second.Name = ID.Name;
		P.first->second.DWOName = ID.DWOName;

		FoundCUUnit = true;
		} else if (Header.UnitType == dwarf::DW_UT_split_type) {
		auto P = TypeIndexEntries.insert(
		std::make_pair(*Header.Signature, Entry));
		if (!P.second)
		continue;
		}
		Out.emitBytes(
		Info.substr(UnitOffset - C.getLength32(), C.getLength32()));
		InfoSectionOffset += C.getLength32();
		}
		}

		if (!FoundCUUnit)
		return make_error<DWPError>("no compile unit found in file: " + Input);

		if (IndexVersion == 2) {
		// Add types from the .debug_types section from DWARF < 5.
		addAllTypesFromTypesSection(
		Out, TypeIndexEntries, TypesSection, WorkerState.CurTypesSection, CurEntry,
		ContributionOffsets[getContributionIndex(DW_SECT_EXT_TYPES, 2)]);
		}
		return Error::success();
		}

		Out.switchSection(InfoSection);
		// for (const DWARFUnitIndex::Entry &E : CUIndex.getRows()) {
		for (uint64_t II = 0; II < WorkerState.CUIndex.getRows().size(); ++II) {
		const DWARFUnitIndex::Entry &E = WorkerState.CUIndex.getRows()[II];
		auto *I = E.getContributions();
		if (!I)
		continue;
		auto P = IndexEntries.insert(std::make_pair(E.getSignature(), CurEntry));
		const auto &ID = WorkerState.CUIndexIDs.find(II)->second;
		if (!P.second)
		return buildDuplicateError(*P.first, ID, Input);
		auto &NewEntry = P.first->second;
		NewEntry.Name = ID.Name;
		NewEntry.DWOName = ID.DWOName;
		NewEntry.DWPName = Input;
		for (auto Kind : WorkerState.CUIndex.getColumnKinds()) {
		if (!isSupportedSectionKind(Kind))
		continue;
		auto &C =
		NewEntry.Contributions[getContributionIndex(Kind, IndexVersion)];
		C.setOffset(C.getOffset() + I->getOffset());
		C.setLength(I->getLength());
		++I;
		}
		unsigned Index = getContributionIndex(DW_SECT_INFO, IndexVersion);
		auto &C = NewEntry.Contributions[Index];
		Out.emitBytes(WorkerState.CUInfoSectionMap.find(II)->second);
		C.setOffset(InfoSectionOffset);
		InfoSectionOffset += C.getLength32();
		}

		if (WorkerState.TUIndex.get() != nullptr) {
		unsigned TypesContributionIndex =
		getContributionIndex(WorkerState.TUSectionKind, IndexVersion);
		addAllTypesFromDWP(Out, TypeIndexEntries, *WorkerState.TUIndex, WorkerState.OutSection,
		WorkerState.TypeInputSection, CurEntry,
		ContributionOffsets[TypesContributionIndex],
		TypesContributionIndex);
		}
		return Error::success();
		};

		// Input workers fixed size thread pool
		std::vector<std::thread> HandleInputThreadPool;
		HandleInputThreadPool.reserve(InputWorkerCount);
		for (uint32_t th = 0; th < InputWorkerCount; ++th) {
		HandleInputThreadPool.push_back(std::thread(inputWorker, th));
		}
		uint64_t ActiveWorkerCount = HandleInputThreadPool.size();

		// distributed order
		for (uint32_t StrWorkerIndex = 0; StrWorkerIndex < StrWorkerCount - 1; ++StrWorkerIndex) {
		for (uint32_t StrWorkerOffset = 0; StrWorkerOffset < FilePerStrWorker; ++StrWorkerOffset) {
		InputIndexQueue.Emplace(std::make_shared<FileIndexEntry>(
		StrWorkerIndex * FilePerStrWorker + StrWorkerOffset
		));
		}
		}
		for (uint32_t StrWorkerOffset = 0; StrWorkerOffset < FileLastStrWorker; ++StrWorkerOffset) {
		InputIndexQueue.Emplace(std::make_shared<FileIndexEntry>(
		(StrWorkerCount - 1) * FilePerStrWorker + StrWorkerOffset
		));
		}
		for (uint32_t StrWorkerIndex = 0; StrWorkerIndex < StrWorkerCount; ++StrWorkerIndex) {
		InputIndexQueue.Emplace(FileIndexEntry::MakeDone());
		}
		auto joinAllWorkers = [&]() {
		// join all dwo workers
		for (auto &Thread : HandleInputThreadPool) {
		if (Thread.joinable()) {
		Thread.join();
		}
		}
		HandleInputThreadPool.clear();
		uint32_t SuccessCount = 0;
		while (WorkerQueue.Size() > 0) {
		auto WorkerState = WorkerQueue.Pop();
		// throw away unchecked error states
		bool IsSuccess = !WorkerState->ErrorState;
		if (IsSuccess) {
		++SuccessCount;
		}
		}
		// join all str workers
		uint32_t StrWorkerJoinableCount = 0;
		for (auto &Thread : StrWorkerThreads) {
		if (Thread.joinable()) {
		StrWorkerJoinableCount++;
		}
		}
		for (auto &Thread : StrWorkerThreads) {
		if (Thread.joinable()) {
		Thread.join();
		}
		}
		};

		while (true) {
		// fetch worker state
		auto WorkerState = WorkerQueue.Pop();
		if (WorkerState->Complete) {
		ActiveWorkerCount--;
		assert(!WorkerState->ErrorState && "Or else why signal Complete?");
		if (ActiveWorkerCount == 0) {
		break;
		}
		continue;
		}
		// commit worker state
		auto shutdownWorkers = [&]() {
		InputIndexQueue.Emplace(FileIndexEntry::MakeDone());
		for (auto &Queue : StrWorkerQueues) {
		Queue.Emplace(std::make_shared<StrWorkerState>(true));
		}
		joinAllWorkers();
		};
		if (WorkerState->ErrorState) {
		shutdownWorkers();
		return std::move(WorkerState->ErrorState);
		}
		llvm::Error Err = commitWorkerState(*WorkerState);
		if (Err) {
		shutdownWorkers();
		return Err;
		}
		}
		// send Complete signal to StrWorkers
		for (auto &Queue : StrWorkerQueues) {
		Queue.Emplace(std::make_shared<StrWorkerState>(true));
		}
		joinAllWorkers();
		StrWorker::WorkersToSectionWithoutDup(StrWorkers, Out, StrSection, StrOffsetSection);

		// original tail before returning
		if (Version < 5) {
		// Lie about there being no info contributions so the TU index only includes
		// the type unit contribution for DWARF < 5. In DWARFv5 the TU index has a
		// contribution to the info section, so we do not want to lie about it.
		ContributionOffsets[0] = 0;
		}
		writeIndex(Out, MCOFI.getDwarfTUIndexSection(), ContributionOffsets,
		TypeIndexEntries, IndexVersion);

		if (Version < 5) {
		// Lie about the type contribution for DWARF < 5. In DWARFv5 the type
		// section does not exist, so no need to do anything about this.
		ContributionOffsets[getContributionIndex(DW_SECT_EXT_TYPES, 2)] = 0;
		// Unlie about the info contribution
		ContributionOffsets[0] = 1;
		}

		writeIndex(Out, MCOFI.getDwarfCUIndexSection(), ContributionOffsets,
		IndexEntries, IndexVersion);
		return Error::success();
		}
} // namespace llvm

llvm/lib/DWP/DWPStringPool.h

This file was added.

				#ifndef TOOLS_LLVM_DWP_DWPSTRINGPOOL
				#define TOOLS_LLVM_DWP_DWPSTRINGPOOL

				#include "llvm/DWP/DWP.h"
				#include "llvm/DWP/DWPStringPool.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/MC/MCSection.h"
				#include "llvm/MC/MCStreamer.h"
				#include "llvm/Support/DataExtractor.h"
				#include <cassert>

				namespace llvm {

				// Returns the size of debug_str_offsets section headers in bytes.
				static uint64_t debugStrOffsetsHeaderSize(DataExtractor StrOffsetsData,
				uint16_t DwarfVersion) {
				if (DwarfVersion <= 4)
				return 0; // There is no header before dwarf 5.
				uint64_t Offset = 0;
				uint64_t Length = StrOffsetsData.getU32(&Offset);
				if (Length == llvm::dwarf::DW_LENGTH_DWARF64)
				return 16; // unit length: 12 bytes, version: 2 bytes, padding: 2 bytes.
				return 8; // unit length: 4 bytes, version: 2 bytes, padding: 2 bytes.
				}

				/*
				class DWPStringPool {

				struct CStrDenseMapInfo {
				static inline const char *getEmptyKey() {
				return reinterpret_cast<const char *>(~static_cast<uintptr_t>(0));
				}
				static inline const char *getTombstoneKey() {
				return reinterpret_cast<const char *>(~static_cast<uintptr_t>(1));
				}
				static unsigned getHashValue(const char *Val) {
				assert(Val != getEmptyKey() && "Cannot hash the empty key!");
				assert(Val != getTombstoneKey() && "Cannot hash the tombstone key!");
				return (unsigned)hash_value(StringRef(Val));
				}
				static bool isEqual(const char *LHS, const char *RHS) {
				if (RHS == getEmptyKey())
				return LHS == getEmptyKey();
				if (RHS == getTombstoneKey())
				return LHS == getTombstoneKey();
				return strcmp(LHS, RHS) == 0;
				}
				};

				MCStreamer &Out;
				MCSection *Sec;
				DenseMap<const char *, uint32_t, CStrDenseMapInfo> Pool;
				uint32_t Offset = 0;

				public:
				DWPStringPool(MCStreamer &Out, MCSection *Sec) : Out(Out), Sec(Sec) {}

				uint32_t getOffset(const char *Str, unsigned Length) {
				assert(strlen(Str) + 1 == Length && "Ensure length hint is correct");

				auto Pair = Pool.insert(std::make_pair(Str, Offset));
				if (Pair.second) {
				Out.switchSection(Sec);
				Out.emitBytes(StringRef(Str, Length));
				Offset += Length;
				}

				return Pair.first->second;
				}
				};*/

				class DWPStringTable {

				struct CStrDenseMapInfo {
				static inline const char *getEmptyKey() {
				return reinterpret_cast<const char *>(~static_cast<uintptr_t>(0));
				}
				static inline const char *getTombstoneKey() {
				return reinterpret_cast<const char *>(~static_cast<uintptr_t>(1));
				}
				static unsigned getHashValue(const char *Val) {
				assert(Val != getEmptyKey() && "Cannot hash the empty key!");
				assert(Val != getTombstoneKey() && "Cannot hash the tombstone key!");
				return (unsigned)hash_value(StringRef(Val));
				}
				static bool isEqual(const char *LHS, const char *RHS) {
				if (RHS == getEmptyKey())
				return LHS == getEmptyKey();
				if (RHS == getTombstoneKey())
				return LHS == getTombstoneKey();
				return strcmp(LHS, RHS) == 0;
				}
				};

				static void emitSection(MCStreamer &Out, MCSection *StrSection,
				MCSection *StrOffsetSection, std::vector<StringRef> &OutStrings,
				std::vector<uint64_t> &OutOffsets) {
				// emit debug_str
				Out.switchSection(StrSection);
				for (auto &Str : OutStrings) {
				Out.emitBytes(Str);
				}

				// emit offsets
				Out.switchSection(StrOffsetSection);
				for (uint64_t Offset : OutOffsets) {
				Out.emitIntValue(Offset, 4);
				}
				}

				std::vector<StringRef> OutStringDedup, OutStringAll;
				DenseMap<uint64_t, StringRef> OutHeaders;
				StringMap<uint64_t> StringOffsetMap;
				uint32_t NextInsertOffset = 0;

				public:

				DWPStringTable(uint64_t InputCount) {
				OutStringDedup.reserve(InputCount);
				OutStringAll.reserve(InputCount);
				}

				void insertRawSection(StringRef CurStrSection, StringRef CurStrOffsetSection, uint16_t Version) {
				// Could possibly produce an error or warning if one of these was non-null but
				// the other was null.
				if (CurStrSection.empty() || CurStrOffsetSection.empty())
				return;

				DataExtractor Data(CurStrSection, true, 0);
				uint64_t LocalOffset = 0;
				uint64_t PrevOffset = 0;
				DenseMap<uint64_t, StringRef> OffsetStringMap;
				while (const char *S = Data.getCStr(&LocalOffset)) {
				// OffsetRemapping[PrevOffset] =
				// Strings.getOffset(S, LocalOffset - PrevOffset);
				StringRef Str(S, LocalOffset - PrevOffset);
				auto Pair = StringOffsetMap.insert(std::make_pair(Str, NextInsertOffset));
				if (Pair.second) {
				NextInsertOffset += LocalOffset - PrevOffset;
				OutStringDedup.push_back(Str);
				}
				auto OffsetStringPair = OffsetStringMap.insert(std::make_pair(PrevOffset, Str));
				assert(OffsetStringPair.second && "duplicate offset value");
				PrevOffset = LocalOffset;
				}

				Data = DataExtractor(CurStrOffsetSection, true, 0);

				uint64_t HeaderSize = debugStrOffsetsHeaderSize(Data, Version);
				uint64_t Offset = 0;
				uint64_t Size = CurStrOffsetSection.size();
				// FIXME: This can be caused by bad input and should be handled as such.
				assert(HeaderSize <= Size && "StrOffsetSection size is less than its header");
				// Copy the header to the output.
				auto Pair = OutHeaders.insert(
				std::make_pair(OutStringAll.size(), Data.getBytes(&Offset, HeaderSize)));
				assert(Pair.second && "duplicate header position");
				while (Offset < Size) {
				auto OldOffset = Data.getU32(&Offset);
				auto Pair = OffsetStringMap.find(OldOffset);
				assert(Pair != OffsetStringMap.end() && "str offset not found");
				OutStringAll.push_back(Pair->second);
				}
				}

				static void MergeAndOutWithoutDup(const std::vector<DWPStringTable *> &Tables,
				MCStreamer &Out, MCSection *StrSection,
				MCSection *StrOffsetSection) {
				// build new string offset mapping
				StringMap<uint64_t> StringOffsetMap;
				std::vector<StringRef> OutStringDedup, OutStringAll;
				std::vector<uint64_t> OutOffsets;
				DenseMap<uint64_t, StringRef> OutHeaders;

				// reserve out strings first
				uint64_t StrDedupCount = 0, StrAllCount = 0;
				for (DWPStringTable *Table : Tables) {
				StrDedupCount += Table->OutStringDedup.size();
				StrAllCount += Table->OutStringAll.size();
				}
				OutStringDedup.reserve(StrDedupCount);
				OutStringAll.reserve(StrAllCount);
				OutOffsets.reserve(StrAllCount);

				// build mapping
				uint64_t NextInsertIndex = 0;
				for (DWPStringTable *Table : Tables) {
				for (uint64_t I = 0; I < Table->OutStringDedup.size(); ++I) {
				StringRef &Str = Table->OutStringDedup.at(I);
				auto GlobalPair = StringOffsetMap.insert(std::make_pair(Str, NextInsertIndex));
				if (GlobalPair.second) {
				// add entry for new string
				NextInsertIndex += Str.size();
				OutStringDedup.push_back(Str);
				}
				}
				for (uint64_t I = 0; I < Table->OutStringAll.size(); ++I) {
				auto HeaderPair = Table->OutHeaders.find(I);
				if (HeaderPair != Table->OutHeaders.end()) {
				auto Pair = OutHeaders.insert(std::make_pair(I, HeaderPair->second));
				assert(Pair.second && "duplicate header emitting position when building mapping");
				}
				StringRef &Str = Table->OutStringAll.at(I);
				OutStringAll.push_back(Str);
				auto Pair = StringOffsetMap.find(Str);
				assert((Pair != StringOffsetMap.end()) && "string not found in offsets mapping");
				OutOffsets.push_back(Pair->second);
				}
				}

				// emit strings
				if (!OutStringDedup.empty()) {
				Out.switchSection(StrSection);
				for (StringRef &Str : OutStringDedup) {
				Out.emitBytes(Str);
				}
				}
				// emit offsets
				if (!OutOffsets.empty()) {
				Out.switchSection(StrOffsetSection);
				for (uint64_t I = 0; I < OutOffsets.size(); ++I) {
				auto HeaderPair = OutHeaders.find(I);
				if (HeaderPair != OutHeaders.end()) {
				Out.emitBytes(HeaderPair->second);
				}
				Out.emitIntValue(OutOffsets.at(I), 4);
				}
				}
				}
				};
				}

				#endif
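
The merge step in DWPStringTable can be illustrated in isolation. The following is a minimal sketch, not the patch's code. It assumes standard-library containers in place of StringMap and DenseMap, 32-bit offsets only, no DWARF v5 header, and a single pass on one thread. Unlike the patch, it emits only strings that an offset entry actually references. The names StrInputSketch, MergedStringsSketch and mergeStrings are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical input: one file's .debug_str bytes (NUL-terminated strings)
// plus the offsets that its .debug_str_offsets entries point at.
struct StrInputSketch {
  std::string StrSection;
  std::vector<uint32_t> Offsets;
};

// Hypothetical output: the merged string bytes and the rewritten offsets.
struct MergedStringsSketch {
  std::string StrSection;
  std::vector<uint32_t> Offsets;
};

MergedStringsSketch mergeStrings(const std::vector<StrInputSketch> &Inputs) {
  MergedStringsSketch Out;
  // Maps each distinct string to its offset in the merged section.
  std::unordered_map<std::string, uint32_t> NewOffsetOf;
  for (const StrInputSketch &In : Inputs) {
    for (uint32_t Old : In.Offsets) {
      assert(Old < In.StrSection.size() && "offset outside string section");
      // Constructing from a char pointer reads up to the next NUL,
      // which is exactly one string of the input section.
      std::string S(In.StrSection.c_str() + Old);
      auto Inserted = NewOffsetOf.emplace(
          S, static_cast<uint32_t>(Out.StrSection.size()));
      if (Inserted.second) {
        // First occurrence: append the string and its terminator.
        Out.StrSection += S;
        Out.StrSection.push_back('\0');
      }
      // Every reference, new or duplicate, gets the merged offset.
      Out.Offsets.push_back(Inserted.first->second);
    }
  }
  return Out;
}
```

For two inputs holding {"a", "bc"} and {"bc", "d"}, the shared string "bc" is stored once and both references to it resolve to the same merged offset.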

llvm/lib/DWP/OrderedConsumerQueue.h

This file was added.

				#ifndef TOOLS_LLVM_DWP_ORDERCONQUEUE
				#define TOOLS_LLVM_DWP_ORDERCONQUEUE

				#include <memory>
				#include <thread>
				#include <condition_variable>
				#include <cassert>
				#include <limits>
				#include <mutex>
				#include "llvm/ADT/PriorityQueue.h"


				namespace llvm {

				template <typename T>
				class OrderedConsumerQueue {
				public:

				OrderedConsumerQueue() = default;
				OrderedConsumerQueue(uint64_t InputBegin)
				: OrderedConsumerQueue(std::numeric_limits<decltype(MaxSize)>::max(), InputBegin) {}
				OrderedConsumerQueue(uint64_t MaxSize, uint64_t InputBegin)
				: MaxSize(MaxSize), ExpectingIndex(InputBegin), BeginIndex(InputBegin) {}
				OrderedConsumerQueue(const OrderedConsumerQueue &) = delete;
				~OrderedConsumerQueue() = default;

				OrderedConsumerQueue(OrderedConsumerQueue &&rhs) noexcept
				: Queue(std::move(rhs.Queue)), MaxSize(rhs.MaxSize),
				ExpectingIndex(rhs.ExpectingIndex), BeginIndex(rhs.BeginIndex) {}

				OrderedConsumerQueue &operator=(const OrderedConsumerQueue &) = delete;
				OrderedConsumerQueue &operator=(OrderedConsumerQueue &&rhs) = delete;

				uint64_t Size() {
				std::lock_guard<std::mutex> Lock(Mutex);
				return Queue.size();
				}

				uint64_t GetBeginIndex() const {
				return BeginIndex;
				}

				void Reset() {
				std::lock_guard<std::mutex> Lock(Mutex);
				Queue.clear();
				ExpectingIndex = BeginIndex;
				}

				void Emplace(std::shared_ptr<T> ObjPtr) {
				std::unique_lock<std::mutex> Lock(Mutex);
				FullCV.wait(Lock,
				[&]() { return Queue.size() < MaxSize; });
				Queue.emplace(ObjPtr);
				Lock.unlock();
				EmptyCV.notify_all();
				}

				std::shared_ptr<T> Pop() {
				// assert(Queue.size() > 0 && "empty queue underflow");
				std::unique_lock<std::mutex> Lock(Mutex);
				EmptyCV.wait(Lock, [&]() {
				if (!Queue.empty()) {
				const auto Top = Queue.top();
				if (Top->Done()) {
				return true;
				}
				if (Top->GetOrderIndex() == ExpectingIndex) {
				ExpectingIndex++;
				return true;
				}
				}
				return false;
				});
				// fetch worker state
				std::shared_ptr<T> ObjPtr = Queue.top();
				Queue.pop();
				Lock.unlock();
				FullCV.notify_all();
				return ObjPtr;
				}

				private:
				struct CmpFunctor {
				bool operator()(const std::shared_ptr<T> &X, const std::shared_ptr<T> &Y) const {
				return X->GetOrderIndex() > Y->GetOrderIndex();
				}
				};

				PriorityQueue<std::shared_ptr<T>, std::vector<std::shared_ptr<T>>, CmpFunctor> Queue;
				uint64_t MaxSize = std::numeric_limits<decltype(MaxSize)>::max();
				uint64_t ExpectingIndex = 0, BeginIndex = 0;
				std::condition_variable EmptyCV, FullCV;
				std::mutex Mutex;
				};

				}

				#endif
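
The ordering guarantee of OrderedConsumerQueue can be shown with a smaller stand-in. This is a sketch under stated assumptions, not the patch's class: it uses std::priority_queue instead of llvm::PriorityQueue, carries an int payload instead of a shared_ptr, has no capacity limit and no Done sentinel, and advances the expected index after the wait rather than inside the predicate. OrderedQueueSketch and drainInOrder are hypothetical names. The driver runs on one thread so the result is deterministic; push is written so that producers on other threads could call it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in: items are (index, payload) pairs and
// pop() hands them out strictly in increasing index order.
class OrderedQueueSketch {
public:
  explicit OrderedQueueSketch(uint64_t Begin) : Expecting(Begin) {}

  void push(uint64_t Index, int Payload) {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      Heap.emplace(Index, Payload);
    }
    Ready.notify_all();
  }

  // Blocks until the item carrying the next expected index is on top.
  int pop() {
    std::unique_lock<std::mutex> Lock(Mutex);
    Ready.wait(Lock, [&] {
      return !Heap.empty() && Heap.top().first == Expecting;
    });
    int Payload = Heap.top().second;
    Heap.pop();
    ++Expecting; // advanced once per successful pop
    return Payload;
  }

private:
  using Item = std::pair<uint64_t, int>;
  // std::greater makes the smallest index the top of the heap.
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> Heap;
  uint64_t Expecting;
  std::mutex Mutex;
  std::condition_variable Ready;
};

// Pushes items in a scrambled arrival order, then drains the queue and
// returns the payloads in the order pop() released them.
std::vector<int> drainInOrder() {
  OrderedQueueSketch Queue(0);
  const uint64_t Arrival[] = {2, 0, 3, 1};
  for (uint64_t I : Arrival)
    Queue.push(I, static_cast<int>(I) * 10);
  std::vector<int> Seen;
  for (int I = 0; I < 4; ++I)
    Seen.push_back(Queue.pop());
  assert(Seen.size() == 4);
  return Seen;
}
```

Whatever order the items arrive in, the consumer sees payloads for indices 0, 1, 2, 3 in that order. This is what allows a single committing thread to write sections in input-file order while parsing happens out of order.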

llvm/lib/DWP/WorkerState.h

This file was added.

				#ifndef TOOLS_LLVM_DWP_WORKERSTATE
				#define TOOLS_LLVM_DWP_WORKERSTATE

				#include <thread>
				#include "llvm/DWP/DWP.h"
				#include "DWPStringPool.h"
				#include "OrderedConsumerQueue.h"
				#include "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
				#include "llvm/ADT/DenseMap.h"

				namespace llvm {
				struct FileIndexEntry {
				uint32_t Index = -1;
				bool Complete = false;

				bool Done() const { return Complete; }

				static std::shared_ptr<FileIndexEntry> MakeDone() {
				FileIndexEntry Result(true);
				Result.Complete = true;
				return std::make_shared<FileIndexEntry>(Result);
				}
				FileIndexEntry(uint32_t Index) : Index(Index) {}
				FileIndexEntry(const FileIndexEntry &) = default;
				};

				struct DwpWorkerState {
				uint32_t Id = -1;
				uint32_t LocalInputIndex;
				std::vector<std::pair<MCSection *, StringRef>> HandleSectionOut;
				InfoSectionUnitHeader Header;
				std::vector<std::pair<DWARFSectionKind, uint32_t>> SectionLength;
				StringRef CurCUIndexSection, CurStrOffsetSection, CurStrSection, AbbrevSection;
				std::vector<StringRef> CurInfoSection, CurTypesSection;
				DWARFUnitIndex CUIndex {DW_SECT_INFO};
				DenseMap<uint32_t, StringRef> CUInfoSectionMap;
				std::unique_ptr<DWARFUnitIndex> TUIndex {nullptr};
				DenseMap<uint32_t, CompileUnitIdentifiers> CUIndexIDs;
				llvm::DWARFSectionKind TUSectionKind;
				MCSection *OutSection;
				StringRef TypeInputSection;
				Error ErrorState {Error::success()};
				bool Complete = false;

				DwpWorkerState() = default;

				DwpWorkerState(Error InError)
				: ErrorState(std::move(InError)) {}

				explicit DwpWorkerState(bool Complete) : LocalInputIndex(0), Complete(Complete) {
				assert(!ErrorState && "this is certainly success!");
				}

				DwpWorkerState(const DwpWorkerState &) = delete;
				DwpWorkerState(DwpWorkerState &&) = delete;
				~DwpWorkerState() = default;

				bool Done() const { return Complete; }
				uint64_t GetOrderIndex() const { return LocalInputIndex; }
				};

				struct StrWorkerState {
				// uint32_t Id = -1;
				uint32_t FileIndex = -1;
				bool Complete = false;
				StringRef CurStrSection, CurStrOffsetSection;
				uint16_t Version;

				StrWorkerState() = default;
				explicit StrWorkerState(bool Complete) : FileIndex(0), Complete(Complete), Version(-1) {}
				StrWorkerState(const StrWorkerState &) = delete;
				StrWorkerState(StrWorkerState &&) noexcept = default;

				StrWorkerState(uint32_t FileIndex,
				StringRef CurStrSection, StringRef CurStrOffsetSection,
				uint16_t Version)
				: FileIndex(FileIndex), CurStrSection(CurStrSection),
				CurStrOffsetSection(CurStrOffsetSection), Version(Version) {}

				uint32_t GetOrderIndex() const {
				return FileIndex;
				}

				uint32_t Done() const {
				return Complete;
				}

				static std::shared_ptr<StrWorkerState> MakeDone() {
				return std::make_shared<StrWorkerState>(true);
				}
				};

				class StrWorker {
				public:
				StrWorker(uint32_t Id, uint64_t TotalInput, OrderedConsumerQueue<StrWorkerState> &Queue)
				: StringTable(TotalInput), Id(Id), Queue(Queue), TotalInput(TotalInput) {}

				void operator()() {
				while (true) {
				if (InputCount >= TotalInput) {
				break;
				}
				std::shared_ptr<StrWorkerState> WorkerState = Queue.Pop();
				if (WorkerState->Done()) {
				break;
				}
				StringTable.insertRawSection(WorkerState->CurStrSection,
				WorkerState->CurStrOffsetSection,
				WorkerState->Version);
				InputCount++;
				}
				}

				static void WorkersToSectionWithoutDup(std::vector<StrWorker> &Workers,
				MCStreamer &Out, MCSection *StrSection,
				MCSection *StrOffsetSection) {
				std::vector<DWPStringTable *> StringTables;
				StringTables.reserve(Workers.size());
				for (auto &Worker : Workers) {
				StringTables.push_back(&Worker.StringTable);
				}
				DWPStringTable::MergeAndOutWithoutDup(StringTables, Out, StrSection, StrOffsetSection);
				}

				private:
				DWPStringTable StringTable;
				OrderedConsumerQueue<StrWorkerState> &Queue;
				const uint64_t TotalInput;
				uint64_t InputCount = 0;
				uint32_t Id = -1;
				};

				}

				#endif

llvm/tools/llvm-dwp/llvm-dwp.cpp

static cl::list<std::string>
    InputFiles(cl::Positional, cl::desc("<input files>"), cl::cat(DwpCategory));

static cl::list<std::string> ExecFilenames(
    "e",
    cl::desc(
        "Specify the executable/library files to get the list of *.dwo from"),
    cl::value_desc("filename"), cl::cat(DwpCategory));

static cl::opt<uint32_t> InputWorkerCount(
    "input-nth", cl::desc("Specify the number of input parsing thread."),
    cl::value_desc("#input_workers"), cl::init(1), cl::cat(DwpCategory));

static cl::opt<uint32_t> StrWorkerCount(
    "str-nth",
    cl::desc("Specify the number of str & str_offset section merging thread."),
    cl::value_desc("#str_workers"), cl::init(1), cl::cat(DwpCategory));

static cl::opt<std::string> OutputFilename(cl::Required, "o",
                                           cl::desc("Specify the output file."),
                                           cl::value_desc("filename"),
                                           cl::cat(DwpCategory));

static Expected<SmallVector<std::string, 16>>
getDWOFilenames(StringRef ExecFilename) {
  auto ErrOrObj = object::ObjectFile::createObjectFile(ExecFilename);
...
int main(int argc, char **argv) {
...
  std::unique_ptr<MCStreamer> MS(TheTarget->createMCObjectStreamer(
      *ErrOrTriple, MC, std::unique_ptr<MCAsmBackend>(MAB),
      MAB->createObjectWriter(OS), std::unique_ptr<MCCodeEmitter>(MCE), MSTI,
      MCOptions.MCRelaxAll, MCOptions.MCIncrementalLinkerCompatible,
      /*DWARFMustBeAtTheEnd*/ false));
  if (!MS)
    return error("no object streamer for target " + TripleName, Context);

  // if (auto Err = write(*MS, DWOFilenames)) {
  if (auto Err = writeMultithread(*MS, DWOFilenames, InputWorkerCount,
                                  StrWorkerCount)) {
    logAllUnhandledErrors(std::move(Err), WithColor::error());
    return 1;
  }

  MS->finish();
  OutFile.keep();
  return 0;
}