Details

Reviewers

tejohnson
mehdi_amini
pcc
evgeny777
rupprecht

Commits

rG617d64f6c5f8: Re-land [ThinLTO] Re-order modules for optimal multi-threaded processing
rG6537004913f3: [ThinLTO] Re-order modules for optimal multi-threaded processing

Summary

This is a reimplementation of D60495 but with Teresa's suggestion applied: https://reviews.llvm.org/D60495#1562871

I've tested a 3-stage compilation. The graph below shows linking of clang.exe with -flto=thin, /opt:lldltojobs=all, no LTO cache, and -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc on stages 1 & 2 to work around the Windows Heap's scaling issues on many-core machines. The test ran on a 36-core Xeon 6140.

Before (total run is 100 sec):

After patch (total run is 85 sec):

The remaining issue after the falloff in the graph is PassBuilder.cpp, which takes a long time to opt+codegen. If that file were split into several .cpp files, I suppose the link could complete in 70 sec.
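
To illustrate why submitting the largest modules first helps, here is a minimal sketch that simulates a thread pool in which each free worker picks up the next queued job. It is not code from this patch: `simulateMakespan`, `simulateLargestFirst`, and the job costs are hypothetical, and the sketch assumes opt+codegen time is roughly proportional to module size.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Simulates a pool of `Workers` threads where each job, taken in the given
// order, starts on whichever worker becomes free first. Returns the time at
// which the last job finishes (the makespan).
uint64_t simulateMakespan(const std::vector<uint64_t> &Costs, unsigned Workers) {
  if (Workers == 0)
    Workers = 1; // Guard: the simulation needs at least one worker.

  // Min-heap holding the time at which each worker becomes free.
  std::priority_queue<uint64_t, std::vector<uint64_t>, std::greater<uint64_t>>
      FreeAt;
  for (unsigned I = 0; I < Workers; ++I)
    FreeAt.push(0);

  uint64_t Makespan = 0;
  for (uint64_t Cost : Costs) {
    uint64_t End = FreeAt.top() + Cost; // Earliest free worker takes the job.
    FreeAt.pop();
    FreeAt.push(End);
    Makespan = std::max(Makespan, End);
  }
  return Makespan;
}

// Same simulation, but jobs are submitted largest first (hypothetical helper).
uint64_t simulateLargestFirst(std::vector<uint64_t> Costs, unsigned Workers) {
  std::sort(Costs.begin(), Costs.end(), std::greater<uint64_t>());
  return simulateMakespan(Costs, Workers);
}
```

With seven jobs costing {1, 1, 1, 1, 1, 1, 8} on two workers, input order finishes at time 11 because the expensive job starts last, while largest-first finishes at time 8. Once the largest job dominates the total, as PassBuilder.cpp does here, only splitting that job can reduce the time further.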

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aganea created this revision. Sep 19 2020, 8:31 AM

Herald added a project: Restricted Project. Sep 19 2020, 8:31 AM

Herald added subscribers: llvm-commits, dexonsmith, steven_wu and 3 others.

aganea requested review of this revision. Sep 19 2020, 8:31 AM

aganea edited the summary of this revision.

aganea edited the summary of this revision. Sep 19 2020, 8:38 AM

Harbormaster completed remote builds in B72273: Diff 292967. Sep 19 2020, 9:00 AM

Thanks for doing this!

What should I do about that? Some tests could be fixed (ordering) but I am unsure about others.

I took a quick look and the ones I saw seemed to be due to disassembling the wrong temp file, which is based on the numbering provided by "Task". Presumably you want to pass in the original ordering here, so I guess RegularLTO.ParallelCodeGenParallelismLevel + IndexCount?
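
A minimal sketch of that idea, assuming plain buffer sizes stand in for the bitcode modules: the processing order is sorted by size, but each task number is still derived from the module's original index, so the temp files keep their original numbering. `orderLargestFirst`, `planTasks`, and `FirstThinTask` are hypothetical names (the last one plays the role of RegularLTO.ParallelCodeGenParallelismLevel), and `std::stable_sort` is used here only to make ties deterministic in the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns indices into Sizes, ordered from the largest entry to the smallest.
// The input itself is left untouched.
std::vector<int> orderLargestFirst(const std::vector<std::size_t> &Sizes) {
  std::vector<int> Order(Sizes.size());
  std::iota(Order.begin(), Order.end(), 0);
  std::stable_sort(Order.begin(), Order.end(),
                   [&](int L, int R) { return Sizes[L] > Sizes[R]; });
  return Order;
}

// Returns (task id, original module index) pairs in processing order.
// The task id depends only on the original index, not on the position in
// the processing order.
std::vector<std::pair<unsigned, int>>
planTasks(const std::vector<std::size_t> &Sizes, unsigned FirstThinTask) {
  std::vector<std::pair<unsigned, int>> Plan;
  for (int I : orderLargestFirst(Sizes))
    Plan.emplace_back(FirstThinTask + I, I);
  return Plan;
}
```

For sizes {10, 300, 20} and a first ThinLTO task of 1, the modules are processed in the order 1, 2, 0, yet they still receive tasks 2, 3, and 1 respectively, exactly as they would without reordering.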

Works as intended, thanks @tejohnson!
All tests pass. Please have another look.

LGTM with comment suggestion.

llvm/include/llvm/LTO/LTO.h
94	Nit, it isn't actually producing a reordered container. How about something like "Produces a container ordering for optimal multi-threaded processing. Returns ordered indices to elements in the input array."

This revision is now accepted and ready to land. Sep 22 2020, 8:03 AM

aganea marked an inline comment as done. Sep 22 2020, 8:07 AM

aganea added inline comments.

llvm/include/llvm/LTO/LTO.h
94	Changed, thanks for suggestion!

aganea edited the summary of this revision. Sep 22 2020, 8:23 AM

This revision was landed with ongoing or failed builds. Sep 22 2020, 8:26 AM

Closed by commit rG6537004913f3: [ThinLTO] Re-order modules for optimal multi-threaded processing (authored by aganea).

This revision was automatically updated to reflect the committed changes.

aganea marked an inline comment as done.

aganea added a commit: rG6537004913f3: [ThinLTO] Re-order modules for optimal multi-threaded processing.

Harbormaster completed remote builds in B72519: Diff 293455. Sep 22 2020, 8:33 AM

rupprecht added a reverting change: rG9b5b3050237d: Temporarily revert "[ThinLTO] Re-order modules for optimal multi-threaded…. Oct 9 2020, 2:44 PM

We had to revert due to unexpected side effects on distributed ThinLTO. See the comments on the revert commit about the type of failures. Essentially, for distributed ThinLTO, the backend is of type WriteIndexesThinBackend. It runs serially, not in parallel via threads, and writes all the files needed by the distributed backend processes. One of the files it writes is a list of files to include in the final native link, which is provided to the final link process. The order in which modules are written into this file is the order in which they will be linked. With this patch, the order is changed and no longer matches the original link order, and that can produce unexpected failures. Probably the best fix is to simply skip the reordering when BackendProc is a WriteIndexesThinBackend. It won't help anyway given that it is a serial backend.
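
The failure mode can be sketched in isolation, assuming a serial writer that appends one object path per module as each module is handed to it. `emitLinkedObjectsList` is a hypothetical helper, not the real WriteIndexesThinBackend code; it only shows that the list's contents follow the processing order.

```cpp
#include <string>
#include <vector>

// Builds the text of a linked-objects list: one path per line, appended in
// the order the modules are processed. ProcessingOrder holds indices into
// Paths.
std::string emitLinkedObjectsList(const std::vector<std::string> &Paths,
                                  const std::vector<int> &ProcessingOrder) {
  std::string Out;
  for (int I : ProcessingOrder) {
    Out += Paths[I];
    Out += '\n';
  }
  return Out;
}
```

Processing {"a.o", "b.o"} in input order yields "a.o" then "b.o". If a size-based ordering places the second module first, the list becomes "b.o" then "a.o", and a final link that consumes the list sees the objects in a different order than the user passed them.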

Check for WriteIndexesThinBackend before re-ordering modules.

aganea reopened this revision. Oct 13 2020, 8:21 AM

This revision is now accepted and ready to land. Oct 13 2020, 8:21 AM

aganea requested review of this revision. Oct 13 2020, 8:21 AM

Harbormaster completed remote builds in B74939: Diff 297869. Oct 13 2020, 8:52 AM

It would be great to test that the ordering of the files in the object file list doesn't change with the write indexes backend. Unfortunately, currently llvm-lto2 doesn't have an option to trigger this output file, but you could use either the gold plugin or lld. There is an existing gold plugin test that does test this output file:
llvm/test/tools/gold/X86/thinlto_emit_linked_objects.ll

It looks like that test would have failed if the object files had been reordered, but it looks like the first one listed is bigger so due to luck we ended up with the same ordering with your original patch. You could probably make a new version of this test that doesn't use --start-lib/--end-lib but rather just passes both bitcode objects to gold normally, and reorder the order the two bitcode files are passed so the Input version which appears smaller is passed first, then confirm that you get the expected ordering in the object list file %t3 (which would be the order that they were passed to gold). Would be good to confirm that without the new guard against reordering with the write indexes backend the ordering gets switched, which is what we want to prevent.

llvm/lib/LTO/LTO.cpp
1468	The ParallelCodeGenParallelismLevel is unrelated to the number of threads we will be using for ThinLTO backends (rather it is related to regular LTO code generation). For the latter you would need to do something like call getThreadCount() on the BackendThreadPool on the InProcessThinBackend.
1469	Needs comment as to why we are not doing this for WriteIndexesKind

Simplify & address comments.

@tejohnson I added the test in LLD, since the gold tests only run on Linux, which is harder for me to test & debug. The test fails when the following block is removed: if (BackendProc->getThreadCount() == 1) { ... }.
This test also implicitly covers the "InProcessThinBackend" codepath with /opt:lldltojobs=1, which I don't see how to cover explicitly otherwise.

llvm/lib/LTO/LTO.cpp
1468	Fixed by adding a `virtual ThinBackendProc::getThreadCount()` API.
1469	With the new `ThinBackendProc::getThreadCount()` things are a bit more explicit I think.
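
A minimal sketch of how a virtual thread-count query can drive the ordering decision. The types below are hypothetical stand-ins, not the real ThinBackendProc hierarchy, and plain sizes stand in for bitcode modules. The serial index-writing backend always reports 1, so the caller keeps the input order for it.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the backend interface discussed in this review.
struct BackendSketch {
  virtual ~BackendSketch() = default;
  virtual unsigned getThreadCount() const = 0;
};

// Stand-in for an in-process backend that owns a thread pool.
struct InProcessSketch : BackendSketch {
  unsigned Threads;
  explicit InProcessSketch(unsigned T) : Threads(T) {}
  unsigned getThreadCount() const override { return Threads; }
};

// Stand-in for the index-writing backend: it is serial, and reporting 1 is
// what keeps callers from reordering its inputs.
struct WriteIndexesSketch : BackendSketch {
  unsigned getThreadCount() const override { return 1; }
};

// Returns the order in which modules would be handed to the backend.
std::vector<int> processingOrder(const BackendSketch &B,
                                 const std::vector<std::size_t> &Sizes) {
  std::vector<int> Order(Sizes.size());
  std::iota(Order.begin(), Order.end(), 0);
  if (B.getThreadCount() == 1)
    return Order; // Serial: preserve the command-line order.
  // Parallel: largest modules first (stable_sort keeps ties deterministic).
  std::stable_sort(Order.begin(), Order.end(),
                   [&](int L, int R) { return Sizes[L] > Sizes[R]; });
  return Order;
}
```

One consequence of keying the decision on the thread count rather than on the backend type is that an in-process backend limited to one thread also keeps the input order, which is the case /opt:lldltojobs=1 exercises.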

In D87966#2328736, @aganea wrote:

> @tejohnson I added the test in LLD, since the gold tests only run on Linux, which is harder for me to test & debug. The test fails when the following block is removed: if (BackendProc->getThreadCount() == 1) { ... }.

Great, thanks! LGTM with request for comment as described below.

> This test also implicitly covers the "InProcessThinBackend" codepath with /opt:lldltojobs=1, which I don't see how to cover explicitly otherwise.

I don't see how this test covers that code path since it isn't using in process backends, but I'm not really sure how to test this case effectively - like the original patch, it is just a performance change on the in process backends case.

llvm/lib/LTO/LTO.cpp
1469	The virtual method looks good and cleaner. But I'd like to capture in a comment somewhere that for write indexes thin backend there is a user visible effect if we were to reorder, in that the LinkedObjectsFile written for distributed ThinLTO will contain the objects in a different order, which will in turn affect the final link order assuming that file is used as intended. I.e. we don't just skip it in the thread = 1 case because it is unnecessary.

This revision is now accepted and ready to land. Oct 13 2020, 3:51 PM

Harbormaster completed remote builds in B74988: Diff 297971. Oct 13 2020, 4:31 PM

This revision was landed with ongoing or failed builds. Oct 13 2020, 6:54 PM

Closed by commit rG617d64f6c5f8: Re-land [ThinLTO] Re-order modules for optimal multi-threaded processing (authored by aganea).

This revision was automatically updated to reflect the committed changes.

aganea marked an inline comment as done.

aganea added a commit: rG617d64f6c5f8: Re-land [ThinLTO] Re-order modules for optimal multi-threaded processing.

I meant if we have:
; RUN: lld-link /lldsavetemps /out:%t4.exe /entry:main /subsystem:console %t2.obj %t1.obj /opt:lldltojobs=1
We end up in if (BackendProc->getThreadCount() == 1) as if we did:
RUN: lld-link -thinlto-index-only:%t3 /entry:main %t1.obj %t2.obj

llvm/lib/LTO/LTO.cpp
1469	I've added more relevant comments, as suggested.

Oh, I see what you are saying. Makes sense. New comment looks good, thanks.

Diff 298010

lld/test/COFF/thinlto-module-order.ll

This file was added.

				; REQUIRES: x86

				; RUN: opt -thinlto-bc %s -o %t1.obj
				; RUN: opt -thinlto-bc %p/Inputs/thinlto.ll -o %t2.obj

				; Ensure module re-ordering in LTO::runThinLTO does not affect the processing order.

				; RUN: lld-link -thinlto-index-only:%t3 /entry:main %t1.obj %t2.obj
				; RUN: cat %t3 | FileCheck %s --check-prefix=NORMAL
				; NORMAL: thinlto-module-order.ll.tmp1.o
				; NORMAL: thinlto-module-order.ll.tmp2.o

				; RUN: lld-link -thinlto-index-only:%t3 /entry:main %t2.obj %t1.obj
				; RUN: cat %t3 | FileCheck %s --check-prefix=REVERSED
				; REVERSED: thinlto-module-order.ll.tmp2.o
				; REVERSED: thinlto-module-order.ll.tmp1.o

				target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-pc-windows-msvc19.0.24215"

				declare void @g(...)

				define void @main() {
				call void (...) @g()
				ret void
				}

llvm/include/llvm/LTO/LTO.h

[85 unchanged lines hidden]
 setupLLVMOptimizationRemarks(LLVMContext &Context, StringRef RemarksFilename,
                              StringRef RemarksPasses, StringRef RemarksFormat,
                              bool RemarksWithHotness, int Count = -1);

 /// Setups the output file for saving statistics.
 Expected<std::unique_ptr<ToolOutputFile>>
 setupStatsFile(StringRef StatsFilename);

+/// Produces a container ordering for optimal multi-threaded processing. Returns
+/// ordered indices to elements in the input array.
+std::vector<int> generateModulesOrdering(ArrayRef<BitcodeModule *> R);
+
 class LTO;
 struct SymbolResolution;
 class ThinBackendProc;

 /// An input file. This is a symbol table wrapper that only exposes the
 /// information that an LTO client should need in order to do symbol resolution.
 class InputFile {
 public:
[359 unchanged lines hidden]

llvm/lib/LTO/LTO.cpp

[1,101 unchanged lines hidden; context: public:]
   virtual ~ThinBackendProc() {}
   virtual Error start(
       unsigned Task, BitcodeModule BM,
       const FunctionImporter::ImportMapTy &ImportList,
       const FunctionImporter::ExportSetTy &ExportList,
       const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
       MapVector<StringRef, BitcodeModule> &ModuleMap) = 0;
   virtual Error wait() = 0;
+  virtual unsigned getThreadCount() = 0;
 };

 namespace {
 class InProcessThinBackend : public ThinBackendProc {
   ThreadPool BackendThreadPool;
   AddStreamFn AddStream;
   NativeObjectCache Cache;
   std::set<GlobalValue::GUID> CfiFunctionDefs;
[98 unchanged lines hidden; context: public:]

   Error wait() override {
     BackendThreadPool.wait();
     if (Err)
       return std::move(*Err);
     else
       return Error::success();
   }
+
+  unsigned getThreadCount() override {
+    return BackendThreadPool.getThreadCount();
+  }
 };
 } // end anonymous namespace

 ThinBackend lto::createInProcessThinBackend(ThreadPoolStrategy Parallelism) {
   return [=](const Config &Conf, ModuleSummaryIndex &CombinedIndex,
              const StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,
              AddStreamFn AddStream, NativeObjectCache Cache) {
     return std::make_unique<InProcessThinBackend>(
[72 unchanged lines hidden; context: Error start(]
     }

     if (OnWrite)
       OnWrite(std::string(ModulePath));
     return Error::success();
   }

   Error wait() override { return Error::success(); }
+
+  // WriteIndexesThinBackend should always return 1 to prevent module
+  // re-ordering and avoid non-determinism in the final link.
+  unsigned getThreadCount() override { return 1; }
 };
 } // end anonymous namespace

 ThinBackend lto::createWriteIndexesThinBackend(
     std::string OldPrefix, std::string NewPrefix, bool ShouldEmitImportsFiles,
     raw_fd_ostream *LinkedObjectsFile, IndexWriteCallback OnWrite) {
   return [=](const Config &Conf, ModuleSummaryIndex &CombinedIndex,
              const StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,
[118 unchanged lines hidden; context: Error LTO::runThinLTO(AddStreamFn AddStream, NativeObjectCache Cache,]

   std::unique_ptr<ThinBackendProc> BackendProc =
       ThinLTO.Backend(Conf, ThinLTO.CombinedIndex, ModuleToDefinedGVSummaries,
                       AddStream, Cache);

   auto &ModuleMap =
       ThinLTO.ModulesToCompile ? *ThinLTO.ModulesToCompile : ThinLTO.ModuleMap;

-  // Tasks 0 through ParallelCodeGenParallelismLevel-1 are reserved for combined
-  // module and parallel code generation partitions.
-  unsigned Task = RegularLTO.ParallelCodeGenParallelismLevel;
-  for (auto &Mod : ModuleMap) {
-    if (Error E = BackendProc->start(Task, Mod.second, ImportLists[Mod.first],
-                                     ExportLists[Mod.first],
-                                     ResolvedODR[Mod.first], ThinLTO.ModuleMap))
-      return E;
-    ++Task;
-  }
+  auto ProcessOneModule = [&](int I) -> Error {
+    auto &Mod = *(ModuleMap.begin() + I);
+    // Tasks 0 through ParallelCodeGenParallelismLevel-1 are reserved for
+    // combined module and parallel code generation partitions.
+    return BackendProc->start(RegularLTO.ParallelCodeGenParallelismLevel + I,
+                              Mod.second, ImportLists[Mod.first],
+                              ExportLists[Mod.first], ResolvedODR[Mod.first],
+                              ThinLTO.ModuleMap);
+  };
+
+  if (BackendProc->getThreadCount() == 1) {
+    // Process the modules in the order they were provided on the command-line.
+    // It is important for this codepath to be used for WriteIndexesThinBackend,
+    // to ensure the emitted LinkedObjectsFile lists ThinLTO objects in the same
+    // order as the inputs, which otherwise would affect the final link order.
+    for (int I = 0, E = ModuleMap.size(); I != E; ++I)
+      if (Error E = ProcessOneModule(I))
+        return E;
+  } else {
+    // When executing in parallel, process largest bitsize modules first to
+    // improve parallelism, and avoid starving the thread pool near the end.
+    // This saves about 15 sec on a 36-core machine while link `clang.exe` (out
+    // of 100 sec).
+    std::vector<BitcodeModule *> ModulesVec;
+    ModulesVec.reserve(ModuleMap.size());
+    for (auto &Mod : ModuleMap)
+      ModulesVec.push_back(&Mod.second);
+    for (int I : generateModulesOrdering(ModulesVec))
+      if (Error E = ProcessOneModule(I))
+        return E;
+  }

   return BackendProc->wait();
 }

 Expected<std::unique_ptr<ToolOutputFile>> lto::setupLLVMOptimizationRemarks(
     LLVMContext &Context, StringRef RemarksFilename, StringRef RemarksPasses,
     StringRef RemarksFormat, bool RemarksWithHotness, int Count) {
   std::string Filename = std::string(RemarksFilename);
   // For ThinLTO, file.opt.<format> becomes
[25 unchanged lines hidden; context: lto::setupStatsFile(StringRef StatsFilename) {]
   auto StatsFile =
       std::make_unique<ToolOutputFile>(StatsFilename, EC, sys::fs::OF_None);
   if (EC)
     return errorCodeToError(EC);

   StatsFile->keep();
   return std::move(StatsFile);
 }
+
+// Compute the ordering we will process the inputs: the rough heuristic here
+// is to sort them per size so that the largest module get schedule as soon as
+// possible. This is purely a compile-time optimization.
+std::vector<int> lto::generateModulesOrdering(ArrayRef<BitcodeModule *> R) {
+  std::vector<int> ModulesOrdering;
+  ModulesOrdering.resize(R.size());
+  std::iota(ModulesOrdering.begin(), ModulesOrdering.end(), 0);
+  llvm::sort(ModulesOrdering, [&](int LeftIndex, int RightIndex) {
+    auto LSize = R[LeftIndex]->getBuffer().size();
+    auto RSize = R[RightIndex]->getBuffer().size();
+    return LSize > RSize;
+  });
+  return ModulesOrdering;
+}

llvm/lib/LTO/ThinLTOCodeGenerator.cpp

[1,048 unchanged lines hidden; context: void ThinLTOCodeGenerator::run() {]
   for (auto &Module : Modules) {
     auto ModuleIdentifier = Module->getName();
     ExportLists[ModuleIdentifier];
     ImportLists[ModuleIdentifier];
     ResolvedODR[ModuleIdentifier];
     ModuleToDefinedGVSummaries[ModuleIdentifier];
   }

-  // Compute the ordering we will process the inputs: the rough heuristic here
-  // is to sort them per size so that the largest module get schedule as soon as
-  // possible. This is purely a compile-time optimization.
-  std::vector<int> ModulesOrdering;
-  ModulesOrdering.resize(Modules.size());
-  std::iota(ModulesOrdering.begin(), ModulesOrdering.end(), 0);
-  llvm::sort(ModulesOrdering, [&](int LeftIndex, int RightIndex) {
-    auto LSize =
-        Modules[LeftIndex]->getSingleBitcodeModule().getBuffer().size();
-    auto RSize =
-        Modules[RightIndex]->getSingleBitcodeModule().getBuffer().size();
-    return LSize > RSize;
-  });
+  std::vector<BitcodeModule *> ModulesVec;
+  ModulesVec.reserve(Modules.size());
+  for (auto &Mod : Modules)
+    ModulesVec.push_back(&Mod->getSingleBitcodeModule());
+  std::vector<int> ModulesOrdering = lto::generateModulesOrdering(ModulesVec);

   // Parallel optimizer + codegen
   {
     ThreadPool Pool(heavyweight_hardware_concurrency(ThreadCount));
     for (auto IndexCount : ModulesOrdering) {
       auto &Mod = Modules[IndexCount];
       Pool.async([&](int count) {
         auto ModuleIdentifier = Mod->getName();
[93 unchanged lines hidden]

This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Re-order modules for optimal multi-threaded processing
ClosedPublic