Details

Commits

rGb014cc0f655d: [ORC] Add a LLJITWithThinLTOSummaries example in OrcV2Examples

Summary

The example demonstrates how to use a module summary index file produced for ThinLTO to:

find the module that defines the main entry point
find all extra modules that are required for the build

A LIT test runs the example as part of the LLVM test suite [1] and shows how to create a module summary index file.
The code also provides two Error types that can be useful when working with ThinLTO summaries.

[1] if LLVM_BUILD_EXAMPLES=ON and platform is not Windows

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sgraenitz created this revision. Aug 14 2020, 7:47 AM

Herald added a project: Restricted Project. Aug 14 2020, 7:47 AM

Herald added subscribers: dexonsmith, steven_wu, mgrang and 3 others.

sgraenitz requested review of this revision. Aug 14 2020, 7:47 AM

I would like to remove the ThinLtoJIT example. It needs a more capable threading library to speed up multithreaded compile times, and that's easier to do out-of-tree.
This might be a useful (minimal) portion to keep in-tree. What do you think?

Harbormaster completed remote builds in B68414: Diff 285651. Aug 14 2020, 7:58 AM

Test discovery should ignore subdirectories that contain test inputs.

Harbormaster completed remote builds in B68416: Diff 285655. Aug 14 2020, 8:20 AM

LGTM.

What performance issues did you run into with threading? I haven't had a chance to look into that yet.

In D85974#2218405, @lhames wrote:

What performance issues did you run into with threading? I haven't had a chance to look into that yet.

Performance gains depended inherently on smart handling of fine-grained async tasks; otherwise the runtime cost of handling concurrency quickly eats up the gains. I have the impression that the LLVM ThreadPool implementation is too limited here, e.g. there is no mechanism for priority-based scheduling. Trying to work around the limitations added a lot of complexity that I didn't manage to handle. It might be easier with a decent threading library at hand or maybe a Rust-like async/await. More experiments to come out-of-tree :)
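To make "priority-based scheduling" concrete, here is a minimal single-threaded sketch of a task queue that always runs the most urgent task first. It is an illustration only: `PriorityTaskQueue`, `submit` and `runOne` are hypothetical names that are not part of LLVM, and a real pool would additionally need locking and worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical illustration only; none of these names exist in LLVM.
class PriorityTaskQueue {
public:
  using Task = std::function<void()>;

  // Higher Priority values run first; ties run in submission order.
  void submit(int Priority, Task T) {
    assert(T && "cannot submit an empty task");
    Heap.push(Entry{Priority, NextSeq++, std::move(T)});
  }

  bool empty() const { return Heap.empty(); }

  // Pop and run the most urgent task. Returns false if the queue was empty.
  bool runOne() {
    if (Heap.empty())
      return false;
    // top() returns a const reference, so copy the task out before popping.
    Task T = Heap.top().Work;
    Heap.pop();
    T();
    return true;
  }

private:
  struct Entry {
    int Priority;
    std::size_t Seq;
    Task Work;
  };

  // Orders entries so that the heap's top is the highest priority and,
  // among equal priorities, the earliest submission.
  struct Less {
    bool operator()(const Entry &A, const Entry &B) const {
      if (A.Priority != B.Priority)
        return A.Priority < B.Priority;
      return A.Seq > B.Seq;
    }
  };

  std::priority_queue<Entry, std::vector<Entry>, Less> Heap;
  std::size_t NextSeq = 0;
};
```

A worker loop would call `runOne()` repeatedly, so a task submitted late with a high priority overtakes older low-priority work, which a plain FIFO pool cannot do.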

Fix clang-format issue and rebase

Harbormaster completed remote builds in B68434: Diff 285693. Aug 14 2020, 11:35 AM

Fix clang-tidy warnings

Harbormaster completed remote builds in B68517: Diff 285838. Aug 15 2020, 5:48 AM

In D85974#2218483, @sgraenitz wrote:

In D85974#2218405, @lhames wrote:

What performance issues did you run into with threading? I haven't had a chance to look into that yet.

Performance gains depended inherently on smart handling of fine-grained async tasks; otherwise the runtime cost of handling concurrency quickly eats up the gains. I have the impression that the LLVM ThreadPool implementation is too limited here, e.g. there is no mechanism for priority-based scheduling. Trying to work around the limitations added a lot of complexity that I didn't manage to handle. It might be easier with a decent threading library at hand or maybe a Rust-like async/await. More experiments to come out-of-tree :)

Yep. We have the ExtensibleRTTI system available now, but I haven't had time to hook it up to MaterializationUnit -- doing so might give you some of the prioritization information that you need.

I also want to generalize the ExecutionSession dispatch API to handle arbitrary tasks, rather than just MaterializationUnits. If query handlers were dispatched rather than running on the thread that satisfies the last query dependence it should expose some new opportunities for concurrency.

I will be very interested to hear how your experiments go -- I'd love to get all this tuned to improve performance.
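The difference between running a query handler on the thread that satisfies the last dependence and dispatching it elsewhere can be sketched roughly as follows. This is not ORC code: `PendingQuery`, `DeferredQueue` and `runInline` are hypothetical names, and the sketch only assumes that a handler becomes ready once a dependency counter reaches zero.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using Task = std::function<void()>;
using Dispatcher = std::function<void(Task)>;

// Hypothetical: runs the handler on whichever thread calls it.
inline void runInline(Task T) { T(); }

// Hypothetical: collects ready handlers so another thread can run them later.
class DeferredQueue {
public:
  void push(Task T) {
    std::lock_guard<std::mutex> Lock(M);
    Tasks.push_back(std::move(T));
  }

  // Run everything queued so far; returns the number of tasks executed.
  std::size_t drain() {
    std::vector<Task> Batch;
    {
      std::lock_guard<std::mutex> Lock(M);
      Batch.swap(Tasks);
    }
    for (Task &T : Batch)
      T();
    return Batch.size();
  }

private:
  std::mutex M;
  std::vector<Task> Tasks;
};

// Hypothetical: hands its handler to the dispatcher once all deps are met.
class PendingQuery {
public:
  PendingQuery(unsigned NumDeps, Task OnComplete, Dispatcher Dispatch)
      : Remaining(NumDeps), OnComplete(std::move(OnComplete)),
        Dispatch(std::move(Dispatch)) {
    assert(NumDeps > 0 && "a query needs at least one dependency");
  }

  void satisfyOne() {
    Task Ready;
    {
      std::lock_guard<std::mutex> Lock(M);
      assert(Remaining > 0 && "all dependencies already satisfied");
      if (--Remaining == 0)
        Ready = std::move(OnComplete);
    }
    // The handler is passed on outside the lock. With runInline it still
    // runs on the satisfying thread; with a queue that thread returns at once.
    if (Ready)
      Dispatch(std::move(Ready));
  }

private:
  std::mutex M;
  unsigned Remaining;
  Task OnComplete;
  Dispatcher Dispatch;
};
```

With `runInline` as the dispatcher, the thread that satisfies the last dependence also pays for the handler. With a queue-backed dispatcher it can continue immediately, which is where the additional concurrency would come from.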

I will get back to performance evaluation maybe in a few weeks and sure I am happy to share my progress.

If query handlers were dispatched rather than running on the thread that satisfies the last query dependence it should expose some new opportunities for concurrency.

Indeed, that sounds promising. I hope it doesn't require adding more locking to the engine? In general, performance analysis only works in combination with comprehensive benchmark data. Maybe a good opportunity to create a benchmark suite for tracking both, single- and multi-threaded performance over time?

Back to the review: I'd like to keep the test for the example and see how the build servers behave. Generally it might be useful to have tests for all the "LLJITWith..." examples right?
Do you think it makes sense to land the patch on the weekend in order to keep the number of people getting annoyed by me breaking their builds at a minimum? :)

lhames accepted this revision. Aug 21 2020, 2:26 PM

This revision is now accepted and ready to land. Aug 21 2020, 2:26 PM

In D85974#2229156, @sgraenitz wrote:

I will get back to performance evaluation maybe in a few weeks and sure I am happy to share my progress.

If query handlers were dispatched rather than running on the thread that satisfies the last query dependence it should expose some new opportunities for concurrency.

Indeed, that sounds promising. I hope it doesn't require adding more locking to the engine? In general, performance analysis only works in combination with comprehensive benchmark data. Maybe a good opportunity to create a benchmark suite for tracking both, single- and multi-threaded performance over time?

It wouldn't introduce new static locking points. To the extent that it enables extra concurrency there's more opportunities for lock contention, but that's a good thing. :)

I 100% agree on the benchmarking idea -- we definitely want one of those.

In D85974#2229168, @sgraenitz wrote:

Back to the review: I'd like to keep the test for the example and see how the build servers behave. Generally it might be useful to have tests for all the "LLJITWith..." examples right?

Sorry -- I LGTM'd earlier but forgot to hit "accept". I think LLJITWith... is a good home for this.

Do you think it makes sense to land the patch on the weekend in order to keep the number of people getting annoyed by me breaking their builds at a minimum? :)

Up to you, but reverts are cheap -- if it works for you locally I'd say land away.

Closed by commit rGb014cc0f655d: [ORC] Add a LLJITWithThinLTOSummaries example in OrcV2Examples (authored by sgraenitz). Aug 23 2020, 5:05 AM

This revision was automatically updated to reflect the committed changes.

sgraenitz added a commit: rGb014cc0f655d: [ORC] Add a LLJITWithThinLTOSummaries example in OrcV2Examples.

Looks good. Two bots failed, both unrelated to my change.
http://green.lab.llvm.org/green/job/lldb-cmake/23457/
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/17430/

Diff 287250

llvm/examples/OrcV2Examples/CMakeLists.txt

 add_subdirectory(LLJITDumpObjects)
 add_subdirectory(LLJITWithCustomObjectLinkingLayer)
 add_subdirectory(LLJITWithGDBRegistrationListener)
 add_subdirectory(LLJITWithInitializers)
 add_subdirectory(LLJITWithLazyReexports)
 add_subdirectory(LLJITWithObjectCache)
 add_subdirectory(LLJITWithObjectLinkingLayerPlugin)
 add_subdirectory(LLJITWithTargetProcessControl)
+add_subdirectory(LLJITWithThinLTOSummaries)
 add_subdirectory(OrcV2CBindingsAddObjectFile)
 add_subdirectory(OrcV2CBindingsBasicUsage)
 add_subdirectory(OrcV2CBindingsReflectProcessSymbols)

 if(CMAKE_HOST_UNIX)
   add_subdirectory(LLJITWithChildProcess)
 endif()

llvm/examples/OrcV2Examples/LLJITWithThinLTOSummaries/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS
				Core
				ExecutionEngine
				IRReader
				OrcJIT
				Support
				nativecodegen
				)

				add_llvm_example(LLJITWithThinLTOSummaries
				LLJITWithThinLTOSummaries.cpp
				)

llvm/examples/OrcV2Examples/LLJITWithThinLTOSummaries/LLJITWithThinLTOSummaries.cpp

This file was added.

				//===--- LLJITWithThinLTOSummaries.cpp - Module summaries as LLJIT input --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// In this example we will use a module summary index file produced for ThinLTO
				// to (A) find the module that defines the main entry point and (B) find all
				// extra modules that we need. We will do this in five steps:
				//
				// (1) Read the index file and parse the module summary index.
				// (2) Find the path of the module that defines "main".
				// (3) Parse the main module and create a matching LLJIT.
				// (4) Add all modules to the LLJIT that are covered by the index.
				// (5) Look up and run the JIT'd function.
				//
				// The index file name must be passed in as command line argument. Please find
				// this test for instructions on creating the index file:
				//
				// llvm/test/Examples/OrcV2Examples/lljit-with-thinlto-summaries.test
				//
				// If you use "build" as the build directory, you can run the test from the root
				// of the monorepo like this:
				//
				// > build/bin/llvm-lit -a \
				// llvm/test/Examples/OrcV2Examples/lljit-with-thinlto-summaries.test
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/STLExtras.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Bitcode/BitcodeReader.h"
				#include "llvm/ExecutionEngine/Orc/Core.h"
				#include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
				#include "llvm/ExecutionEngine/Orc/LLJIT.h"
				#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
				#include "llvm/IR/GlobalValue.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/ModuleSummaryIndex.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Error.h"
				#include "llvm/Support/InitLLVM.h"
				#include "llvm/Support/MemoryBuffer.h"
				#include "llvm/Support/TargetSelect.h"
				#include "llvm/Support/raw_ostream.h"

				#include <string>
				#include <system_error>
				#include <vector>

				using namespace llvm;
				using namespace llvm::orc;

				// Path of the module summary index file.
				cl::opt<std::string> IndexFile{cl::desc("<module summary index>"),
				cl::Positional, cl::init("-")};

				// Describe a fail state that is caused by the given ModuleSummaryIndex
				// providing multiple definitions of the given global value name. It will dump
				// name and GUID for the global value and list the paths of the modules covered
				// by the index.
				class DuplicateDefinitionInSummary
				: public ErrorInfo<DuplicateDefinitionInSummary> {
				public:
				static char ID;

				DuplicateDefinitionInSummary(std::string GlobalValueName, ValueInfo VI)
				: GlobalValueName(std::move(GlobalValueName)) {
				ModulePaths.reserve(VI.getSummaryList().size());
				for (const auto &S : VI.getSummaryList())
				ModulePaths.push_back(S->modulePath().str());
				llvm::sort(ModulePaths);
				}

				void log(raw_ostream &OS) const override {
				OS << "Duplicate symbol for global value '" << GlobalValueName
				<< "' (GUID: " << GlobalValue::getGUID(GlobalValueName) << ") in:\n";
				for (const std::string &Path : ModulePaths) {
				OS << " " << Path << "\n";
				}
				}

				std::error_code convertToErrorCode() const override {
				return inconvertibleErrorCode();
				}

				private:
				std::string GlobalValueName;
				std::vector<std::string> ModulePaths;
				};

				// Describe a fail state where the given global value name was not found in the
				// given ModuleSummaryIndex. It will dump name and GUID for the global value and
				// list the paths of the modules covered by the index.
				class DefinitionNotFoundInSummary
				: public ErrorInfo<DefinitionNotFoundInSummary> {
				public:
				static char ID;

				DefinitionNotFoundInSummary(std::string GlobalValueName,
				ModuleSummaryIndex &Index)
				: GlobalValueName(std::move(GlobalValueName)) {
				ModulePaths.reserve(Index.modulePaths().size());
				for (const auto &Entry : Index.modulePaths())
				ModulePaths.push_back(Entry.first().str());
				llvm::sort(ModulePaths);
				}

				void log(raw_ostream &OS) const override {
				OS << "No symbol for global value '" << GlobalValueName
				<< "' (GUID: " << GlobalValue::getGUID(GlobalValueName) << ") in:\n";
				for (const std::string &Path : ModulePaths) {
				OS << " " << Path << "\n";
				}
				}

				std::error_code convertToErrorCode() const override {
				return llvm::inconvertibleErrorCode();
				}

				private:
				std::string GlobalValueName;
				std::vector<std::string> ModulePaths;
				};

				char DuplicateDefinitionInSummary::ID = 0;
				char DefinitionNotFoundInSummary::ID = 0;

				// Look up a function in the ModuleSummaryIndex and return the path of the
				// module that defines it. Paths in the ModuleSummaryIndex are relative to the
				// build directory of the covered modules.
				Expected<StringRef> getMainModulePath(StringRef FunctionName,
				ModuleSummaryIndex &Index) {
				// Summaries use unmangled names.
				GlobalValue::GUID G = GlobalValue::getGUID(FunctionName);
				ValueInfo VI = Index.getValueInfo(G);

				// We need a unique definition, otherwise don't try further.
				if (!VI || VI.getSummaryList().empty())
				return make_error<DefinitionNotFoundInSummary>(FunctionName.str(), Index);
				if (VI.getSummaryList().size() > 1)
				return make_error<DuplicateDefinitionInSummary>(FunctionName.str(), VI);

				GlobalValueSummary *S = VI.getSummaryList().front()->getBaseObject();
				if (!isa<FunctionSummary>(S))
				return createStringError(inconvertibleErrorCode(),
				"Entry point is not a function: " + FunctionName);

				// Return a reference. ModuleSummaryIndex owns the module paths.
				return S->modulePath();
				}

				// Parse the bitcode module from the given path into a ThreadSafeModule.
				Expected<ThreadSafeModule> loadModule(StringRef Path,
				orc::ThreadSafeContext TSCtx) {
				outs() << "About to load module: " << Path << "\n";

				Expected<std::unique_ptr<MemoryBuffer>> BitcodeBuffer =
				errorOrToExpected(MemoryBuffer::getFile(Path));
				if (!BitcodeBuffer)
				return BitcodeBuffer.takeError();

				MemoryBufferRef BitcodeBufferRef = (**BitcodeBuffer).getMemBufferRef();
				Expected<std::unique_ptr<Module>> M =
				parseBitcodeFile(BitcodeBufferRef, *TSCtx.getContext());
				if (!M)
				return M.takeError();

				return ThreadSafeModule(std::move(*M), std::move(TSCtx));
				}

				int main(int Argc, char *Argv[]) {
				InitLLVM X(Argc, Argv);

				InitializeNativeTarget();
				InitializeNativeTargetAsmPrinter();

				cl::ParseCommandLineOptions(Argc, Argv, "LLJITWithThinLTOSummaries");

				ExitOnError ExitOnErr;
				ExitOnErr.setBanner(std::string(Argv[0]) + ": ");

				// (1) Read the index file and parse the module summary index.
				std::unique_ptr<MemoryBuffer> SummaryBuffer =
				ExitOnErr(errorOrToExpected(MemoryBuffer::getFile(IndexFile)));

				std::unique_ptr<ModuleSummaryIndex> SummaryIndex =
				ExitOnErr(getModuleSummaryIndex(SummaryBuffer->getMemBufferRef()));

				// (2) Find the path of the module that defines "main".
				std::string MainFunctionName = "main";
				StringRef MainModulePath =
				ExitOnErr(getMainModulePath(MainFunctionName, *SummaryIndex));

				// (3) Parse the main module and create a matching LLJIT.
				ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());
				ThreadSafeModule MainModule = ExitOnErr(loadModule(MainModulePath, TSCtx));

				auto Builder = LLJITBuilder();

				MainModule.withModuleDo([&](Module &M) {
				if (M.getTargetTriple().empty()) {
				Builder.setJITTargetMachineBuilder(
				ExitOnErr(JITTargetMachineBuilder::detectHost()));
				} else {
				Builder.setJITTargetMachineBuilder(
				JITTargetMachineBuilder(Triple(M.getTargetTriple())));
				}
				if (!M.getDataLayout().getStringRepresentation().empty())
				Builder.setDataLayout(M.getDataLayout());
				});

				auto J = ExitOnErr(Builder.create());

				// (4) Add all modules to the LLJIT that are covered by the index.
				JITDylib &JD = J->getMainJITDylib();

				for (const auto &Entry : SummaryIndex->modulePaths()) {
				StringRef Path = Entry.first();
				ThreadSafeModule M = (Path == MainModulePath)
				? std::move(MainModule)
				: ExitOnErr(loadModule(Path, TSCtx));
				ExitOnErr(J->addIRModule(JD, std::move(M)));
				}

				// (5) Look up and run the JIT'd function.
				auto MainSym = ExitOnErr(J->lookup(MainFunctionName));

				using MainFnPtr = int (*)(int, char *[]);
				MainFnPtr MainFunction =
				jitTargetAddressToFunction<MainFnPtr>(MainSym.getAddress());

				int Result = runAsMain(MainFunction, {}, MainModulePath);
				outs() << "'" << MainFunctionName << "' finished with exit code: " << Result
				<< "\n";

				return 0;
				}
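The "exactly one definition" rule that getMainModulePath enforces can be illustrated without LLVM. The sketch below is a simplified stand-in, not the example's code: `ToyIndex`, `LookupStatus` and `findDefiningModule` are hypothetical names, and a plain map from symbol name to defining module paths replaces the ModuleSummaryIndex and its GUID-based lookup.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a summary index:
// symbol name -> paths of the modules that define it.
using ToyIndex = std::map<std::string, std::vector<std::string>>;

enum class LookupStatus { Found, NotFound, Duplicate };

struct LookupResult {
  LookupStatus Status;
  std::string ModulePath; // only meaningful when Status == Found
};

// Mirrors the three outcomes of the real lookup: no definition, more than
// one definition, or a unique definition whose module path is returned.
LookupResult findDefiningModule(const ToyIndex &Index,
                                const std::string &Name) {
  assert(!Name.empty() && "symbol name must not be empty");
  auto It = Index.find(Name);
  // An entry with an empty list models a symbol that is only referenced,
  // like "foo" and "bar" in the summary of main-mod.ll.
  if (It == Index.end() || It->second.empty())
    return {LookupStatus::NotFound, ""};
  if (It->second.size() > 1)
    return {LookupStatus::Duplicate, ""};
  return {LookupStatus::Found, It->second.front()};
}
```

In the real example the two failure cases become the DefinitionNotFoundInSummary and DuplicateDefinitionInSummary errors, which also list the module paths involved.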

llvm/test/Examples/OrcV2Examples/Inputs/bar-mod.ll

This file was added.

				define i32 @bar() {
				ret i32 0
				}

				^0 = module: (path: "bar-mod.o", hash: (3482110761, 1514484043, 2322286514, 2767576375, 2807967785))
				^1 = gv: (name: "bar", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0)))) ; guid = 16434608426314478903
				^2 = blockcount: 0

llvm/test/Examples/OrcV2Examples/Inputs/foo-mod.ll

This file was added.

				define i32 @foo() {
				ret i32 0
				}

				^0 = module: (path: "foo-mod.o", hash: (3133549885, 2087596051, 4175159200, 756405190, 968713858))
				^1 = gv: (name: "foo", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0)))) ; guid = 6699318081062747564
				^2 = blockcount: 0

llvm/test/Examples/OrcV2Examples/Inputs/main-mod.ll

This file was added.

				define i32 @main(i32 %argc, i8** %argv) {
				entry:
				%and = and i32 %argc, 1
				%tobool = icmp eq i32 %and, 0
				br i1 %tobool, label %if.end, label %if.then

				if.then: ; preds = %entry
				%call = tail call i32 @foo() #2
				br label %return

				if.end: ; preds = %entry
				%call1 = tail call i32 @bar() #2
				br label %return

				return: ; preds = %if.end, %if.then
				%retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.end ]
				ret i32 %retval.0
				}

				declare i32 @foo()
				declare i32 @bar()

				^0 = module: (path: "main-mod.o", hash: (1466373418, 2110622332, 1230295500, 3229354382, 2004933020))
				^1 = gv: (name: "foo") ; guid = 6699318081062747564
				^2 = gv: (name: "main", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 22, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 1, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0), calls: ((callee: ^1), (callee: ^3))))) ; guid = 15822663052811949562
				^3 = gv: (name: "bar") ; guid = 16434608426314478903
				^4 = blockcount: 0

llvm/test/Examples/OrcV2Examples/lljit-with-thinlto-summaries.test

This file was added.

				# RUN: opt -module-summary %p/Inputs/main-mod.ll -o main-mod.bc
				# RUN: opt -module-summary %p/Inputs/foo-mod.ll -o foo-mod.bc
				# RUN: opt -module-summary %p/Inputs/bar-mod.ll -o bar-mod.bc

				# RUN: llvm-lto -thinlto -o main-foo-bar main-mod.bc foo-mod.bc bar-mod.bc

				# RUN: LLJITWithThinLTOSummaries main-foo-bar.thinlto.bc 2>&1 | FileCheck %s

				# CHECK: About to load module: main-mod.bc
				# CHECK: About to load module: foo-mod.bc
				# CHECK: About to load module: bar-mod.bc
				# CHECK: 'main' finished with exit code: 0

llvm/test/Examples/lit.local.cfg

 if not config.build_examples or sys.platform in ['win32']:
     config.unsupported = True
+# Test discovery should ignore subdirectories that contain test inputs.
+config.excludes = ['Inputs']

This is an archive of the discontinued LLVM Phabricator instance.

[ORC] Add a LLJITWithThinLTOSummaries example in OrcV2Examples
ClosedPublic
