
Details

Reviewers

tejohnson
mehdi_amini

Commits

rG002c2d538021: ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible…
rL295021: ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible…

Summary

Depends on D29695

Diff Detail

Repository: rL LLVM

Event Timeline

pcc created this revision. Feb 7 2017, 7:52 PM

Herald added a subscriber: Prazek. Feb 7 2017, 7:52 PM

Note: I patched in https://github.com/pcc/llvm-project/commit/5a5904d6721f895eafdd2fc476872b98806c36e6 to measure perf impact. Total wall time spent in addRegularLTO when linking chrome was 8.2743s, as compared to about 6 seconds before (see D27324).

Use AARGetter to simplify and make this pass easier to use from the new PM

pcc mentioned this in D29803: WholeProgramDevirt: Examine the function body when deciding whether functions are readnone. Feb 9 2017, 5:49 PM

I assume these get dropped in the usual place where we eliminate available externally. Does VCP benefit from any of the other optimizations we would typically apply before that point (e.g. inlining)?

llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
284 ↗	(On Diff #87896)	Add comment about what these are testing (I think I have correctly reasoned them out, but would be good to be explicit).
297 ↗	(On Diff #87896)	There are now comments about the eligible virtual funcs being added to the regular LTO, but I see there are no comments about the original condition here for putting a GVar in the regular LTO module - maybe add that or point to someplace that describes what we're doing.
llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
9 ↗	(On Diff #87896)	Where do the long hex strings come from in the names in this test?

In D29701#670366, @pcc wrote:

Note: I patched in https://github.com/pcc/llvm-project/commit/5a5904d6721f895eafdd2fc476872b98806c36e6 to measure perf impact. Total wall time spent in addRegularLTO when linking chrome was 8.2743s, as compared to about 6 seconds before (see D27324).

So >30% overhead on addRegularLTO, but what is the overhead for runRegularLTO?

mehdi_amini added inline comments. Feb 10 2017, 9:30 PM

llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
311 ↗	(On Diff #87896)	(One-line comment)

In D29701#674285, @mehdi_amini wrote:

In D29701#670366, @pcc wrote:

Note: I patched in https://github.com/pcc/llvm-project/commit/5a5904d6721f895eafdd2fc476872b98806c36e6 to measure perf impact. Total wall time spent in addRegularLTO when linking chrome was 8.2743s, as compared to about 6 seconds before (see D27324).

So >30% overhead on addRegularLTO, but what is the overhead for runRegularLTO?

Just did a fresh set of runs on chrome with https://github.com/pcc/llvm-project/commits/lto-timers patched in, and took median of 5 for each measurement:

               before    after
addRegularLTO  8.0407s   9.7432s  (+21%)
runRegularLTO  9.6064s  10.4748s  (+9%)

Note that we haven't done everything we can -- a couple more things I can think of are to exclude functions that take non-integer arguments (other than "this"), and to set optnone on functions in the merged module. Taking a look at chrome's merged module, there are 6727 available_externally functions, of which 1340 take a pointer argument other than "this".
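The percentages quoted in this thread can be reproduced from the raw timings. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and nothing here is part of the patch or of the lto-timers branch.

```cpp
#include <cassert>
#include <cmath>

// Relative overhead of `after` versus `before`, in percent.
double overheadPercent(double before, double after) {
  return (after - before) / before * 100.0;
}

// Same value rounded to the nearest whole percent, as quoted in the thread.
long roundedOverheadPercent(double before, double after) {
  return std::lround(overheadPercent(before, after));
}

// Checks the figures reported above against the raw timings.
void demoOverhead() {
  assert(roundedOverheadPercent(8.0407, 9.7432) == 21);   // addRegularLTO
  assert(roundedOverheadPercent(9.6064, 10.4748) == 9);   // runRegularLTO
  // First measurement: 8.2743s versus "about 6 seconds", i.e. ">30%".
  assert(overheadPercent(6.0, 8.2743) > 30.0);
}
```

By the same arithmetic, 1340 of 6727 available_externally functions is roughly a fifth, which gives a rough upper bound on what excluding functions with extra pointer arguments could save.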

In D29701#674280, @tejohnson wrote:

I assume these get dropped in the usual place where we eliminate available externally.

Yes (i.e. at the end of the regular LTO pipeline).

Does VCP benefit from any of the other optimizations we would typically apply before that point (e.g. inlining)?

I suppose that is possible in principle, but I wouldn't typically expect it to; I've found that VCP eligible functions tend to be relatively trivial functions that are fully inlined into leaf functions at compile time.

Address review comments

Harbormaster completed remote builds in B3944: Diff 88293. Feb 13 2017, 6:37 PM

pcc added inline comments. Feb 13 2017, 6:37 PM

llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
9 ↗	(On Diff #87896)	They are module IDs. See `getModuleId` in ThinLTOBitcodeWriter.
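To make the naming concrete: the CHECK lines in split-vfunc-internal.ll show a local symbol `g` becoming `g$84f59439b469192440047efc8de357fb` once it is promoted. The sketch below only models that string shape. `promotedName` is a hypothetical helper, and treating `$` as a separator rather than as part of the module ID is an assumption made for illustration; the real logic lives in `getModuleId` and `promoteInternals`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the promoted name of a local symbol from its
// original name and the hex digest that identifies the module.
std::string promotedName(const std::string &name, const std::string &digest) {
  return name + "$" + digest;
}

void demoPromotedName() {
  // Digest copied from the CHECK lines of split-vfunc-internal.ll.
  const std::string digest = "84f59439b469192440047efc8de357fb";
  assert(promotedName("g", digest) == "g$84f59439b469192440047efc8de357fb");
  assert(promotedName("ok", digest) == "ok$84f59439b469192440047efc8de357fb");
}
```

Because both halves of the split module use the same suffix, a reference in one half still resolves to the definition in the other half after promotion.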

In D29701#675809, @pcc wrote:
In D29701#674285, @mehdi_amini wrote:

In D29701#670366, @pcc wrote:

Note: I patched in https://github.com/pcc/llvm-project/commit/5a5904d6721f895eafdd2fc476872b98806c36e6 to measure perf impact. Total wall time spent in addRegularLTO when linking chrome was 8.2743s, as compared to about 6 seconds before (see D27324).

So >30% overhead on addRegularLTO, but what is the overhead for runRegularLTO?

Just did a fresh set of runs on chrome with https://github.com/pcc/llvm-project/commits/lto-timers patched in, and took median of 5 for each measurement:
               before    after
addRegularLTO  8.0407s   9.7432s  (+21%)
runRegularLTO  9.6064s  10.4748s  (+9%)
Note that we haven't done everything we can -- a couple more things I can think of are to exclude functions that take non-integer arguments (other than "this"), and to set optnone on functions in the merged module. Taking a look at chrome's merged module, there are 6727 available_externally functions, of which 1340 take a pointer argument other than "this".

Another data point for chrome is that this change causes us to merge 53487 struct type definitions, most of which are likely unnecessary (I think they are either associated with "this", which we make sure is unused, or one of the other pointer arguments, which means we don't need the function). So something else we could do that would be a little more tricky is to try to drop the type information for "this". (Of course, this would happen for free if we had typeless pointers.)

Check function arguments

Harbormaster completed remote builds in B3945: Diff 88295. Feb 13 2017, 7:01 PM

In D29701#675840, @pcc wrote:
In D29701#675809, @pcc wrote:
In D29701#674285, @mehdi_amini wrote:

In D29701#670366, @pcc wrote:

Note: I patched in https://github.com/pcc/llvm-project/commit/5a5904d6721f895eafdd2fc476872b98806c36e6 to measure perf impact. Total wall time spent in addRegularLTO when linking chrome was 8.2743s, as compared to about 6 seconds before (see D27324).

So >30% overhead on addRegularLTO, but what is the overhead for runRegularLTO?

Just did a fresh set of runs on chrome with https://github.com/pcc/llvm-project/commits/lto-timers patched in, and took median of 5 for each measurement:
               before    after
addRegularLTO  8.0407s   9.7432s  (+21%)
runRegularLTO  9.6064s  10.4748s  (+9%)

Quite interesting that we take more time to build and merge the IR than transforming and emitting code :)

llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
283–284 ↗	(On Diff #88295)	great comments!

This revision is now accepted and ready to land. Feb 13 2017, 7:30 PM

Closed by commit rL295021: ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible… (authored by pcc). Feb 13 2017, 7:54 PM

This revision was automatically updated to reflect the committed changes.

Diff 88296

llvm/trunk/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp

// ... (9 lines not shown)
// This pass prepares a module containing type metadata for ThinLTO by splitting		// This pass prepares a module containing type metadata for ThinLTO by splitting
// it into regular and thin LTO parts if possible, and writing both parts to		// it into regular and thin LTO parts if possible, and writing both parts to
// a multi-module bitcode file. Modules that do not contain type metadata are		// a multi-module bitcode file. Modules that do not contain type metadata are
// written unmodified as a single module.		// written unmodified as a single module.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
		#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/ModuleSummaryAnalysis.h"		#include "llvm/Analysis/ModuleSummaryAnalysis.h"
#include "llvm/Analysis/TypeMetadataUtils.h"		#include "llvm/Analysis/TypeMetadataUtils.h"
#include "llvm/Bitcode/BitcodeWriter.h"		#include "llvm/Bitcode/BitcodeWriter.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DebugInfo.h"		#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/ScopedPrinter.h"		#include "llvm/Support/ScopedPrinter.h"
		#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
using namespace llvm;		using namespace llvm;

namespace {		namespace {

// Produce a unique identifier for this module by taking the MD5 sum of the		// Produce a unique identifier for this module by taking the MD5 sum of the
// names of the module's strong external symbols. This identifier is		// names of the module's strong external symbols. This identifier is
// normally guaranteed to be unique, or the program would fail to link due to		// normally guaranteed to be unique, or the program would fail to link due to
// ... (195 lines not shown)	else
(Constant *)nullptr, "", (GlobalVariable *)nullptr,		(Constant *)nullptr, "", (GlobalVariable *)nullptr,
GA->getThreadLocalMode(), GA->getType()->getAddressSpace());		GA->getThreadLocalMode(), GA->getType()->getAddressSpace());
GO->takeName(GA);		GO->takeName(GA);
GA->replaceAllUsesWith(GO);		GA->replaceAllUsesWith(GO);
GA->eraseFromParent();		GA->eraseFromParent();
}		}
}		}

		void forEachVirtualFunction(Constant *C, function_ref<void(Function *)> Fn) {
		if (auto *F = dyn_cast<Function>(C))
		return Fn(F);
		for (Value *Op : C->operands())
		forEachVirtualFunction(cast<Constant>(Op), Fn);
		}

// If it's possible to split M into regular and thin LTO parts, do so and write		// If it's possible to split M into regular and thin LTO parts, do so and write
// a multi-module bitcode file with the two parts to OS. Otherwise, write only a		// a multi-module bitcode file with the two parts to OS. Otherwise, write only a
// regular LTO bitcode file to OS.		// regular LTO bitcode file to OS.
void splitAndWriteThinLTOBitcode(raw_ostream &OS, Module &M) {		void splitAndWriteThinLTOBitcode(
		raw_ostream &OS, function_ref<AAResults &(Function &)> AARGetter,
		Module &M) {
std::string ModuleId = getModuleId(&M);		std::string ModuleId = getModuleId(&M);
if (ModuleId.empty()) {		if (ModuleId.empty()) {
// We couldn't generate a module ID for this module, just write it out as a		// We couldn't generate a module ID for this module, just write it out as a
// regular LTO module.		// regular LTO module.
WriteBitcodeToFile(&M, OS);		WriteBitcodeToFile(&M, OS);
return;		return;
}		}

promoteTypeIds(M, ModuleId);		promoteTypeIds(M, ModuleId);

auto IsInMergedM = [&](const GlobalValue *GV) {		// Returns whether a global has attached type metadata. Such globals may
auto *GVar = dyn_cast<GlobalVariable>(GV->getBaseObject());		// participate in CFI or whole-program devirtualization, so they need to
if (!GVar)		// appear in the merged module instead of the thin LTO module.
return false;		auto HasTypeMetadata = [&](const GlobalObject *GO) {

SmallVector<MDNode *, 1> MDs;		SmallVector<MDNode *, 1> MDs;
GVar->getMetadata(LLVMContext::MD_type, MDs);		GO->getMetadata(LLVMContext::MD_type, MDs);
return !MDs.empty();		return !MDs.empty();
};		};

		// Collect the set of virtual functions that are eligible for virtual constant
		// propagation. Each eligible function must not access memory, must return
		// an integer of width <=64 bits, must take at least one argument, must not
		// use its first argument (assumed to be "this") and all arguments other than
		// the first one must be of <=64 bit integer type.
		//
		// Note that we test whether this copy of the function is readnone, rather
		// than testing function attributes, which must hold for any copy of the
		// function, even a less optimized version substituted at link time. This is
		// sound because the virtual constant propagation optimizations effectively
		// inline all implementations of the virtual function into each call site,
		// rather than using function attributes to perform local optimization.
		std::set<const Function *> EligibleVirtualFns;
		for (GlobalVariable &GV : M.globals())
		if (HasTypeMetadata(&GV))
		forEachVirtualFunction(GV.getInitializer(), [&](Function *F) {
		auto *RT = dyn_cast<IntegerType>(F->getReturnType());
		if (!RT || RT->getBitWidth() > 64 || F->arg_empty() ||
		!F->arg_begin()->use_empty())
		return;
		for (auto &Arg : make_range(std::next(F->arg_begin()), F->arg_end())) {
		auto *ArgT = dyn_cast<IntegerType>(Arg.getType());
		if (!ArgT || ArgT->getBitWidth() > 64)
		return;
		}
		if (computeFunctionBodyMemoryAccess(F, AARGetter(F)) == MAK_ReadNone)
		EligibleVirtualFns.insert(F);
		});

ValueToValueMapTy VMap;		ValueToValueMapTy VMap;
std::unique_ptr<Module> MergedM(CloneModule(&M, VMap, IsInMergedM));		std::unique_ptr<Module> MergedM(
		CloneModule(&M, VMap, [&](const GlobalValue *GV) -> bool {
		if (auto *F = dyn_cast<Function>(GV))
		return EligibleVirtualFns.count(F);
		if (auto *GVar = dyn_cast_or_null<GlobalVariable>(GV->getBaseObject()))
		return HasTypeMetadata(GVar);
		return false;
		}));
StripDebugInfo(*MergedM);		StripDebugInfo(*MergedM);

filterModule(&M, [&](const GlobalValue *GV) { return !IsInMergedM(GV); });		for (Function &F : *MergedM)
		if (!F.isDeclaration()) {
		// Reset the linkage of all functions eligible for virtual constant
		// propagation. The canonical definitions live in the thin LTO module so
		// that they can be imported.
		F.setLinkage(GlobalValue::AvailableExternallyLinkage);
		F.setComdat(nullptr);
		}

		// Remove all globals with type metadata, as well as aliases pointing to them,
		// from the thin LTO module.
		filterModule(&M, [&](const GlobalValue *GV) {
		if (auto *GVar = dyn_cast_or_null<GlobalVariable>(GV->getBaseObject()))
		return !HasTypeMetadata(GVar);
		return true;
		});

promoteInternals(*MergedM, M, ModuleId);		promoteInternals(*MergedM, M, ModuleId);
promoteInternals(M, *MergedM, ModuleId);		promoteInternals(M, *MergedM, ModuleId);

simplifyExternals(*MergedM);		simplifyExternals(*MergedM);

SmallVector<char, 0> Buffer;		SmallVector<char, 0> Buffer;
BitcodeWriter W(Buffer);		BitcodeWriter W(Buffer);
// ... (15 lines not shown)	for (auto &GO : M.global_objects()) {
GO.getMetadata(LLVMContext::MD_type, MDs);		GO.getMetadata(LLVMContext::MD_type, MDs);
if (!MDs.empty())		if (!MDs.empty())
return true;		return true;
}		}

return false;		return false;
}		}

void writeThinLTOBitcode(raw_ostream &OS, Module &M,		void writeThinLTOBitcode(raw_ostream &OS,
const ModuleSummaryIndex *Index) {		function_ref<AAResults &(Function &)> AARGetter,
		Module &M, const ModuleSummaryIndex *Index) {
// See if this module has any type metadata. If so, we need to split it.		// See if this module has any type metadata. If so, we need to split it.
if (requiresSplit(M))		if (requiresSplit(M))
return splitAndWriteThinLTOBitcode(OS, M);		return splitAndWriteThinLTOBitcode(OS, AARGetter, M);

// Otherwise we can just write it out as a regular module.		// Otherwise we can just write it out as a regular module.
WriteBitcodeToFile(&M, OS, /*ShouldPreserveUseListOrder=*/false, Index,		WriteBitcodeToFile(&M, OS, /*ShouldPreserveUseListOrder=*/false, Index,
/*GenerateHash=*/true);		/*GenerateHash=*/true);
}		}

class WriteThinLTOBitcode : public ModulePass {		class WriteThinLTOBitcode : public ModulePass {
raw_ostream &OS; // raw_ostream to print on		raw_ostream &OS; // raw_ostream to print on
// ... (9 lines not shown)	explicit WriteThinLTOBitcode(raw_ostream &o)
initializeWriteThinLTOBitcodePass(*PassRegistry::getPassRegistry());		initializeWriteThinLTOBitcodePass(*PassRegistry::getPassRegistry());
}		}

StringRef getPassName() const override { return "ThinLTO Bitcode Writer"; }		StringRef getPassName() const override { return "ThinLTO Bitcode Writer"; }

bool runOnModule(Module &M) override {		bool runOnModule(Module &M) override {
const ModuleSummaryIndex *Index =		const ModuleSummaryIndex *Index =
&(getAnalysis<ModuleSummaryIndexWrapperPass>().getIndex());		&(getAnalysis<ModuleSummaryIndexWrapperPass>().getIndex());
writeThinLTOBitcode(OS, M, Index);		writeThinLTOBitcode(OS, LegacyAARGetter(*this), M, Index);
return true;		return true;
}		}
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesAll();		AU.setPreservesAll();
		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<ModuleSummaryIndexWrapperPass>();		AU.addRequired<ModuleSummaryIndexWrapperPass>();
		AU.addRequired<TargetLibraryInfoWrapperPass>();
}		}
};		};
} // anonymous namespace		} // anonymous namespace

char WriteThinLTOBitcode::ID = 0;		char WriteThinLTOBitcode::ID = 0;
INITIALIZE_PASS_BEGIN(WriteThinLTOBitcode, "write-thinlto-bitcode",		INITIALIZE_PASS_BEGIN(WriteThinLTOBitcode, "write-thinlto-bitcode",
"Write ThinLTO Bitcode", false, true)		"Write ThinLTO Bitcode", false, true)
		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(ModuleSummaryIndexWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ModuleSummaryIndexWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(WriteThinLTOBitcode, "write-thinlto-bitcode",		INITIALIZE_PASS_END(WriteThinLTOBitcode, "write-thinlto-bitcode",
"Write ThinLTO Bitcode", false, true)		"Write ThinLTO Bitcode", false, true)

ModulePass *llvm::createWriteThinLTOBitcodePass(raw_ostream &Str) {		ModulePass *llvm::createWriteThinLTOBitcodePass(raw_ostream &Str) {
return new WriteThinLTOBitcode(Str);		return new WriteThinLTOBitcode(Str);
}		}
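The eligibility rule described in the new comment block of `splitAndWriteThinLTOBitcode` can be restated outside of LLVM. The sketch below uses hypothetical plain structs (`TypeModel`, `FunctionModel`) in place of `llvm::Type` and `llvm::Function`, and a boolean in place of the `computeFunctionBodyMemoryAccess` query, so it illustrates the rule rather than the pass itself. The cases in `demoEligibility` mirror the functions in split-vfunc.ll.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for LLVM types; not the real API.
struct TypeModel {
  bool isInteger;  // true for iN types, false for void, pointers, etc.
  unsigned bits;   // bit width, meaningful only when isInteger is true
};

struct FunctionModel {
  TypeModel returnType;
  std::vector<TypeModel> args;  // args[0] is assumed to be "this"
  bool usesFirstArg;            // does the body use args[0]?
  bool accessesMemory;          // false models a readnone function body
};

bool isSmallInt(const TypeModel &t) { return t.isInteger && t.bits <= 64; }

// Restates the conditions listed in the patch comment.
bool isVCPEligible(const FunctionModel &f) {
  if (!isSmallInt(f.returnType)) return false;  // must return <=64-bit integer
  if (f.args.empty()) return false;             // must take at least "this"
  if (f.usesFirstArg) return false;             // must not use "this"
  for (std::size_t i = 1; i < f.args.size(); ++i)
    if (!isSmallInt(f.args[i])) return false;   // other args: <=64-bit integers
  return !f.accessesMemory;                     // body must be readnone
}

void demoEligibility() {
  const TypeModel voidT{false, 0}, ptr{false, 0};
  const TypeModel i8{true, 8}, i64{true, 64}, i128{true, 128};
  assert(isVCPEligible({i64, {ptr}, false, false}));         // like @ok1
  assert(isVCPEligible({i64, {ptr, i64}, false, false}));    // like @ok2
  assert(!isVCPEligible({voidT, {ptr}, false, false}));      // like @wrongtype1
  assert(!isVCPEligible({i128, {ptr}, false, false}));       // like @wrongtype2
  assert(!isVCPEligible({i64, {}, false, false}));           // like @wrongtype3
  assert(!isVCPEligible({i64, {ptr, ptr}, false, false}));   // like @wrongtype4
  assert(!isVCPEligible({i64, {ptr, i128}, false, false}));  // like @wrongtype5
  assert(!isVCPEligible({i64, {ptr}, true, false}));         // like @usesthis
  assert(!isVCPEligible({i8, {ptr}, true, true}));           // like @reads
}
```

The cheap signature checks run before the memory-access check, which matches the order in the patch, where the alias-analysis query is only made for functions that already pass the type tests.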

llvm/trunk/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll

				; RUN: opt -thinlto-bc -o %t %s
				; RUN: llvm-modextract -b -n 0 -o - %t | llvm-dis | FileCheck --check-prefix=M0 %s
				; RUN: llvm-modextract -b -n 1 -o - %t | llvm-dis | FileCheck --check-prefix=M1 %s

				define [1 x i8*]* @source() {
				ret [1 x i8*]* @g
				}

				; M0: @"g$84f59439b469192440047efc8de357fb" = external hidden constant [1 x i8*]{{$}}
				; M1: @"g$84f59439b469192440047efc8de357fb" = hidden constant [1 x i8*] [i8* bitcast (i64 (i8*)* @"ok$84f59439b469192440047efc8de357fb" to i8*)]
				@g = internal constant [1 x i8*] [
				i8* bitcast (i64 (i8*)* @ok to i8*)
				], !type !0

				; M0: define hidden i64 @"ok$84f59439b469192440047efc8de357fb"
				; M1: define available_externally hidden i64 @"ok$84f59439b469192440047efc8de357fb"
				define internal i64 @ok(i8* %this) {
				ret i64 42
				}

				!0 = !{i32 0, !"typeid"}

llvm/trunk/test/Transforms/ThinLTOBitcodeWriter/split-vfunc.ll

				; RUN: opt -thinlto-bc -o %t %s
				; RUN: llvm-modextract -b -n 0 -o - %t | llvm-dis | FileCheck --check-prefix=M0 %s
				; RUN: llvm-modextract -b -n 1 -o - %t | llvm-dis | FileCheck --check-prefix=M1 %s

				; M0: @g = external constant [9 x i8*]{{$}}
				; M1: @g = constant [9 x i8*]
				@g = constant [9 x i8*] [
				i8* bitcast (i64 (i8*)* @ok1 to i8*),
				i8* bitcast (i64 (i8*, i64)* @ok2 to i8*),
				i8* bitcast (void (i8*)* @wrongtype1 to i8*),
				i8* bitcast (i128 (i8*)* @wrongtype2 to i8*),
				i8* bitcast (i64 ()* @wrongtype3 to i8*),
				i8* bitcast (i64 (i8*, i8*)* @wrongtype4 to i8*),
				i8* bitcast (i64 (i8*, i128)* @wrongtype5 to i8*),
				i8* bitcast (i64 (i8*)* @usesthis to i8*),
				i8* bitcast (i8 (i8*)* @reads to i8*)
				], !type !0

				; M0: define i64 @ok1
				; M1: define available_externally i64 @ok1
				define i64 @ok1(i8* %this) {
				ret i64 42
				}

				; M0: define i64 @ok2
				; M1: define available_externally i64 @ok2
				define i64 @ok2(i8* %this, i64 %arg) {
				ret i64 %arg
				}

				; M0: define void @wrongtype1
				; M1: declare void @wrongtype1()
				define void @wrongtype1(i8*) {
				ret void
				}

				; M0: define i128 @wrongtype2
				; M1: declare void @wrongtype2()
				define i128 @wrongtype2(i8*) {
				ret i128 0
				}

				; M0: define i64 @wrongtype3
				; M1: declare void @wrongtype3()
				define i64 @wrongtype3() {
				ret i64 0
				}

				; M0: define i64 @wrongtype4
				; M1: declare void @wrongtype4()
				define i64 @wrongtype4(i8*, i8*) {
				ret i64 0
				}

				; M0: define i64 @wrongtype5
				; M1: declare void @wrongtype5()
				define i64 @wrongtype5(i8*, i128) {
				ret i64 0
				}

				; M0: define i64 @usesthis
				; M1: declare void @usesthis()
				define i64 @usesthis(i8* %this) {
				%i = ptrtoint i8* %this to i64
				ret i64 %i
				}

				; M0: define i8 @reads
				; M1: declare void @reads()
				define i8 @reads(i8* %this) {
				%l = load i8, i8* %this
				ret i8 %l
				}

				!0 = !{i32 0, !"typeid"}
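The M0 and M1 expectations in the two tests follow from the two predicates in the patch: the lambda passed to `CloneModule` decides which definitions are copied into the merged module, and the lambda passed to `filterModule` decides which definitions stay in the thin LTO module. A minimal sketch of that placement rule follows. `GlobalModel` and both function names are hypothetical, and aliases (which the patch resolves through `getBaseObject`) are left out.

```cpp
#include <cassert>

enum class Kind { Function, Variable };

// Hypothetical summary of the properties the patch looks at.
struct GlobalModel {
  Kind kind;
  bool hasTypeMetadata;  // meaningful for variables
  bool vcpEligible;      // meaningful for functions
};

// Models the CloneModule predicate: eligible functions and variables with
// type metadata get a definition in the merged (regular LTO) module.
bool definedInMerged(const GlobalModel &g) {
  if (g.kind == Kind::Function) return g.vcpEligible;
  return g.hasTypeMetadata;
}

// Models the filterModule predicate: only variables with type metadata lose
// their definition in the thin LTO module.
bool definedInThin(const GlobalModel &g) {
  if (g.kind == Kind::Variable) return !g.hasTypeMetadata;
  return true;
}

void demoPlacement() {
  const GlobalModel vtable{Kind::Variable, true, false};   // like @g
  const GlobalModel ok1{Kind::Function, false, true};      // like @ok1
  const GlobalModel reads{Kind::Function, false, false};   // like @reads
  assert(definedInMerged(vtable) && !definedInThin(vtable));
  assert(definedInMerged(ok1) && definedInThin(ok1));  // defined in both
  assert(!definedInMerged(reads) && definedInThin(reads));
}
```

An eligible function is the only kind of global defined in both modules, which is why the patch resets the merged copy to available_externally linkage and leaves the canonical definition in the thin LTO module.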

This is an archive of the discontinued LLVM Phabricator instance.

ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module.
ClosedPublic
