This is an archive of the discontinued LLVM Phabricator instance.


Differential D7787

Add a lock() function in PassRegistry to speed up multi-thread synchronization.
Needs Review, Public

Authored by eeckstein on Feb 20 2015, 5:24 AM.


Details

Reviewers

resistor

Summary

With this patch I'm trying to solve the following problem:

I'm running llvm in multiple threads (each instance uses its private LLVMContext).
With a release-asserts build of llvm I got terrible multi-thread performance. I found that the problem is the mutex in PassRegistry::getPassInfo(), mainly called from PMDataManager::verifyPreservedAnalysis().

My first idea was to make an option to disable verifyPreservedAnalysis(), even in an asserts-build.
But I didn't like it because I don't want to give up this verification, and the mutex also causes overhead outside verifyPreservedAnalysis().

So I did the following:
I added a lock() function in PassRegistry which I call when I'm sure that all passes are registered.
After the PassRegistry is locked it can safely access its maps without using a mutex.

This completely solves the multi-thread performance problem.
The use of the lock() function is optional. If not used, nothing changes.

Diff Detail

Event Timeline

eeckstein updated this revision to Diff 20392. Feb 20 2015, 5:24 AM

eeckstein retitled this revision to "Add a lock() function in PassRegistry to speed up multi-thread synchronization."

eeckstein updated this object.

eeckstein edited the test plan for this revision.

eeckstein added a subscriber: Unknown Object (MLST).

eeckstein updated this revision to Diff 20744. Feb 26 2015, 4:34 AM

eeckstein added a reviewer: resistor.

Hi Erik,

Hi Owen, Chandler,

Do you mean with a release *+* asserts build?

Yes, sorry I wrote it wrong.

I think my initial description was not very good. I'll try to explain again. There are two issues:

multithread-performance problem in the assert build: I understand that assert builds are slower than release builds, but in this case it's a factor of 4 when running with 4 threads, i.e. the multithreaded version is slower than the single-threaded version! This makes the asserts build useless for my purpose.

Even in the release build there is a performance penalty of about 5% because of the mutex.

With my proposed patch both problems would be solved.

mehdi_amini added a subscriber: mehdi_amini. Feb 27 2015, 7:51 AM

mehdi_amini added inline comments.

lib/IR/PassRegistry.cpp
52	It is unfortunate to lose RAII, is there a good reason why SmartScopedReader cannot take an extra parameter to be constructed without locking? (like std::unique_lock())

New version using llvm::Optional
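
A conditional RAII guard of the kind discussed here might look like the sketch below. It is an assumption about the general shape, not the committed code: std::optional and std::shared_lock (C++17) stand in for llvm::Optional and sys::SmartScopedReader, and GuardedTable is a hypothetical type.

```cpp
#include <atomic>
#include <map>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical table with the same "locked means read-only" flag.
struct GuardedTable {
  mutable std::shared_mutex Mutex;
  std::atomic<bool> Locked{false};
  std::map<std::string, int> Entries;

  const int *lookup(const std::string &Name) const {
    // The guard is only constructed (and the reader lock only taken)
    // while the table may still be modified.
    std::optional<std::shared_lock<std::shared_mutex>> Guard;
    if (!Locked)
      Guard.emplace(Mutex);
    auto It = Entries.find(Name);
    return It != Entries.end() ? &It->second : nullptr;
    // If Guard holds a lock, its destructor releases it on every return path.
  }
};
```

Compared with paired lock_shared()/unlock_shared() calls, the optional guard keeps the early-return style of the original function and cannot leak the lock.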

The patch LGTM.
However Chandler was not convinced in the first place, and asked "Do you have a test case that shows a severe problem without asserts?".
You are mentioning 5% perf improvement in single-threaded release build, can you tell how to reproduce?

lib/IR/PassRegistry.cpp
46	"Pedantic" comment: since your motivation is purely performance here, you may want to replace all uses of `locked` with `locked.load(memory_order_consume)`. Maybe in a separate private method like this: bool isLocked() { return locked.load(memory_order_consume); } Especially since I don't think there is a guarantee that atomic<bool> is lock_free.

You are mentioning 5% perf improvement in single-threaded release build, can you tell how to reproduce?

It is reproducible in the following scenario:

4 threads on a 3.2 GHz Intel Core i5

A shared working queue contains ~100 independent llvm modules (each llvm module has its own LLVMContext).
Each thread runs the following loop:

while queue is not empty {
    fetch an llvm module from the working queue
    create llvm passes
    call PassRegistry::lock()
    run the llvm passes
}
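
The structure of that loop can be sketched in standard C++ as below. Everything here is a placeholder: WorkItem and drainQueue are hypothetical names, and the body of the worker only marks where creating the passes, calling PassRegistry::lock() and running the passes would go. Depending on the toolchain, linking may require a flag such as -pthread.

```cpp
#include <atomic>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical work item standing in for an llvm module plus its LLVMContext.
struct WorkItem {
  int Id;
};

// Drains Queue with NumThreads workers; returns how many items were processed.
int drainQueue(std::queue<WorkItem> Queue, unsigned NumThreads) {
  std::mutex QueueMutex;
  std::atomic<int> Processed{0};

  auto Worker = [&] {
    for (;;) {
      WorkItem Item;
      {
        // Fetch the next item under the queue mutex.
        std::lock_guard<std::mutex> Guard(QueueMutex);
        if (Queue.empty())
          return;
        Item = Queue.front();
        Queue.pop();
      }
      // Placeholder for: create passes, call PassRegistry::lock(), run passes.
      (void)Item;
      ++Processed;
    }
  };

  std::vector<std::thread> Threads;
  for (unsigned I = 0; I < NumThreads; ++I)
    Threads.emplace_back(Worker);
  for (std::thread &T : Threads)
    T.join();
  return Processed;
}
```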

Committed in r231276.

Thanks all for reviewing!

lib/IR/PassRegistry.cpp
46	Thanks for this suggestion. But I'm not sure if memory_order_consume is OK here. I think it must be memory_order_acquire. I prefer to be conservative in this case and leave it as it is now.
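
For reference, the accessor suggested in this inline thread could be written with the more conservative acquire ordering roughly as follows. This is only a sketch of the alternative under discussion; per the reply above, the patch kept the plain (sequentially consistent) accesses. LockFlag and isLockFree are hypothetical names.

```cpp
#include <atomic>

// Hypothetical wrapper around the "locked" flag from the patch.
class LockFlag {
  std::atomic<bool> Locked{false};

public:
  // Release store: writes made before lock() become visible to any
  // thread that later observes the flag as true with an acquire load.
  void lock() { Locked.store(true, std::memory_order_release); }

  // Acquire load, the ordering the author considered the safe minimum.
  bool isLocked() const { return Locked.load(std::memory_order_acquire); }

  // Reports whether this atomic<bool> is implemented without a lock
  // on the current platform.
  bool isLockFree() const { return Locked.is_lock_free(); }
};
```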

Revision Contents

Path	Size
include/llvm/PassRegistry.h	9 lines
lib/IR/PassRegistry.cpp	31 lines

Diff 20744

include/llvm/PassRegistry.h

Context not available.
	class PassRegistry {	class PassRegistry {
	mutable sys::SmartRWMutex<true> Lock;	mutable sys::SmartRWMutex<true> Lock;

		/// Only if false, synchronization must use the Lock mutex.
		std::atomic<bool> locked;

	/// PassInfoMap - Keep track of the PassInfo object for each registered pass.	/// PassInfoMap - Keep track of the PassInfo object for each registered pass.
	typedef DenseMap<const void *, const PassInfo *> MapType;	typedef DenseMap<const void *, const PassInfo *> MapType;
	MapType PassInfoMap;	MapType PassInfoMap;
Context not available.
	std::vector<PassRegistrationListener *> Listeners;	std::vector<PassRegistrationListener *> Listeners;

	public:	public:
	PassRegistry() {}	PassRegistry() : locked(false) {}
	~PassRegistry();	~PassRegistry();

	/// getPassRegistry - Access the global registry object, which is	/// getPassRegistry - Access the global registry object, which is
Context not available.
	/// llvm_shutdown.	/// llvm_shutdown.
	static PassRegistry *getPassRegistry();	static PassRegistry *getPassRegistry();

		/// Enables fast thread synchronization in getPassInfo().
		/// After calling lock() no more passes may be registered.
		void lock() { locked = true; }

	/// getPassInfo - Look up a pass' corresponding PassInfo, indexed by the pass'	/// getPassInfo - Look up a pass' corresponding PassInfo, indexed by the pass'
	/// type identifier (&MyPass::ID).	/// type identifier (&MyPass::ID).
	const PassInfo *getPassInfo(const void *TI) const;	const PassInfo *getPassInfo(const void *TI) const;
Context not available.

lib/IR/PassRegistry.cpp

Context not available.
	PassRegistry::~PassRegistry() {}	PassRegistry::~PassRegistry() {}

	const PassInfo *PassRegistry::getPassInfo(const void *TI) const {	const PassInfo *PassRegistry::getPassInfo(const void *TI) const {
	sys::SmartScopedReader<true> Guard(Lock);	// We don't need thread synchronization after the PassRegistry is locked
		// (that means: is read-only).
		bool needMutex = !locked;
		if (needMutex)
		Lock.lock_shared();

	MapType::const_iterator I = PassInfoMap.find(TI);	MapType::const_iterator I = PassInfoMap.find(TI);
	return I != PassInfoMap.end() ? I->second : nullptr;	const PassInfo *PI = I != PassInfoMap.end() ? I->second : nullptr;

		if (needMutex)
		Lock.unlock_shared();
		return PI;
	}	}

	const PassInfo *PassRegistry::getPassInfo(StringRef Arg) const {	const PassInfo *PassRegistry::getPassInfo(StringRef Arg) const {
	sys::SmartScopedReader<true> Guard(Lock);	// We don't need thread synchronization after the PassRegistry is locked
		// (that means: is read-only).
		bool needMutex = !locked;
		if (needMutex)
		Lock.lock_shared();

	StringMapType::const_iterator I = PassInfoStringMap.find(Arg);	StringMapType::const_iterator I = PassInfoStringMap.find(Arg);
	return I != PassInfoStringMap.end() ? I->second : nullptr;	const PassInfo *PI = I != PassInfoStringMap.end() ? I->second : nullptr;

		if (needMutex)
		Lock.unlock_shared();
		return PI;
	}	}

	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
Context not available.
	//	//

	void PassRegistry::registerPass(const PassInfo &PI, bool ShouldFree) {	void PassRegistry::registerPass(const PassInfo &PI, bool ShouldFree) {

		assert(!locked && "Trying to register a pass in a locked PassRegistry");

	sys::SmartScopedWriter<true> Guard(Lock);	sys::SmartScopedWriter<true> Guard(Lock);
	bool Inserted =	bool Inserted =
	PassInfoMap.insert(std::make_pair(PI.getTypeInfo(), &PI)).second;	PassInfoMap.insert(std::make_pair(PI.getTypeInfo(), &PI)).second;
Context not available.

	if (ShouldFree)	if (ShouldFree)
	ToFree.push_back(std::unique_ptr<const PassInfo>(&PI));	ToFree.push_back(std::unique_ptr<const PassInfo>(&PI));

		assert(!locked && "PassRegistry locked during registering a pass");
	}	}

	void PassRegistry::enumerateWith(PassRegistrationListener *L) {	void PassRegistry::enumerateWith(PassRegistrationListener *L) {
Context not available.