This is an archive of the discontinued LLVM Phabricator instance.

Add a lock() function in PassRegistry to speed up multi-thread synchronization.
Needs ReviewPublic

Authored by eeckstein on Feb 20 2015, 5:24 AM.

Details

Reviewers
resistor
Summary

With this patch I'm trying to solve the following problem:

I'm running llvm in multiple threads (each instance uses its private LLVMContext).
With a release-asserts build of llvm I got terrible multi-thread performance. I found that the problem is the mutex in PassRegistry::getPassInfo(), mainly called from PMDataManager::verifyPreservedAnalysis().

My first idea was to make an option to disable verifyPreservedAnalysis(), even in an asserts-build.
But I didn't like it because I don't want to give up this verification and the mutex also cause overhead outside verifyPreservedAnalysis().

So I did the following:
I added a lock() function in PassRegistry which I call when I'm sure that all passes are registered.
After the PassRegistry is locked it can safely access its maps without using a mutex.

This completely solves the multi-thread performance problem.
The use of the lock() function is optional. If not used, nothing changes.

Diff Detail

Event Timeline

eeckstein updated this revision to Diff 20392.Feb 20 2015, 5:24 AM
eeckstein retitled this revision from to Add a lock() function in PassRegistry to speed up multi-thread synchronization..
eeckstein updated this object.
eeckstein edited the test plan for this revision. (Show Details)
eeckstein added a subscriber: Unknown Object (MLST).
eeckstein updated this revision to Diff 20744.Feb 26 2015, 4:34 AM
eeckstein added a reviewer: resistor.
resistor edited edge metadata.Feb 26 2015, 7:55 PM

Hi Erik,

eeckstein added a comment.EditedFeb 27 2015, 7:21 AM

Hi Owen, Chandler,

Do you mean with a release *+* asserts build?

Yes, sorry I wrote it wrong.

I think my initial description was not very good. I'll try to explailn again: there are two issues

  1. multithread-performance problem in the assert build: I understand that assert builds are slower than release builds, but in this case it's a factor of 4 when running with 4 threads, i.e. the multithreaded version is slower than the single-threaded version! This makes the asserts build useless for my purpose.
  1. Even in the release build there is a performance penalty of about 5% because of the mutex.

With my proposed patch both problems would be solved.

mehdi_amini added inline comments.
lib/IR/PassRegistry.cpp
52

It is unfortunate to lose RAII, is there a good reason why SmartScopedReader cannot take an extra parameter to be constructed without locking? (like std::unique_lock())

eeckstein updated this revision to Diff 20857.Feb 27 2015, 9:26 AM
eeckstein edited edge metadata.

New version using llvm::Optional

The patch LGTM.
However Chandler was not convinced in the first place, and asked "Do you have a test case that shows a severe problem without asserts?".
You are mentioning 5% perf improvement in single-threaded release build, can you tell how to reproduce?

lib/IR/PassRegistry.cpp
46

"Pedantic" comment: since your motivation is purely performance here, you may want to replace all uses of `locked`` with `locked.load(memory_order_consume)``. Maybe in a separate private method like that:

bool isLocked() { return locked.load(memory_order_consume); }

Especially since I don't think there is a guarantee that atomic<bool> is lock_free.

You are mentioning 5% perf improvement in single-threaded release build, can you tell how to reproduce?

It is reproducible in following scenario:

4 threads on a 3,2 GHz Intel Core i5

A shared working queue contains ~100 independent llvm modules (each llvm module has its own LLVMContext).
Each thread runs following loop:

while queue is not empty {

fetch an llvm module from the working queue
create llvm passes
call PassRegistry::lock()
run the llvm passes

}

Committed in r231276.

Thanks all for reviewing!

lib/IR/PassRegistry.cpp
46

Thanks for this suggestion. But I'm not sure if memory_order_consume is OK here. I think it must be memory_order_acquire.
I prefer to be conservative in this case and let it as it is now.