This task is listed in TODO.txt. The implementation replaces std::mutex with a spin lock based on std::atomic_flag. The spin lock itself is implemented as a nested class in the protected section of the associated shared state.
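A minimal sketch of the kind of atomic_flag spin lock the summary describes, assuming the usual test_and_set/clear protocol; the class name and layout here are illustrative, not the patch's actual code:

#include <atomic>

// Illustrative spin lock (name hypothetical); it meets BasicLockable, so
// it can stand in for std::mutex under std::unique_lock.
class __spin_lock {
  std::atomic_flag __flag_ = ATOMIC_FLAG_INIT;

public:
  void lock() noexcept {
    // test_and_set returns the previous value; keep spinning until we
    // observe clear, which means we now own the lock.
    while (__flag_.test_and_set(std::memory_order_acquire))
      ;
  }

  void unlock() noexcept { __flag_.clear(std::memory_order_release); }
};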
Event Timeline
Is there a benchmark where this demonstrates some performance improvement? I fear that the switch to condition_variable_any will swamp any performance gains from the switch to a spin lock.
Also, the spin lock is being held across allocating operations (the exception-throwing paths and the at_thread_exit code). That's a long time to be holding a spin lock.
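To make the condition_variable_any point concrete: std::condition_variable only accepts std::unique_lock<std::mutex>, so once the state's mutex becomes a spin lock, waiting has to go through the generic variant. A hedged sketch, reusing the illustrative __spin_lock from the summary:

#include <condition_variable>
#include <mutex>

// Illustrative wait path; __spin_lock is the hypothetical class sketched
// under the summary, and the member names only imitate libc++'s style.
struct __waiting_state {
  __spin_lock __mut_;
  std::condition_variable_any __cv_; // generic variant: waits on any lock
  bool __ready_ = false;

  void __wait() {
    std::unique_lock<__spin_lock> __lk(__mut_);
    __cv_.wait(__lk, [this] { return __ready_; });
  }
};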
Thanks @bcraig, I will devise a benchmark and report back with numbers for x86_64.
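Not the benchmark itself, but the rough shape such a micro-benchmark could take for the uncontended path; the iteration count and methodology here are placeholders:

#include <chrono>
#include <cstdio>
#include <future>

int main() {
  // Time n uncontended promise/future round trips; each iteration takes
  // the associated state's lock several times (set_value, get, destructors).
  const long n = 1000000;
  auto t0 = std::chrono::steady_clock::now();
  for (long i = 0; i < n; ++i) {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    p.set_value(static_cast<int>(i));
    (void)f.get();
  }
  auto t1 = std::chrono::steady_clock::now();
  auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
  std::printf("%lld ns/iter\n", static_cast<long long>(ns.count() / n));
}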
TODO.txt says "future should use <atomic> for synchronization." I would have interpreted this as meaning that the line
unsigned __state_;
should become
std::atomic<unsigned> __state_;
and appropriate loads and stores used so that __is_ready() can be checked without having to take the mutex first. OTOH, I don't actually see how that would help: if it's not ready, you probably want to take a unique_lock so you can wait, and if it *is* ready, you probably want to take a lock so that you can get the value out.
Atomics might allow functions like __set_future_attached() to stop taking the lock as well. But again I'm not sure what the benefit would be; the cost is obviously "risk of subtle bugs and maintenance nightmare."
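A sketch of that interpretation, just to make the trade-off concrete; the names imitate __assoc_sub_state and the flag values are assumed rather than quoted from the header:

#include <atomic>

// Illustrative only, not a proposed change.
class __atomic_state {
  std::atomic<unsigned> __state_{0};
  static const unsigned __future_attached = 2; // assumed flag values
  static const unsigned __ready = 4;

public:
  bool __is_ready() const {
    // Lock-free readiness check via an acquire load...
    return (__state_.load(std::memory_order_acquire) & __ready) != 0;
  }

  void __set_future_attached() {
    // ...and a lock-free read-modify-write for the attach flag.
    __state_.fetch_or(__future_attached, std::memory_order_acq_rel);
  }
};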
The current patch just swaps out std::mutex for a std::mutex-alike class that claims to be faster for uncontended accesses. Definitely safer than my interpretation. :) If this patch actually helps, I would suggest providing the class as a reusable std::__spin_lock in the <mutex> header instead of hiding it inside __assoc_shared_state.
I think the bar for accepting this should be "significantly faster", not just "a little faster". Spinlocks don't behave as well as mutexes in abnormal conditions. Spinlocks are more likely to cause priority inversion. They are more likely to cause throughput issues when there is a lot of contention, as a spinning thread will consume a full time slice before relinquishing a CPU. On Windows, CRITICAL_SECTION and SRWLOCK become electrified during process termination to avoid indefinite hangs. We shouldn't give all of that up for a minor perf gain. We might give it up for a large perf gain though.
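For what it's worth, spin implementations usually soften the contention behavior with a backoff, e.g. yielding after a bounded number of failed attempts. A generic sketch of that mitigation, not something this patch does:

#include <atomic>
#include <thread>

// Generic spin-with-backoff sketch: spin a bounded number of times, then
// start yielding the time slice instead of burning through it.
inline void __lock_with_backoff(std::atomic_flag& __flag) {
  int __spins = 0;
  while (__flag.test_and_set(std::memory_order_acquire)) {
    if (++__spins > 64)
      std::this_thread::yield();
  }
}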
I agree with the general consensus that we should only make this change if it's significantly faster, and only after we have a test that demonstrates this.
Unfortunately I don't recall exactly why I wrote that TODO in the first place, but I'm sure I meant changing __state_, and not the lock. I suspect it had to do with http://llvm.org/PR24692 .
include/future, line 538:
It seems this change is ABI breaking, since mutex and __spin_lock don't have the same layout. The change will need to be guarded behind a _LIBCPP_ABI_FOO macro. See __config for examples.
I talked to Marshall at CppCon and he mentioned that the remark about using atomics was related to std::call_once. I will rework this patch and use the defines from __config to mark it as an ABI-breaking change.
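The guard would presumably take roughly this shape; _LIBCPP_ABI_FUTURE_SPIN_LOCK is a made-up name standing in for whatever _LIBCPP_ABI_FOO macro ends up in __config:

// Hypothetical guard; the macro name is invented for illustration, the
// real ABI knobs live in libc++'s __config.
#ifdef _LIBCPP_ABI_FUTURE_SPIN_LOCK
typedef __spin_lock __future_lock_t; // new layout, ABI-breaking
#else
typedef mutex __future_lock_t;       // default: preserve the existing layout
#endif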