Threads may work on different loops concurrently, so the iteration-stealing algorithm must check that the thief and the victim are working on the same loop.
The changes are:
- th_steal_lock moved from thread-private to thread-loop-private buffer;
- fixed dispatch_init to check the schedule after dispatch_init_algorithm has possibly corrected it;
- fixed stealing process and stealing finalization to work with thread-loop-private buffers of the same index, instead of th_dispatch_pr_current which might point to different loops in different threads.
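
For illustration only, the idea behind these changes can be sketched as follows. This is a minimal, simplified model and not the runtime's actual code: the names LoopBuffer, Thread, init_loop, try_steal, kNumBuffers and the loop_id field are hypothetical, and the "steal half of the remaining iterations" policy is an assumption made for the example. It shows a steal lock that lives in each per-thread, per-loop buffer, and a thief that addresses the victim's buffer by the same index and verifies that it belongs to the same loop before taking iterations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified stand-ins for the runtime's structures.
constexpr std::size_t kNumBuffers = 4; // per-thread ring of loop buffers

struct LoopBuffer {
  std::mutex steal_lock;     // lock is per thread AND per loop buffer
  std::uint32_t loop_id = 0; // loop that owns this buffer; 0 means "none"
  std::int64_t next = 0;     // first unclaimed iteration
  std::int64_t end = 0;      // one past the last owned iteration
};

struct Thread {
  std::array<LoopBuffer, kNumBuffers> buffers;
};

// Assign iterations [begin, end) of loop `loop_id` (ids start at 1) to `t`.
void init_loop(Thread &t, std::uint32_t loop_id, std::int64_t begin,
               std::int64_t end) {
  LoopBuffer &b = t.buffers[loop_id % kNumBuffers];
  std::lock_guard<std::mutex> guard(b.steal_lock);
  b.loop_id = loop_id;
  b.next = begin;
  b.end = end;
}

// Try to move the upper half of the victim's remaining iterations of loop
// `loop_id` into the thief's buffer of the same index. Assumes the thief has
// already exhausted its own iterations. Returns the number of iterations
// stolen, or 0 if the victim is not working on that loop or has too few left.
std::int64_t try_steal(Thread &thief, Thread &victim, std::uint32_t loop_id) {
  assert(&thief != &victim);
  const std::size_t idx = loop_id % kNumBuffers;
  LoopBuffer &vb = victim.buffers[idx]; // same index, not a "current" pointer
  LoopBuffer &tb = thief.buffers[idx];
  std::int64_t lo = 0, hi = 0;
  {
    std::lock_guard<std::mutex> guard(vb.steal_lock);
    if (vb.loop_id != loop_id)
      return 0; // buffer belongs to a different loop: do not steal
    const std::int64_t remaining = vb.end - vb.next;
    if (remaining < 2)
      return 0; // nothing worth stealing
    hi = vb.end;
    lo = hi - remaining / 2;
    vb.end = lo; // victim keeps the lower part
  }
  // The victim's lock is released before the thief's is taken, so the two
  // locks are never held at once.
  std::lock_guard<std::mutex> guard(tb.steal_lock);
  tb.loop_id = loop_id;
  tb.next = lo;
  tb.end = hi;
  return hi - lo;
}
```

In this sketch, a thief on loop 1 cannot take work from a victim whose buffer at that index is unused or belongs to another loop, because the loop_id comparison is made under the victim's per-buffer lock.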