Data race occurs when acquiring lock for critical section
triggering assertion failure. Added barrier to ensure
all memory is commited before checking assertion.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Added memory barrier to solve potential data race when acquiring lock for critical section.
Potential data race occurs when acquiring lock for a critical section which triggers the assertion failure (KMP_DEBUG_ASSERT(this_thr->th.th_next_waiting == 0)). This could be due to the weaker memory consistency model in ARM. Putting debug traces in the data race prone region (after thread is done spinning and acquiring the lock) makes the issue not reproducible. I add a memory barrier to ensure all threads see consistent view of shared memory (the gtid for the next thread waiting) before the failing assertion.
Yes, this looks related to memory consistency. As far as I understand the threads synchronize on th_spin_here, so this is guaranteed to be updated. Any other write before this in __kmp_release_queuing_lock is not guaranteed to be synchronized by a weak memory model. This includes th_next_waiting (which triggers the assertion), but also writes by the user application. That's particularly bad because this should be taken care of by the runtime!
So long story short: LGTM (provided we can move the KMP_MB a bit as mentioned inline), and this might actually fix also fix real user code.
openmp/runtime/src/kmp_lock.cpp | ||
---|---|---|
1241 | This thread waits for th_spin_here = FALSE (pointed to by spin_here_p). | |
1243–1251 | Please move the KMP_MB before the debug output / right after KMP_WAIT, so __kmp_dump_queuing_lock isn't called for the wrong reasons. Also I would reword the comment, the barrier should also take care of writes from the user code. | |
1476–1483 | Here's the release code: First set th_next_waiting = 0, then KMP_MB to sync this write to memory and finally th_spin_here = FALSE to release the locked thread. |
Why doesn't the KMP_MB after the store to th_next_waiting guarantee that the unblocked thread sees that store?
Answering my own question, the ARM architecture reference manual states that the DMB instruction ensures that all affected memory accesses by the PE executing the DMB that appear in program order before the DMB and those which originate from a different PE, ...which have been Observed-by the PE before the DMB is executed, are Observed-by each PE, ...before any affected memory accesses that appear in program order after the DMB are Observed-by that PE.
I think this means, in this case, that the store of FALSE to th_spin_here is observed in the correct order only by the releasing thread that issued the DMB. Other threads (e.g. the spinning thread) could still see the updates to th_spin_here and th_next_waiting out of order, unless they also issue DMB, which is the fix in this patch.
Yes, without reference to a specific architecture I'd formulate it as follows:
- releasing thread
th_next_waiting = 0; KMP_MB(); th_spin_here = FALSE;
ensures that the write to th_next_waiting is committed to memory before th_spin_here is set.
- spinning thread
KMP_WAIT(spin_here_p, FALSE, KMP_EQ, lck); KMP_MB(); th_next_waiting != 0
ensures that the read of th_next_waiting comes from memory after spinning on th_spin_here.
Taken together this is the idiom to synchronize a value from one thread to another. (Well I think a full memory barrier is actually more than would be needed...)
This thread waits for th_spin_here = FALSE (pointed to by spin_here_p).