This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Added memory barrier to solve data race
ClosedPublic

Authored by hkao13 on Mar 25 2020, 9:26 AM.

Details

Summary

Data race occurs when acquiring lock for critical section
triggering assertion failure. Added barrier to ensure
all memory is commited before checking assertion.

Diff Detail

Event Timeline

hkao13 created this revision.Mar 25 2020, 9:26 AM
Herald added a project: Restricted Project. · View Herald Transcript
hkao13 updated this revision to Diff 252601.Mar 25 2020, 9:33 AM

Added memory barrier to solve potential data race when acquiring lock for critical section.

Potential data race occurs when acquiring lock for a critical section which triggers the assertion failure (KMP_DEBUG_ASSERT(this_thr->th.th_next_waiting == 0)). This could be due to the weaker memory consistency model in ARM. Putting debug traces in the data race prone region (after thread is done spinning and acquiring the lock) makes the issue not reproducible. I add a memory barrier to ensure all threads see consistent view of shared memory (the gtid for the next thread waiting) before the failing assertion.

Hahnfeld accepted this revision.Mar 25 2020, 10:59 AM

Yes, this looks related to memory consistency. As far as I understand the threads synchronize on th_spin_here, so this is guaranteed to be updated. Any other write before this in __kmp_release_queuing_lock is not guaranteed to be synchronized by a weak memory model. This includes th_next_waiting (which triggers the assertion), but also writes by the user application. That's particularly bad because this should be taken care of by the runtime!

So long story short: LGTM (provided we can move the KMP_MB a bit as mentioned inline), and this might actually fix also fix real user code.

openmp/runtime/src/kmp_lock.cpp
1241

This thread waits for th_spin_here = FALSE (pointed to by spin_here_p).

1243–1252

Please move the KMP_MB before the debug output / right after KMP_WAIT, so __kmp_dump_queuing_lock isn't called for the wrong reasons. Also I would reword the comment, the barrier should also take care of writes from the user code.

1476–1483

Here's the release code: First set th_next_waiting = 0, then KMP_MB to sync this write to memory and finally th_spin_here = FALSE to release the locked thread.

This revision is now accepted and ready to land.Mar 25 2020, 10:59 AM
hkao13 updated this revision to Diff 252637.Mar 25 2020, 11:29 AM

Moving KMP_MB above debug output.

@Hahnfeld Thanks for the review. I made your suggested changes in the latest update.

Yes, this looks related to memory consistency. As far as I understand the threads synchronize on th_spin_here, so this is guaranteed to be updated. Any other write before this in __kmp_release_queuing_lock is not guaranteed to be synchronized by a weak memory model. This includes th_next_waiting (which triggers the assertion), but also writes by the user application. That's particularly bad because this should be taken care of by the runtime!

Why doesn't the KMP_MB after the store to th_next_waiting guarantee that the unblocked thread sees that store?

bryanpkc added a comment.EditedMar 25 2020, 1:58 PM

Why doesn't the KMP_MB after the store to th_next_waiting guarantee that the unblocked thread sees that store?

Answering my own question, the ARM architecture reference manual states that the DMB instruction ensures that all affected memory accesses by the PE executing the DMB that appear in program order before the DMB and those which originate from a different PE, ...which have been Observed-by the PE before the DMB is executed, are Observed-by each PE, ...before any affected memory accesses that appear in program order after the DMB are Observed-by that PE.

I think this means, in this case, that the store of FALSE to th_spin_here is observed in the correct order only by the releasing thread that issued the DMB. Other threads (e.g. the spinning thread) could still see the updates to th_spin_here and th_next_waiting out of order, unless they also issue DMB, which is the fix in this patch.

Why doesn't the KMP_MB after the store to th_next_waiting guarantee that the unblocked thread sees that store?

Answering my own question, the ARM architecture reference manual states that the DMB instruction ensures that all affected memory accesses by the PE executing the DMB that appear in program order before the DMB and those which originate from a different PE, ...which have been Observed-by the PE before the DMB is executed, are Observed-by each PE, ...before any affected memory accesses that appear in program order after the DMB are Observed-by that PE.

I think this means, in this case, that the store of FALSE to th_spin_here is observed in the correct order only by the releasing thread that issued the DMB. Other threads (e.g. the spinning thread) could still see the updates to th_spin_here and th_next_waiting out of order, unless they also issue DMB, which is the fix in this patch.

Yes, without reference to a specific architecture I'd formulate it as follows:

  1. releasing thread
th_next_waiting = 0;
KMP_MB();
th_spin_here = FALSE;

ensures that the write to th_next_waiting is committed to memory before th_spin_here is set.

  1. spinning thread
KMP_WAIT(spin_here_p, FALSE, KMP_EQ, lck);
KMP_MB();
th_next_waiting != 0

ensures that the read of th_next_waiting comes from memory after spinning on th_spin_here.

Taken together this is the idiom to synchronize a value from one thread to another. (Well I think a full memory barrier is actually more than would be needed...)

@Hahnfeld Thanks for the explanation!

This revision was automatically updated to reflect the committed changes.