This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Added memory barrier to solve data race
ClosedPublic

Authored by hkao13 on Mar 25 2020, 9:26 AM.

Details

Summary

A data race occurs when acquiring the lock for a critical
section, triggering an assertion failure. Added a memory
barrier to ensure all stores are committed to memory before
the assertion is checked.

Diff Detail

Event Timeline

hkao13 created this revision.Mar 25 2020, 9:26 AM
Herald added a project: Restricted Project.
hkao13 updated this revision to Diff 252601.Mar 25 2020, 9:33 AM

Added memory barrier to solve potential data race when acquiring lock for critical section.

The potential data race occurs when acquiring the lock for a critical section, which triggers the assertion failure (KMP_DEBUG_ASSERT(this_thr->th.th_next_waiting == 0)). This could be due to the weaker memory consistency model on ARM. Putting debug traces in the race-prone region (after the thread is done spinning and has acquired the lock) makes the issue unreproducible. I added a memory barrier to ensure all threads see a consistent view of shared memory (the gtid of the next waiting thread) before the failing assertion.

Hahnfeld accepted this revision.Mar 25 2020, 10:59 AM

Yes, this looks related to memory consistency. As far as I understand the threads synchronize on th_spin_here, so this is guaranteed to be updated. Any other write before this in __kmp_release_queuing_lock is not guaranteed to be synchronized by a weak memory model. This includes th_next_waiting (which triggers the assertion), but also writes by the user application. That's particularly bad because this should be taken care of by the runtime!

So long story short: LGTM (provided we can move the KMP_MB a bit as mentioned inline), and this might actually also fix real user code.

openmp/runtime/src/kmp_lock.cpp
1241–1244

This thread waits for th_spin_here = FALSE (pointed to by spin_here_p).

1246–1252

Please move the KMP_MB before the debug output / right after KMP_WAIT, so __kmp_dump_queuing_lock isn't called for the wrong reasons. Also, I would reword the comment: the barrier should also take care of writes from user code.

1476–1483

Here's the release code: first set th_next_waiting = 0, then KMP_MB to sync this write to memory, and finally th_spin_here = FALSE to release the locked thread.

This revision is now accepted and ready to land.Mar 25 2020, 10:59 AM
hkao13 updated this revision to Diff 252637.Mar 25 2020, 11:29 AM

Moving KMP_MB above debug output.

@Hahnfeld Thanks for the review. I made your suggested changes in the latest update.

Why doesn't the KMP_MB after the store to th_next_waiting guarantee that the unblocked thread sees that store?

bryanpkc added a comment.EditedMar 25 2020, 1:58 PM

Why doesn't the KMP_MB after the store to th_next_waiting guarantee that the unblocked thread sees that store?

Answering my own question, the ARM architecture reference manual states that the DMB instruction ensures that all affected memory accesses by the PE executing the DMB that appear in program order before the DMB and those which originate from a different PE, ...which have been Observed-by the PE before the DMB is executed, are Observed-by each PE, ...before any affected memory accesses that appear in program order after the DMB are Observed-by that PE.

I think this means, in this case, that the store of FALSE to th_spin_here is observed in the correct order only by the releasing thread that issued the DMB. Other threads (e.g. the spinning thread) could still see the updates to th_spin_here and th_next_waiting out of order, unless they also issue DMB, which is the fix in this patch.

Yes, without reference to a specific architecture I'd formulate it as follows:

  1. releasing thread
th_next_waiting = 0;
KMP_MB();
th_spin_here = FALSE;

ensures that the write to th_next_waiting is committed to memory before th_spin_here is set.

  2. spinning thread
KMP_WAIT(spin_here_p, FALSE, KMP_EQ, lck);
KMP_MB();
th_next_waiting != 0

ensures that the read of th_next_waiting comes from memory after spinning on th_spin_here.

Taken together, this is the idiom for synchronizing a value from one thread to another. (Well, I think a full memory barrier is actually more than would be needed...)

@Hahnfeld Thanks for the explanation!

This revision was automatically updated to reflect the committed changes.