This is an archive of the discontinued LLVM Phabricator instance.

Add missing memory barrier for queuing locks
ClosedPublic

Authored by Hahnfeld on Dec 1 2017, 6:35 AM.

Details

Summary

Otherwise I see hangs in the omp_single_copyprivate test when
compiling in release mode. With the debug assertions, I get a
failure head > 0 && tail > 0.

Diff Detail

Repository
rL LLVM

Event Timeline

Hahnfeld created this revision.Dec 1 2017, 6:35 AM
pawosm01 accepted this revision.Dec 8 2017, 6:55 AM

+1, although I ran this omp_single_copyprivate.c example in a loop 10000 times on two different AArch64 machines (ARMv8 and ARMv8.1) and didn't observe any hangs, before and after applying this patch.

This revision is now accepted and ready to land.Dec 8 2017, 6:55 AM

+1, although I ran this omp_single_copyprivate.c example in a loop 10000 times on two different AArch64 machines (ARMv8 and ARMv8.1) and didn't observe any hangs, before and after applying this patch.

That's the problem with memory consistency... Did you add the num_threads clause when testing without the memory barrier? This made it much more likely for me

This revision was automatically updated to reflect the committed changes.

I tried various number of threads with OMP_NUM_THREADS (in range between 4 and 96 on 96 cores dual socket ARMv8.1 machine, and between 2 and 4 on 4 cores single socket ARMv8 machine) and enforced affinity with OMP_PROC_BIND=spread, so I guess it answers your question.

@Hahnfeld it does, I was looking into this bug here at PGI and I came across your last fix.
Looks like the hang does not happen anymore, thanks!