LLVM currently emits mfence for __atomic_thread_fence(seq_cst). On
modern CPUs a locked or (lock or) is more efficient and provides the same
sequential consistency. GCC 11 made this switch as well (see https://gcc.gnu.org/pipermail/gcc-cvs/2020-July/314418.html),
and https://reviews.llvm.org/D61863 and https://reviews.llvm.org/D58632
moved in this direction too, but didn't touch fence seq_cst.
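To make the claim concrete, here is a minimal x86-64 sketch (GCC/Clang inline asm; all helper names are hypothetical) that runs the classic store-buffering litmus test with three fences: a hand-written mfence, a hand-written lock or, and __atomic_thread_fence(__ATOMIC_SEQ_CST) as lowered by whichever compiler builds it. The lock orq $0, (%rsp) operand is an assumption about the general shape of such a sequence; the exact instruction a compiler emits depends on version and tuning. A seq_cst fence must rule out r1 == r2 == 0. A run with zero violations is consistent with both sequences acting as full barriers for ordinary memory, but it is not a proof, since thread start-up leaves only a small window for the reordering.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helpers, x86-64 + GCC/Clang only. */
typedef void (*fence_fn)(void);

/* The sequence LLVM currently emits for a seq_cst fence. */
void fence_mfence(void) {
    __asm__ __volatile__("mfence" ::: "memory");
}

/* A locked read-modify-write used as a full barrier. OR-ing 0 leaves
   the stack slot unchanged; only the ordering effect of lock matters. */
void fence_lock_or(void) {
    __asm__ __volatile__("lock orq $0, (%%rsp)" ::: "memory", "cc");
}

/* Whatever the compiler in use lowers the builtin to. */
void fence_builtin(void) {
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static atomic_int x, y;
static int r1, r2;
static fence_fn current_fence;

static void *thread_a(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    current_fence();                      /* store x must not pass load y */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    current_fence();                      /* store y must not pass load x */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the store-buffering test `iterations` times and counts the
   outcomes r1 == 0 && r2 == 0, which a full fence must forbid. */
int count_sb_violations(fence_fn fence, int iterations) {
    int violations = 0;
    current_fence = fence;               /* published to threads by pthread_create */
    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);           /* join makes r1/r2 visible here */
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            violations++;
    }
    return violations;
}
```

Compile with -pthread on older glibc versions. Comparing the disassembly of fence_builtin under clang and GCC 11+ shows which lowering each compiler picked.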
Reviewers: reames, craig.topper, loladiro
Repository: rG LLVM Github Monorepo
Event Timeline
I also looked at this again, since it was requested in https://github.com/JuliaLang/julia/pull/48123, and came to the same conclusion, so LGTM. It would still be good to hear from @craig.topper or @reames whether there was any reason this wasn't done in D61863.
We talked at LLVMdev, and the reason it wasn't done was non-temporal memory operations. The LLVM LangRef and the C standard say nothing about them, but currently a seq_cst fence (lowered to mfence) is the only way to obtain a fence operation that affects them.
They asked me to do a bit of canvassing to find out whether folks rely on this, and/or to wait and see how the GCC change shakes out.
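The non-temporal concern can be illustrated with a hypothetical producer that fills a buffer with streaming stores (_mm_stream_si32, SSE2) and then publishes a flag. This is a minimal x86-only sketch, assuming one producer and one consumer; the function names are illustrative. Code that writes __atomic_thread_fence(__ATOMIC_SEQ_CST) before the flag store is relying on the mfence lowering to order the streaming stores. An explicit _mm_sfence() states that ordering directly and does not depend on how the seq_cst fence is lowered.

```c
#include <emmintrin.h>   /* _mm_stream_si32 (SSE2); also provides _mm_sfence */
#include <stdatomic.h>

/* Producer side of a hypothetical hand-off (x86 only). */
void publish_streamed(int *buf, int n, atomic_int *ready) {
    for (int i = 0; i < n; i++)
        _mm_stream_si32(&buf[i], i * i);   /* weakly ordered, bypasses the cache */

    /* Code relying on today's lowering puts
       __atomic_thread_fence(__ATOMIC_SEQ_CST) here and gets mfence.
       sfence orders the streaming stores before the flag store
       regardless of how a seq_cst fence is lowered. */
    _mm_sfence();

    atomic_store_explicit(ready, 1, memory_order_release);
}

/* Consumer side: once the flag is observed, the buffer is readable. */
int consume_sum(const int *buf, int n, atomic_int *ready) {
    while (!atomic_load_explicit(ready, memory_order_acquire))
        ;  /* spin until the producer publishes */
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

In a real hand-off, publish_streamed and consume_sum would run on different threads sharing buf and ready.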
Hmm, both semantics seem reasonable, and I don't think we can just make that decision for the frontend here. Perhaps at the IR level we need a different syncscope property that declares whether the fence is expected to synchronize with non-temporal operations; clang could then set it to match GCC (potentially with a matching __builtin_nontemporal_fence()?). I can see that frontends that make use of !nontemporal would expect a fence to synchronize with those operations.