This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer `lock or` over mfence.
AcceptedPublic

Authored by vchuravy on Jul 16 2022, 5:20 PM.

Details

Summary

LLVM currently emits mfence for __atomic_thread_fence(seq_cst). On
modern CPUs lock or is more efficient and provides the same sequential
consistency. GCC 11 made this switch as well (see https://gcc.gnu.org/pipermail/gcc-cvs/2020-July/314418.html)
and https://reviews.llvm.org/D61863 and https://reviews.llvm.org/D58632
moved into this direction as well, but didn't touch fence seq_cst.

Diff Detail

Event Timeline

vchuravy created this revision.Jul 16 2022, 5:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2022, 5:20 PM
vchuravy requested review of this revision.Jul 16 2022, 5:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2022, 5:20 PM
vchuravy added a project: Restricted Project.Jul 16 2022, 5:20 PM
loladiro accepted this revision.Jan 4 2023, 3:44 PM
loladiro added a reviewer: craig.topper.
loladiro added subscribers: craig.topper, loladiro.

I just looked at this again also, since this was requested in https://github.com/JuliaLang/julia/pull/48123 and came to the same conclusion, so LGTM, but it would be good to know from @craig.topper or @reames if there was any reason this wasn't done in D61863.

This revision is now accepted and ready to land.Jan 4 2023, 3:44 PM

We talked at LLVMdev and the reason why it wasn't done where non-temporal memory ops. The LLVM langref and the C standard says nothing about them, but currently this is the only way to obtain a fence operation that affects them.

They asked me to do a bit of canvasing to find out if folks rely on this and/or wait to see how the GCC change shake out.

Hmm, both semantics seems reasonable, and I don't think we can just make that decision for the frontend here. Perhaps at the IR level, we need a different syncscope property that declares whether it's expected to synchronize with non-temporal operations or not and then in clang we set it properly to match GCC (potentially with a matching __builtin_nontemporal_fence()?). I can see that frontends that make use of !nontemporal would expect fence to synchronize it.