This is an archive of the discontinued LLVM Phabricator instance.

[DRAFT] Implementing new atomic orderings in LLVM and generate barriers for legacy __sync builtins. Support corresponding memory model in outline atomics as well.
Needs ReviewPublic

Authored by ilinpv on Jul 14 2022, 1:39 PM.

Details

Summary

The patch represents current effort to introduce new "sync_acq", "sync_rel", "sync_seq_cst" atomic orderings in LLVM. They used in sync builtins to generate additional barriers.
It also includes new outline atomics used for these orderings.
The work is in progress. I decided to share it with community early on to get feedback and opinions if it is the right way to implement it. At this moment it is implemented for AArch64 ( aarch64-sync-builtins.ll tests) and I give it a try for RISCV ( but it is not working yet, sync-builtins.ll tests show wrong codegen ). It can affect all targets.
It would be good to know if there are need on other targets to make __sync builtins stonger ( https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html ) like gcc does. I would appriciate any comments and knowledge sharing on this topic. Thank you in advance!

Motivation and requests for this work:
https://reviews.llvm.org/D91157
https://github.com/llvm/llvm-project/issues/29472

Diff Detail

Event Timeline

ilinpv created this revision.Jul 14 2022, 1:39 PM
Herald added a project: Restricted Project. · View Herald Transcript
ilinpv requested review of this revision.Jul 14 2022, 1:39 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptJul 14 2022, 1:39 PM
Herald added subscribers: llvm-commits, Restricted Project, cfe-commits and 2 others. · View Herald Transcript
ilinpv edited the summary of this revision. (Show Details)Jul 14 2022, 1:42 PM
ilinpv retitled this revision from [DRAFT] Implementing new atomic orderings in LLVM and generate barriers for legacy __sync builtins . Support corresponding memory model in outline atomics as well. to [DRAFT] Implementing new atomic orderings in LLVM and generate barriers for legacy __sync builtins. Support corresponding memory model in outline atomics as well..Jul 14 2022, 1:46 PM
ilinpv edited the summary of this revision. (Show Details)

I can't obviously see a description of what the additional barriers implied by the sync variants is (which should be in an update to LangRef at the very least, if not also in the summary itself). Inferring it from the AArch64 assembly is also difficult, and the RISC-V lowering being identical to the non-sync forms confuses me further. I don't particularly want to trawl through the web of mailing list posts to try and find out which emails have the right information and which aren't relevant, either.

llvm/include/llvm-c/Core.h
360

These comments are not particularly insightful

llvm/include/llvm/IR/Instructions.h
660

You mean SyncSequentiallyConsistent?

670

Are these correct? Without a clear description of what SyncFoo add over and above Foo it's hard to know.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 is the relevant discussion thread that added these to GCC. I personally never found it very convincing, even back in 2015 when they first made the change.

And, now, 7 years later, I'd be even more reluctant to add this to LLVM's core atomic memory model. By now, the __sync_* routines have been obsolete for over a decade. And they never really had a proper memory model description in the first place -- I don't see how they even make sense to use at all on a non-TSO (which is to say: non-X86) memory architecture, since they don't include proper atomic load/store operations.

atomicrmw add [...] sync_seq_cst is supposed to be equivalent to fence seq_cst; atomicrmw add seq_cst; fence seq_cst, I think.

The extra fences aren't necessary for most uses of __sync_*, but the difference is theoretically visible for exotic synchronization primitives. At least the gcc developers thought the difference might be visible. They therefore decided to add the extra barriers to __sync_* on AArch64 (and maybe other targets?).

As far as I know, the issue is entirely theoretical; nobody has demonstrated any issue on actual hardware, even with a synthetic testcase.

If we do decide we need to do something here, I'd prefer to model this using explicit fences in the IR, as opposed to a new atomic ordering. Adding extra atomic orderings just makes everything harder to reason about.

Wilco1 added a subscriber: Wilco1.EditedJul 15 2022, 6:46 AM

The general requirement is that inline and outline atomics have identical behaviour, and that GCC and LLVM emit the same sequences. I agree sync is badly documented, so it's hard to figure whether an extra DMB barrier could actually make a difference, but it's best to be conservative with atomics. Also it was trivial to add (GCC just adds an extra flag for sync which then emits the extra barrier if the flag is set, so you don't need to introduce new atomic models).

However if sync primitives are hardly used in the real world then perhaps it is about time to deprecate them with annoying warnings, and completely remove support next year. Does that sound reasonable?

asb added a comment.Jul 17 2022, 7:35 AM

If we do decide we need to do something here, I'd prefer to model this using explicit fences in the IR, as opposed to a new atomic ordering. Adding extra atomic orderings just makes everything harder to reason about.

+1. I think the bar for adding new memory orderings should be incredibly high. Memory models are complicated enough without introducing esoteric orderings that aren't clearly specified.

lkail added a subscriber: lkail.Jul 17 2022, 7:46 AM

The general requirement is that inline and outline atomics have identical behaviour, and that GCC and LLVM emit the same sequences. I agree sync is badly documented, so it's hard to figure whether an extra DMB barrier could actually make a difference, but it's best to be conservative with atomics. Also it was trivial to add (GCC just adds an extra flag for sync which then emits the extra barrier if the flag is set, so you don't need to introduce new atomic models).

But it _is_ a new atomic model. If we add this, then for every architecture LLVM supports, now or in the future, we'll need to figure out whether more barriers are needed in the "SYNC" mode vs C++11 standard memory orders. That's a non-trivial thing to ask.

However if sync primitives are hardly used in the real world then perhaps it is about time to deprecate them with annoying warnings, and completely remove support next year. Does that sound reasonable?

I don't know that they are "hardly used" in the real world -- there is certainly legacy code using them, although I haven't attempted to quantify the amount. Another interesting question would be: how much of that code is actually correct, at all. I'd be rather surprised if someone could identify a single piece of software which is actually correct when using the __sync_* operations on GCC, but is incorrect on Clang.

@sebpop could you ellaborate on __sync_* operations usage, are you getting issues with current Clang implementation? Do Clang need to keep supporting them and fix introducing new memory model? It seems we need compelling reasons to do that.

However if sync primitives are hardly used in the real world then perhaps it is about time to deprecate them with annoying warnings, and completely remove support next year. Does that sound reasonable?

I don't know that they are "hardly used" in the real world -- there is certainly legacy code using them, although I haven't attempted to quantify the amount. Another interesting question would be: how much of that code is actually correct, at all. I'd be rather surprised if someone could identify a single piece of software which is actually correct when using the __sync_* operations on GCC, but is incorrect on Clang.