This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Strengthen atomic ordering for sequentially consistent stores
ClosedPublic

Authored by paulkirth on Apr 28 2023, 2:49 PM.

Details

Summary

This is a similar change to one proposed for GCC:
https://inbox.sourceware.org/gcc-patches/20230414170942.1695672-1-patrick@rivosinc.com/

The changes in this patch are based on the proposal by Hans Boehm to more
closely match the intended semantics for sequentially consistent stores
and to allow some platforms to avoid an ABI break when switching to more
performant atomic instructions. Platforms that have already compiled
code using the existing mappings will also have more time to gradually
replace that code in preparation of the switch.

Further details can be found in the psABI proposal:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378.

This patch implements a mapping that is stronger than the one outlined in table
A.6 of the RISC-V unprivileged spec to be future compatible with table A.7 of
the same document. The related discussion can be found at
https://lists.riscv.org/g/tech-unprivileged/topic/risc_v_memory_model_topics/92916241

The major change to RISC-V code generation is that we will now emit a trailing
fence for sequentially consistent stores.

The new code sequence should have the following form:

fence rw,w; s{b|h|w|d}; fence rw,rw;

Other changes and optimizations like using amoswap will be handled separately.

Diff Detail

Event Timeline

paulkirth created this revision.Apr 28 2023, 2:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2023, 2:49 PM
paulkirth requested review of this revision.Apr 28 2023, 2:49 PM

avoid an ABI break in the future.

There will always be an ABI break if a model conflicting with the existing de-facto standard that has been in use for multiple years is adopted. That makes me very nervous.

avoid an ABI break in the future.

There will always be an ABI break if a model conflicting with the existing de-facto standard that has been in use for multiple years is adopted. That makes me very nervous.

I'll update the summary with a more substantial explanation, closer to the overall sentiment from the psABI PR.

From https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378#issue-1689133614

We believe that the Table A.7 mapping, together with the new instructions it requires, will be necessary to be performance competitive with other architectures for seq_cst operations, especially for processor designs similar to current out-of-order mobile cores. Starting with the ABI described here will allow some platforms to completely avoid an ABI break when switching to A.7. Platforms that already have code compiled to (unmodified) A.6 will get more time to gradually replace that code in preparation for such a switch.
paulkirth updated this revision to Diff 518570.May 1 2023, 2:52 PM
paulkirth retitled this revision from [RISCV] Strengthen atomic ordering for sequentially consistent Loads to [RISCV] Strengthen atomic ordering for sequentially consistent stores.
paulkirth edited the summary of this revision. (Show Details)

Update summary text.

jrtc27 added a comment.May 1 2023, 3:02 PM

With my psABI hat on I'm still very concerned about proposing breaking the ABI in future. Getting memory ordering wrong leads to a world of pain with subtle bugs that can only be reproduced on some systems some of the time. The current mapping is the de-facto ABI that should have been written down more formally, and there needs to be an extremely strong motivator to break it, not just "a small amount of code will get slightly faster". Note that I'm not claiming that's what's going on, but reading the mailing list threads doesn't immediately give a clear picture, and so those that care about this need to come forward with a clearly laid out explanation for why the current state is insufficient and must be changed to the proposed state, bearing in mind the pain that comes from breaking ABI and that there will be a many year interim period where performance is *worse* than both of the before and after states.

With my psABI hat on I'm still very concerned about proposing breaking the ABI in future. Getting memory ordering wrong leads to a world of pain with subtle bugs that can only be reproduced on some systems some of the time. The current mapping is the de-facto ABI that should have been written down more formally, and there needs to be an extremely strong motivator to break it, not just "a small amount of code will get slightly faster". Note that I'm not claiming that's what's going on, but reading the mailing list threads doesn't immediately give a clear picture, and so those that care about this need to come forward with a clearly laid out explanation for why the current state is insufficient and must be changed to the proposed state, bearing in mind the pain that comes from breaking ABI and that there will be a many year interim period where performance is *worse* than both of the before and after states.

I think there are two key things:

  1. from https://github.com/riscv/riscv-isa-manual/pull/992/files
Assuming suitable mappings for other atomic operations, setting the `aq` bit on the LR instruction, 
and setting the `rl` bit on the SC instruction makes the LR/SC sequence sequentially consistent in the 
C++ `memory_order_seq_cst` sense. Such a sequence does not act as a fence for ordering ordinary 
load and store instructions before and after the sequence.
  1. from https://github.com/riscv/riscv-isa-manual/pull/992/files#diff-b199c03e79944ce7dfb81f721f27d40252590e4daaacc8d9a414a6c4d881f737R1228
Even more importantly, a Table A.6 sequentially consistent store, followed by a Table A.7 sequentially consistent load 
can be reordered unless the Table A.6 mapping of stores is strengthened by either adding a second fence or mapping 
the store to `amoswap.rl` instead.

What these tell me is that the existing A.6 mappings are actually insufficient to implement the semantics they say they do, and users will experience some surprising results when when switching to A.7(or even if they don't). I'm not an expert in memory models, so I'll hapily defer to people w/ more expertise than I have on the subtleties here, but from the linked discussions it sounds to me that what we currently have isn't quite right.

Another point I've heard brought up a lot in these discussions, is that right now there isn't a huge amount of atomics code for RISC-V, but that will likely rapidly change. So, we need to decide if we would rather go through the pain now and impact a much smaller group of people, or defer the change and impacting significantly more code/users. Given that view, I think waiting longer to fix this will only make the problem worse. We have an opportunity to allow most code moving forward to avoid ABI breaks in the future. This is unfortunate, but we should fix these things as we are able, IMO.

Lastly, I think it's worth considering that there is a discrepancy between what GCC has emitted (and would emit after the patches linked above) and what we're doing now, so there are already some issues with the RISC-V atomics ABI. That's based on the discussions in the GCC patch and the tech-unprivileged thread.

However, I think a lot of this discussion is better suited to the psABI and related communities than on this patch. Perhaps we should recap the discusison so far on https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378?

asb added a comment.May 3 2023, 4:03 AM

The change looks good to me from a strictly code review perspective. I agree it would be good to see the concerns about a future ABI break fleshed out in the psABI issue tracker or mailing list.

As people might want to find the first version of LLVM that changed to this lowering that's more compatible with A.7, perhaps you could compose an appropriate release note? I'll admit it's a bit subtle to explain briefly.

The change looks good to me from a strictly code review perspective. I agree it would be good to see the concerns about a future ABI break fleshed out in the psABI issue tracker or mailing list.

As people might want to find the first version of LLVM that changed to this lowering that's more compatible with A.7, perhaps you could compose an appropriate release note? I'll admit it's a bit subtle to explain briefly.

Thanks for raising that issue. This will definitely need a release note. I'll try to keep it brief but it may take a few iterations to get something that explains the issue succinctly.

hboehm added a subscriber: hboehm.May 5 2023, 4:57 PM

I replied to jrtc27's comment in the psABI discussion at https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378 .

Re: Paul's concern about LR.aq; SC.rl not acting as a fence: That feature is shared by C and C++ atomic operations, so it should be perfectly fine for the translation. If I write in C++:

r1 = x.load(memory_order_relaxed);
y.fetch_add(1, memory_order_seq_cst);
r2 = z.load(memory_order_relaxed);

the loads from x and z can be reordered. fetch_add() is not a fence either.

This is desirable for two reasons:
(1) It makes atomics implementable with conventional locks (lock(); unlock() is also not a fence), and
(2) It allows cheaper seq_cst implementations that still preserve the property that if you only use seq_cst operations and locks you get sequentiial consistency.

pirama added a subscriber: pirama.May 25 2023, 9:42 AM
paulkirth planned changes to this revision.May 25 2023, 10:14 AM

Based on the conversation at the RISC-V LLVM sync, I'm gong to move this stack behind some experimental flags.

evandro removed a subscriber: evandro.May 25 2023, 1:42 PM
paulkirth updated this revision to Diff 527221.May 31 2023, 3:55 PM

Put the trailing fence behind an experimental flag.

I'm not sure this is really the best way to go about this. Most of the
"experimental" flags for RISC-V are defined in the tablegen files, but those
are mostly extensions whose entire implementation is experimental, which
doesn't seem to fit well with something that is a fairly minor transform.

Right now the flag also won't be exposed to clang very well. We could add a
module flag and try to plumb this through clang's tablegen, but that also feels
pretty wrong.

If possible I'd appreciate some guidance on the best way to acheive this from
somone with more familiarty with how this should to be done in the RISC-V
backend.

paulkirth updated this revision to Diff 527236.May 31 2023, 4:40 PM

Add back forced-atomics tests

asb added a comment.Jun 6 2023, 1:29 AM

Put the trailing fence behind an experimental flag.

I'm not sure this is really the best way to go about this. Most of the
"experimental" flags for RISC-V are defined in the tablegen files, but those
are mostly extensions whose entire implementation is experimental, which
doesn't seem to fit well with something that is a fairly minor transform.

Right now the flag also won't be exposed to clang very well. We could add a
module flag and try to plumb this through clang's tablegen, but that also feels
pretty wrong.

If possible I'd appreciate some guidance on the best way to acheive this from
somone with more familiarty with how this should to be done in the RISC-V
backend.

I'd definitely welcome input from others, but IMHO the way you've done this is fine. I agree that "experimental" is perhaps not the right concept here given it's not an ABI break and is more of an alternate lowering (which of course does have the benefit of compatibility with a future planned atomics ABI). With that in mind,-enable-seq-cst-fence would probably be fine.

In terms of plumbing to Clang properly etc, I think the main thought was to find a way to get this landed without being stuck in review limbo, where incremental follow-up patches can then follow (potentially even just flipping this to the default and leaving the old codegen as an option).

paulkirth updated this revision to Diff 529041.Jun 6 2023, 2:07 PM
paulkirth added a subscriber: mysterymath.

Switch to using tablegen to create a flag instead of using cl::opt

  • @mysterymath pointed out that I could do what I wanted similar to save/restore, which seems better.
paulkirth added inline comments.Jun 6 2023, 2:11 PM
llvm/lib/Target/RISCV/RISCVFeatures.td
810

I think I set the default wrong here

@asb, @jrtc27, @craig.topper, since psABI has merged https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378, are there any issues that we still need to iron out with the current approach?

I assume we still want this off by default until at least the ELF attributes thing is ironed out, but the experimental moniker can probably just get dropped since its in the ABI now.

asb added a comment.Jun 12 2023, 7:49 AM

There is, but I don't think this applies here - the change is completely compatible with the psABI that was being targeted before (to the best of our knowledge).

asb added a comment.Jun 12 2023, 7:51 AM

@asb, @jrtc27, @craig.topper, since psABI has merged https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378, are there any issues that we still need to iron out with the current approach?

I assume we still want this off by default until at least the ELF attributes thing is ironed out, but the experimental moniker can probably just get dropped since its in the ABI now.

Sounds like a sensible path forward to me. Drop the "experimental" moniker and I think we can land this option, IMHO.

paulkirth updated this revision to Diff 532285.Jun 16 2023, 1:49 PM

Remove experimental and update tests

paulkirth added inline comments.Jun 16 2023, 1:50 PM
llvm/lib/Target/RISCV/RISCVFeatures.td
810

so .. the behavior I'm observing here seems wrong. Passing "true" here actually seems to cause the generated macro to be GET_SUBTARGETINFO_MACRO(EnableSeqCstTrailingFence, false, enableSeqCstTrailingFence), which is what we want, but I would have thought that would only happen if we passed "false" here ... am I missing something in how the tablegen is working w/ SubtargetFeature?

craig.topper added inline comments.Jun 16 2023, 1:56 PM
llvm/lib/Target/RISCV/RISCVFeatures.td
810

The value in tablegen is not the default value. It’s the value to represent enabled. I think it can be “true” or an integer/enum. “true” means it’s a bool that starts as false. Anything else is an integer that’s starts at 0.

For the integer/enum case you can have multiple features touching the same field that all get maxed together.

paulkirth added inline comments.Jun 16 2023, 3:38 PM
llvm/lib/Target/RISCV/RISCVFeatures.td
810

Ah, thanks. I knew I was missing something in how this was intended to work. I've looked a few times through the tablegen documentation and tried to walk through the relevant definitions in the tablegen files & C++, but I've been unable to figure out exactly where I should have been looking. Could you point me to the relevant bits?

craig.topper added inline comments.Jun 16 2023, 3:47 PM
llvm/lib/Target/RISCV/RISCVFeatures.td
810

I can't find good documentation for this. The best I can find is this https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/SubtargetEmitter.cpp#L1803 where the initialization code for what to do when a feature is enabled is generated.

If the value is "true" or "false" it's treated as a bool and does a direct assignment. Otherwise it generates a comparison to take the maximum.

GET_SUBTARGETINFO_MACRO is relatively new within the last year or two. In the old says we had to write code in *Subtarget.h to initialize the field. I think we might still need to do that for non-bool features.

Here's the code for GET_SUBTARGET_MACRO that picks the initial value by flipping "true" or "false" SubtargetEmitter::EmitSubtargetInfoMacroCalls. Looks like it skips non-bool features.

paulkirth marked 2 inline comments as done.Jun 16 2023, 3:55 PM
paulkirth added inline comments.
llvm/lib/Target/RISCV/RISCVFeatures.td
810

Thanks so much for the context, and pointing me to that reference. Tablegen is definitely an area I'd like to be more competent with, but so far my need to interact with it has been fairly limited, since I've mostly only used it to add options to clang, or modify some attributes in the middle-end.

craig.topper added inline comments.Jun 16 2023, 3:58 PM
llvm/lib/Target/RISCV/RISCVFeatures.td
810

I'm going to go document this in https://llvm.org/docs/WritingAnLLVMBackend.html#subtarget-support and the SubtargetFeature class definition in llvm/include/llvm/Target/Target.td

paulkirth marked an inline comment as done.Jun 21 2023, 6:16 PM

ping. I think all the feedback in this patch has been addressed.

asb accepted this revision.Jun 22 2023, 7:13 AM

LGTM, thanks.

This revision is now accepted and ready to land.Jun 22 2023, 7:13 AM