This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add codegen support for atomic load/stores with RV32A
ClosedPublic

Authored by asb on May 31 2018, 6:59 AM.

Details

Summary

Fences are inserted according to table A.6 in the current draft of version 2.3 of the RISC-V Instruction Set Manual, which incorporates the memory model changes and definitions contributed by the RISC-V Memory Consistency Model
task group.

Instruction selection failures will now occur for 8/16/32-bit atomicrmw and cmpxchg operations when targeting RV32IA until lowering for these operations is added in a follow-on patch.

Diff Detail

Event Timeline

asb created this revision.May 31 2018, 6:59 AM
jfb added a comment.May 31 2018, 8:42 AM

When the A extension is supported, __atomic libcalls will be generated for any atomic that isn't the native word size or has less than natural alignment.

When do you expect non-natural alignment to occur? Is this purely for C++ support? If so you're guaranteed natural alignment. Otherwise (for intrinsics or for other languages) I'd like to understand what you expect, and whether you have the guarantee that the alignment information you have is correct. Without knowing that it's absolutely correct you're going to codegen bad code (a libcall in one place, and instructions in another).

jfb added a comment.May 31 2018, 8:52 AM
In D47589#1117778, @jfb wrote:

When the A extension is supported, __atomic libcalls will be generated for any atomic that isn't the native word size or has less than natural alignment.

When do you expect non-natural alignment to occur? Is this purely for C++ support? If so you're guaranteed natural alignment. Otherwise (for intrinsics or for other languages) I'd like to understand what you expect, and whether you have the guarantee that the alignment information you have is correct. Without knowing that it's absolutely correct you're going to codegen bad code (a libcall in one place, and instructions in another).

In fact, let me go further and quote the langref:

'load' instruction

align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee.

It sounds like you'll want to remove there UB from the IR before your work proceeds. I'm still not sure that you want what you said you did :-)

asb added a comment.EditedMay 31 2018, 11:55 AM
In D47589#1117778, @jfb wrote:

When the A extension is supported, __atomic libcalls will be generated for any atomic that isn't the native word size or has less than natural alignment.

When do you expect non-natural alignment to occur? Is this purely for C++ support? If so you're guaranteed natural alignment. Otherwise (for intrinsics or for other languages) I'd like to understand what you expect, and whether you have the guarantee that the alignment information you have is correct. Without knowing that it's absolutely correct you're going to codegen bad code (a libcall in one place, and instructions in another).

That comment simply reflects the status quo for behaviour of AtomicExpandPass, that I replicated. Do you think it would be worth doing report_fatal_error if Align < Size?

asb added a subscriber: regehr.EditedMay 31 2018, 12:13 PM
In D47589#1117978, @asb wrote:
In D47589#1117778, @jfb wrote:

When the A extension is supported, __atomic libcalls will be generated for any atomic that isn't the native word size or has less than natural alignment.

When do you expect non-natural alignment to occur? Is this purely for C++ support? If so you're guaranteed natural alignment. Otherwise (for intrinsics or for other languages) I'd like to understand what you expect, and whether you have the guarantee that the alignment information you have is correct. Without knowing that it's absolutely correct you're going to codegen bad code (a libcall in one place, and instructions in another).

That comment simply reflects the status quo for behaviour of AtomicExpandPass, that I replicated. Do you think it would be worth doing report_fatal_error if Align < Size?

Acutally I'm not sure it's reasonable to quit at compile-time when encountering UB (@regehr - thoughts?). Emitting an __atomic_* libcall in this case seems like a reasonable choice, so perhaps the fix is simply to be clear that an instruction with Align < Size has UB and emitting the __atomic_* call is the behaviour we choose.

LangRef should be fixed... it doesn't make sense to have undefined behavior with respect to a static property of the instruction, and I think it's out-of-date with recent changes. (Support for unaligned atomic load/store was recently added as an extension for AVR.)

jfb added a comment.May 31 2018, 12:46 PM

Making it the behavior we choose is fine with me. I just understood your comment to mean that you wanted to give this semantics, and that's more work than just stating the semantics you want.

LangRef should be fixed... it doesn't make sense to have undefined behavior with respect to a static property of the instruction, and I think it's out-of-date with recent changes. (Support for unaligned atomic load/store was recently added as an extension for AVR.)

Say I have two TUs which share an atomic location, but have a different idea of its alignment. They both access that location, one with libcall and one with an instruction. That's clearly broken. Can this happen?

Say I have two TUs which share an atomic location, but have a different idea of its alignment. They both access that location, one with libcall and one with an instruction. That's clearly broken. Can this happen?

No, it's not broken (assuming the location is actually aligned at runtime). In the translation unit where the location is known aligned, it'll use the native lock-free instruction. In the other translation unit, the libatomic call will check the alignment of the address at runtime, see it's aligned, and use a compatible lock-free implementation.

jfb added a comment.May 31 2018, 1:10 PM

Say I have two TUs which share an atomic location, but have a different idea of its alignment. They both access that location, one with libcall and one with an instruction. That's clearly broken. Can this happen?

No, it's not broken (assuming the location is actually aligned at runtime). In the translation unit where the location is known aligned, it'll use the native lock-free instruction. In the other translation unit, the libatomic call will check the alignment of the address at runtime, see it's aligned, and use a compatible lock-free implementation.

Is there a guarantee that __atomic_* functions start off with an alignment check, and use a compatible instruction if suitably aligned? This code doesn't seem to do so.

The compiler-rt "__atomic_*" are known to be buggy; see https://reviews.llvm.org/D45321

jfb added a comment.May 31 2018, 2:03 PM

The compiler-rt "__atomic_*" are known to be buggy; see https://reviews.llvm.org/D45321

Seems you're advocating that this patch (and the LangRef) follow the reality you'd like to have, not the one we actually have :-)
So I'll re-iterate: is the alignment check a documented guarantee that __atomic_* functions must provide, in all of their implementations?

Seems you're advocating that this patch (and the LangRef) follow the reality you'd like to have, not the one we actually have :-)

My description matches the way GNU libatomic works in practice, as far as I know. (compiler-rt's implementation is incomplete and broken in other ways anyway; see https://reviews.llvm.org/D47606.)

So I'll re-iterate: is the alignment check a documented guarantee that __atomic_* functions must provide, in all of their implementations?

The specialized forms (__atomic_load_8 etc.) assume natural alignment, but the forms which take a size argument check alignment. This is why __atomic_is_lock_free has a pointer argument. See https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary .

jfb added a comment.May 31 2018, 2:49 PM

Seems you're advocating that this patch (and the LangRef) follow the reality you'd like to have, not the one we actually have :-)

My description matches the way GNU libatomic works in practice, as far as I know. (compiler-rt's implementation is incomplete and broken in other ways anyway; see https://reviews.llvm.org/D47606.)

I can't look at the implementation, but the docs are kinda unclear: https://gcc.gnu.org/wiki/Atomic/GCCMM/UnalignedPolicy
If we intend to make this guarantee official it would be nice for our GCC friends to do the same. Will send a ping.

So I'll re-iterate: is the alignment check a documented guarantee that __atomic_* functions must provide, in all of their implementations?

The specialized forms (__atomic_load_8 etc.) assume natural alignment, but the forms which take a size argument check alignment. This is why __atomic_is_lock_free has a pointer argument. See https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary .

That page says:

All the optimized routines expect that the object will be properly aligned for a data type of the specified size. Results are undefined for objects not properly aligned if they are invoked directly. (ie, may or may not work as expected). The compiler will not map the generic routine to an optimized routine unless the alignment is correct.

Is that not the case?

My understanding was that __atomic_is_lock_free took a pointer argument because of C's specification, but C later backtracked on alignment (because it was another of their changes which C++ didn't have) and now ignores it. From the docs it looks like __atomic_is_lock_free indeed guarantees it'll look at alignment. I'm not sure about the actual atomic functions though.

jyknight added a comment.EditedJun 1 2018, 8:44 AM
In D47589#1118174, @jfb wrote:

The compiler-rt "__atomic_*" are known to be buggy; see https://reviews.llvm.org/D45321

Seems you're advocating that this patch (and the LangRef) follow the reality you'd like to have, not the one we actually have :-)
So I'll re-iterate: is the alignment check a documented guarantee that __atomic_* functions must provide, in all of their implementations?

The compiler must not emit a call to the __atomic_load_4 (and other names with sizes on the end) functions on unaligned memory -- that function can assume its argument is aligned.

But __atomic_load (size is given as an argument) must be safe to call on both aligned and unaligned memory, *and* must safely interoperate both with __atomic_load_4 and inline-emitted atomics if the current hardware supports them.

In D47589#1118174, @jfb wrote:

The compiler-rt "__atomic_*" are known to be buggy; see https://reviews.llvm.org/D45321

Seems you're advocating that this patch (and the LangRef) follow the reality you'd like to have, not the one we actually have :-)
So I'll re-iterate: is the alignment check a documented guarantee that __atomic_* functions must provide, in all of their implementations?

The compiler must not emit a call to the __atomic_load_4 (and other names with sizes on the end) functions on unaligned memory -- that function can assume its argument is aligned.

But __atomic_load (size is given as an argument) must be safe to call on both aligned and unaligned memory, *and* must safely interoperate both with __atomic_load_4 and inline-emitted atomics if the current hardware supports them.

Can you refer to documentation that says so? I'm happy if that's the case, but I want to be really sure it is. Not just for compiler-rt, but for GCC as well (I've ping'd someone on that side to get more info). If so the @t.p.northover's patch fixing some alignment stuff is also something we want, and we'll want to update some documentation.

In D47589#1119221, @jfb wrote:

Can you refer to documentation that says so? I'm happy if that's the case, but I want to be really sure it is. Not just for compiler-rt, but for GCC as well (I've ping'd someone on that side to get more info). If so the @t.p.northover's patch fixing some alignment stuff is also something we want, and we'll want to update some documentation.

That is what GCC's LIbrary docs you referred to before say -- although I agree it's not as clear as it could be.

Starting with "All the optimized routines expect that the object will be properly aligned for a data type of the specified size." In this doc, "optimized routines" means the function names ending with a size, so __atomic_load_4, but not __atomic_load. Then it says: "The compiler will not map the generic routine to an optimized routine unless the alignment is correct." The clear implication of both those sentences together is that __atomic_load does _NOT_ require the object will be properly aligned.

The next question perhaps is whether __atomic_load *MUST* use a lock-free implementation when given suitably-aligned data. The answer to that is the same as the answer to whether __atomic_load_4 must, because "A lazy compiler may simply call the generic routine, bypassing the optimized versions" -- so they must both use the same atomicity mechanism.

And the answer for both is effectively "Yes if the runtime-detected CPU supports it", because you have the requirement that code built for a CPU without atomic instructions for a given size (and thus calling into libatomic for everything), must interoperate properly with code built for a newer CPU model that does support lock-free atomics of that size. (e.g. if I build one .so with -march=386 and another .so with -march=486, they must work together). That implies that atomic_load and atomic_load_4 *MUST* use native atomic ops in at least as many situations as the compiler could have emitted inline atomics for.

asb updated this revision to Diff 150326.EditedJun 7 2018, 7:06 AM
asb edited the summary of this revision. (Show Details)

This patch now enables lowering of 8, 16, and 32-bit atomic load/stores. We no longer rely on D47553, as the follow-up patch to lower atomicrmw supports partword and native size atomics.

jyknight accepted this revision.Jun 7 2018, 8:07 AM
This revision is now accepted and ready to land.Jun 7 2018, 8:07 AM
This revision was automatically updated to reflect the committed changes.