This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/trunk/test/CodeGen/WebAssembly/
-
trunk/
-
test/
-
CodeGen/
-
WebAssembly/
-
atomic-mem-consistency.ll

Differential D49194

[WebAssembly] Add tests for weaker memory consistency orderings
ClosedPublic

Authored by aheejin on Jul 11 2018, 9:16 AM.

Download Raw Diff

Details

Reviewers

dschuff

Commits

rG8daef0751d4a: [WebAssembly] Add tests for weaker memory consistency orderings
rL337854: [WebAssembly] Add tests for weaker memory consistency orderings

Summary

Currently all wasm atomic memory access instructions are sequentially
consistent, so even if LLVM IR specifies weaker orderings than that, we
should upgrade them to sequential ordering and treat them in the same
way.

Diff Detail

Repository: rL LLVM

Event Timeline

aheejin created this revision.Jul 11 2018, 9:16 AM

Herald added subscribers: llvm-commits, sunfish, jgravelle-google, sbc100. · View Herald TranscriptJul 11 2018, 9:16 AM

Harbormaster completed remote builds in B20267: Diff 155019.Jul 11 2018, 9:16 AM

aheejin mentioned this in D49088: [WebAssembly] Support for binary atomic RMW instructions.Jul 11 2018, 9:29 AM

aheejin added a child revision: D49195: [WebAssembly] Support for a ternary atomic RMW instruction.Jul 11 2018, 9:44 AM

@jfb just as a sanity check, it should always be valid to just "upgrade" LLVM's orderings and select them all as the sequentially consistent wasm ops, right?

This revision is now accepted and ready to land.Jul 12 2018, 12:26 AM

In D49194#1159673, @dschuff wrote:

@jfb just as a sanity check, it should always be valid to just "upgrade" LLVM's orderings and select them all as the sequentially consistent wasm ops, right?

Without going into details, yes. There are reasons one could answer "no", but they're irrelevant.

Do you also upgrade volatile to seq_cst?

What do you do with fences? In particular, signal fences and release thread fences?

There's code with inline asm for the purpose of compiler fencing (i.e. just a "memory" constraint). What do you do for that?

Currently LLVM treats all atomic IR instructions as volatile. Link0 Link1 So I'm not exactly sure what you mean by 'upgrade volatile to seq_cst' is. For non-atomic volatile memory instructions, they are not atomic, so they don't have orderings.

And I haven't implemented translation rules for fence or asm volatile(""::: "memory"); yet. The wasm thread proposal does not have a fence instruction itself. Because in the current wasm spec all atomic memory accesses are sequentially consistent, do you think we can translate them to a nop safely? Please let me know if I'm mistaken.

Thank you!

In D49194#1161116, @aheejin wrote:

Currently LLVM treats all atomic IR instructions as volatile. Link0 Link1 So I'm not exactly sure what you mean by 'upgrade volatile to seq_cst' is. For non-atomic volatile memory instructions, they are not atomic, so they don't have orderings.

I mean volatile int i;. Agreed they don't have ordering, but they also don't really have semantics within WebAssembly. I think the user-friendly thing to do is to translate them to atomic WebAssembly instructions, and break up any larger than max-size volatile accesses to atomic ones (which is legal in the C++ memory model, volatile can tear). Transforming volatile to regular instructions isn't legal in the C++ memory model because volatile means you can only touch each byte once, and that's not true for regular load / store.

And I haven't implemented translation rules for fence or asm volatile(""::: "memory"); yet. The wasm thread proposal does not have a fence instruction itself. Because in the current wasm spec all atomic memory accesses are sequentially consistent, do you think we can translate them to a nop safely? Please let me know if I'm mistaken.

Fences cannot be transformed to nops: they can be used to order non-atomic instructions.

For the record, I'm opposed to having the WebAssembly backend translate asm volatile(""::: "memory") differently from other targets in LLVM or GCC, which to my knowledge always translate it to a no-op, even on platforms with hardware fence instructions.

In D49194#1170425, @sunfish wrote:

For the record, I'm opposed to having the WebAssembly backend translate asm volatile(""::: "memory") differently from other targets in LLVM or GCC, which to my knowledge always translate it to a no-op, even on platforms with hardware fence instructions.

I think it should lower and not error out, and thought should be put into what it should lower to. No-op seems fine to me, but I'd like to hear a rationale.

This is the smallest of the issues I outline above.

Thank you for the comments and sorry for the delayed reply!

@jfb

About translating volatiles to atomics, just in case we later happen to add some feature that more closely matches volatile or support more relaxed memory models than seq_cst and change volatile's translation rules, some user programs that worked before may not work anymore. But maybe this is ok because that means people have been relying on an undefined behavior..?

We currently don't have fences. Do you think using a (sequentially consistent) atomic load from an address 0 as a fence would be ok in this case? Or do you have any suggestions? And do you think we should translate asm volatile(""::: "memory") to whatever we translate a fence to as well?

@sunfish

You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?

In D49194#1171262, @aheejin wrote:

Thank you for the comments and sorry for the delayed reply!

@jfb

About translating volatiles to atomics, just in case we later happen to add some feature that more closely matches volatile or support more relaxed memory models than seq_cst and change volatile's translation rules, some user programs that worked before may not work anymore. But maybe this is ok because that means people have been relying on an undefined behavior..?

Agreed. Further, we can't really translate volatile to regular operations without breaking the C++ memory model, and the only other choice we really have is seq_cst.

We currently don't have fences. Do you think using a (sequentially consistent) atomic load from an address 0 as a fence would be ok in this case? Or do you have any suggestions?

I think you want an idempotent RMW operation, such as seq_cst OR with 0, at an arbitrary (valid) address. The top of stack might be better because it's likely in cache (and I assume always valid?).

And do you think we should translate asm volatile(""::: "memory") to whatever we translate a fence to as well?

I just think we should translate it to something because it's a fairly common idiom (along with atomic_signal_fence). Both of these should probably just be noops (and if we ever do signals we'll strengthen the signal fence).

@sunfish

You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?

In D49194#1170428, @jfb wrote:

In D49194#1170425, @sunfish wrote:

For the record, I'm opposed to having the WebAssembly backend translate asm volatile(""::: "memory") differently from other targets in LLVM or GCC, which to my knowledge always translate it to a no-op, even on platforms with hardware fence instructions.

I think it should lower and not error out, and thought should be put into what it should lower to. No-op seems fine to me, but I'd like to hear a rationale.

I'm not opposed to doing something here. I am opposed to doing something in the memory model that is markedly different from what all other targets in LLVM and all other compilers do, without agreeing on it with the maintainers of other backends, which I assume would be in the same boat as WebAssembly here.

In D49194#1171855, @jfb wrote:

In D49194#1171262, @aheejin wrote:

@jfb

About translating volatiles to atomics, just in case we later happen to add some feature that more closely matches volatile or support more relaxed memory models than seq_cst and change volatile's translation rules, some user programs that worked before may not work anymore. But maybe this is ok because that means people have been relying on an undefined behavior..?

Agreed. Further, we can't really translate volatile to regular operations without breaking the C++ memory model, and the only other choice we really have is seq_cst.

Could you say more about this? My non-expert understanding of volatile is that it doesn't do anything unless you're doing memory-mapped I/O or signal handlers, neither of which WebAssembly has right now. I'd be happy to read any background material or standard citations you could provide.

@sunfish

You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?

My current position is that asm volatile(""::: "memory") should translate to zero instructions (specifically, not even a no-op instruction). This is what it currently does, so no change is needed.

In D49194#1171855, @jfb wrote:

In D49194#1171262, @aheejin wrote:

@jfb

About translating volatiles to atomics, just in case we later happen to add some feature that more closely matches volatile or support more relaxed memory models than seq_cst and change volatile's translation rules, some user programs that worked before may not work anymore. But maybe this is ok because that means people have been relying on an undefined behavior..?

Agreed. Further, we can't really translate volatile to regular operations without breaking the C++ memory model, and the only other choice we really have is seq_cst.

Could you say more about this? My non-expert understanding of volatile is that it doesn't do anything unless you're doing memory-mapped I/O or signal handlers, neither of which WebAssembly has right now. I'd be happy to read any background material or standard citations you could provide.

I have a talk for you! https://youtu.be/IB57wIf9W1k?t=52m20s
I talk about volatile in a few places, but 52m20s is the main one. Basically I'm advocating for something similar to the weird MSVC x86 volatile behavior because some users write code like a loop which updates a volatile variable which another thread reads, and expect it to stay in the loop (not hoisted out). Lowering to regular load / store breaks the guarantee that volatiles effects touch the variable's bytes once per source event. Such user code should really use atomic, but we don't need to be pedants about it.

I can make a more elaborate argument for it, but it's really about not being assholes to users who just want their updates to be visible. I have a bunch of reference in the talk's GitHub page, some about volatile.

If / when we add support for shared memory, setjmp / longjmp, signals, then having volatile translate to a sequence of always-lock-free load / store (with extra volatile semantics to preserve correctness) won't be a big deal. We can totally break up the wider atomics, and we don't have a choice anyways, so that's fine. RMW volatile operations don't really make sense, but we can do the same thing (seq_cst, and break up larger things).

@sunfish

You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?

My current position is that asm volatile(""::: "memory") should translate to zero instructions (specifically, not even a no-op instruction). This is what it currently does, so no change is needed.

Agreed. I meant actual noop, not our other not-actual noop :-)

In D49194#1172051, @jfb wrote:

I have a talk for you! https://youtu.be/IB57wIf9W1k?t=52m20s
I talk about volatile in a few places, but 52m20s is the main one. Basically I'm advocating for something similar to the weird MSVC x86 volatile behavior because some users write code like a loop which updates a volatile variable which another thread reads, and expect it to stay in the loop (not hoisted out). Lowering to regular load / store breaks the guarantee that volatiles effects touch the variable's bytes once per source event. Such user code should really use atomic, but we don't need to be pedants about it.

Your talk mentions -fms-volatile, which looks like the functionality you're asking for with volatile here, right? If so, then we seem to be good here. If that's desirable to enable by default, please bring that up with the clang maintainers.

In D49194#1172504, @sunfish wrote:

In D49194#1172051, @jfb wrote:

I have a talk for you! https://youtu.be/IB57wIf9W1k?t=52m20s
I talk about volatile in a few places, but 52m20s is the main one. Basically I'm advocating for something similar to the weird MSVC x86 volatile behavior because some users write code like a loop which updates a volatile variable which another thread reads, and expect it to stay in the loop (not hoisted out). Lowering to regular load / store breaks the guarantee that volatiles effects touch the variable's bytes once per source event. Such user code should really use atomic, but we don't need to be pedants about it.

Your talk mentions -fms-volatile, which looks like the functionality you're asking for with volatile here, right? If so, then we seem to be good here. If that's desirable to enable by default, please bring that up with the clang maintainers.

I don't think we want the same behavior, and we certainly only want it for WebAssembly.

So far, things we decided:

An LLVM fence can be translated to an idempotent atomicrmw operation
asm volatile(""::: "memory") is currently translated to 0 operation and we will live it at that
For volatile, we can make it sequentially consistent atomic operation like MSVC to be user friendy, but some people a wasm CG meeting, and we will follow up in 2 weeks.

While we can continue discussions here if necessary, I'll land this CL, because this CL does not depend on the volatile thing. Or we can move the discussion to a Github issue or something if you want.

Closed by commit rL337854: [WebAssembly] Add tests for weaker memory consistency orderings (authored by aheejin). · Explain WhyJul 24 2018, 2:07 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

test/

CodeGen/

WebAssembly/

atomic-mem-consistency.ll

141 lines

Diff 157126

llvm/trunk/test/CodeGen/WebAssembly/atomic-mem-consistency.ll

				; RUN: not llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt
				; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -disable-wasm-explicit-locals -mattr=+atomics,+sign-ext \| FileCheck %s

				; Currently all wasm atomic memory access instructions are sequentially
				; consistent, so even if LLVM IR specifies weaker orderings than that, we
				; should upgrade them to sequential ordering and treat them in the same way.

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown-unknown"

				;===----------------------------------------------------------------------------
				; Atomic loads
				;===----------------------------------------------------------------------------

				; The 'release' and 'acq_rel' orderings are not valid on load instructions.

				; CHECK-LABEL: load_i32_unordered:
				; CHECK: i32.atomic.load $push0=, 0($0){{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @load_i32_unordered(i32 *%p) {
				%v = load atomic i32, i32* %p unordered, align 4
				ret i32 %v
				}

				; CHECK-LABEL: load_i32_monotonic:
				; CHECK: i32.atomic.load $push0=, 0($0){{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @load_i32_monotonic(i32 *%p) {
				%v = load atomic i32, i32* %p monotonic, align 4
				ret i32 %v
				}

				; CHECK-LABEL: load_i32_acquire:
				; CHECK: i32.atomic.load $push0=, 0($0){{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @load_i32_acquire(i32 *%p) {
				%v = load atomic i32, i32* %p acquire, align 4
				ret i32 %v
				}

				; CHECK-LABEL: load_i32_seq_cst:
				; CHECK: i32.atomic.load $push0=, 0($0){{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @load_i32_seq_cst(i32 *%p) {
				%v = load atomic i32, i32* %p seq_cst, align 4
				ret i32 %v
				}

				;===----------------------------------------------------------------------------
				; Atomic stores
				;===----------------------------------------------------------------------------

				; The 'acquire' and 'acq_rel' orderings aren’t valid on store instructions.

				; CHECK-LABEL: store_i32_unordered:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK-NEXT: i32.atomic.store 0($0), $1{{$}}
				; CHECK-NEXT: return{{$}}
				define void @store_i32_unordered(i32 *%p, i32 %v) {
				store atomic i32 %v, i32* %p unordered, align 4
				ret void
				}

				; CHECK-LABEL: store_i32_monotonic:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK-NEXT: i32.atomic.store 0($0), $1{{$}}
				; CHECK-NEXT: return{{$}}
				define void @store_i32_monotonic(i32 *%p, i32 %v) {
				store atomic i32 %v, i32* %p monotonic, align 4
				ret void
				}

				; CHECK-LABEL: store_i32_release:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK-NEXT: i32.atomic.store 0($0), $1{{$}}
				; CHECK-NEXT: return{{$}}
				define void @store_i32_release(i32 *%p, i32 %v) {
				store atomic i32 %v, i32* %p release, align 4
				ret void
				}

				; CHECK-LABEL: store_i32_seq_cst:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK-NEXT: i32.atomic.store 0($0), $1{{$}}
				; CHECK-NEXT: return{{$}}
				define void @store_i32_seq_cst(i32 *%p, i32 %v) {
				store atomic i32 %v, i32* %p seq_cst, align 4
				ret void
				}

				;===----------------------------------------------------------------------------
				; Atomic read-modify-writes
				;===----------------------------------------------------------------------------

				; Out of several binary RMW instructions, here we test 'add' as an example.
				; The 'unordered' ordering is not valid on atomicrmw instructions.

				; CHECK-LABEL: add_i32_monotonic:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK: i32.atomic.rmw.add $push0=, 0($0), $1{{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @add_i32_monotonic(i32* %p, i32 %v) {
				%old = atomicrmw add i32* %p, i32 %v monotonic
				ret i32 %old
				}

				; CHECK-LABEL: add_i32_acquire:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK: i32.atomic.rmw.add $push0=, 0($0), $1{{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @add_i32_acquire(i32* %p, i32 %v) {
				%old = atomicrmw add i32* %p, i32 %v acquire
				ret i32 %old
				}

				; CHECK-LABEL: add_i32_release:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK: i32.atomic.rmw.add $push0=, 0($0), $1{{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @add_i32_release(i32* %p, i32 %v) {
				%old = atomicrmw add i32* %p, i32 %v release
				ret i32 %old
				}

				; CHECK-LABEL: add_i32_acq_rel:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK: i32.atomic.rmw.add $push0=, 0($0), $1{{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @add_i32_acq_rel(i32* %p, i32 %v) {
				%old = atomicrmw add i32* %p, i32 %v acq_rel
				ret i32 %old
				}

				; CHECK-LABEL: add_i32_seq_cst:
				; CHECK-NEXT: .param i32, i32{{$}}
				; CHECK: i32.atomic.rmw.add $push0=, 0($0), $1{{$}}
				; CHECK-NEXT: return $pop0{{$}}
				define i32 @add_i32_seq_cst(i32* %p, i32 %v) {
				%old = atomicrmw add i32* %p, i32 %v seq_cst
				ret i32 %old
				}