Currently all wasm atomic memory access instructions are sequentially
consistent, so even if LLVM IR specifies weaker orderings than that, we
should upgrade them to sequential ordering and treat them in the same
way.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
@jfb just as a sanity check, it should always be valid to just "upgrade" LLVM's orderings and select them all as the sequentially consistent wasm ops, right?
Without going into details, yes. There are reasons one could answer "no", but they're irrelevant.
Do you also upgrade volatile to seq_cst?
What do you do with fences? In particular, signal fences and release thread fences?
There's code with inline asm for the purpose of compiler fencing (i.e. just a "memory" constraint). What do you do for that?
Currently LLVM treats all atomic IR instructions as volatile. Link0 Link1 So I'm not exactly sure what you mean by 'upgrade volatile to seq_cst' is. For non-atomic volatile memory instructions, they are not atomic, so they don't have orderings.
And I haven't implemented translation rules for fence or asm volatile(""::: "memory"); yet. The wasm thread proposal does not have a fence instruction itself. Because in the current wasm spec all atomic memory accesses are sequentially consistent, do you think we can translate them to a nop safely? Please let me know if I'm mistaken.
Thank you!
I mean volatile int i;. Agreed they don't have ordering, but they also don't really have semantics within WebAssembly. I think the user-friendly thing to do is to translate them to atomic WebAssembly instructions, and break up any larger than max-size volatile accesses to atomic ones (which is legal in the C++ memory model, volatile can tear). Transforming volatile to regular instructions isn't legal in the C++ memory model because volatile means you can only touch each byte once, and that's not true for regular load / store.
And I haven't implemented translation rules for fence or asm volatile(""::: "memory"); yet. The wasm thread proposal does not have a fence instruction itself. Because in the current wasm spec all atomic memory accesses are sequentially consistent, do you think we can translate them to a nop safely? Please let me know if I'm mistaken.
Fences cannot be transformed to nops: they can be used to order non-atomic instructions.
For the record, I'm opposed to having the WebAssembly backend translate asm volatile(""::: "memory") differently from other targets in LLVM or GCC, which to my knowledge always translate it to a no-op, even on platforms with hardware fence instructions.
I think it should lower and not error out, and thought should be put into what it should lower to. No-op seems fine to me, but I'd like to hear a rationale.
This is the smallest of the issues I outline above.
Thank you for the comments and sorry for the delayed reply!
- About translating volatiles to atomics, just in case we later happen to add some feature that more closely matches volatile or support more relaxed memory models than seq_cst and change volatile's translation rules, some user programs that worked before may not work anymore. But maybe this is ok because that means people have been relying on an undefined behavior..?
- We currently don't have fences. Do you think using a (sequentially consistent) atomic load from an address 0 as a fence would be ok in this case? Or do you have any suggestions? And do you think we should translate asm volatile(""::: "memory") to whatever we translate a fence to as well?
- You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?
Agreed. Further, we can't really translate volatile to regular operations without breaking the C++ memory model, and the only other choice we really have is seq_cst.
- We currently don't have fences. Do you think using a (sequentially consistent) atomic load from an address 0 as a fence would be ok in this case? Or do you have any suggestions?
I think you want an idempotent RMW operation, such as seq_cst OR with 0, at an arbitrary (valid) address. The top of stack might be better because it's likely in cache (and I assume always valid?).
And do you think we should translate asm volatile(""::: "memory") to whatever we translate a fence to as well?
I just think we should translate it to something because it's a fairly common idiom (along with atomic_signal_fence). Both of these should probably just be noops (and if we ever do signals we'll strengthen the signal fence).
- You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?
I'm not opposed to doing something here. I am opposed to doing something in the memory model that is markedly different from what all other targets in LLVM and all other compilers do, without agreeing on it with the maintainers of other backends, which I assume would be in the same boat as WebAssembly here.
Could you say more about this? My non-expert understanding of volatile is that it doesn't do anything unless you're doing memory-mapped I/O or signal handlers, neither of which WebAssembly has right now. I'd be happy to read any background material or standard citations you could provide.
- You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?
My current position is that asm volatile(""::: "memory") should translate to zero instructions (specifically, not even a no-op instruction). This is what it currently does, so no change is needed.
Could you say more about this? My non-expert understanding of volatile is that it doesn't do anything unless you're doing memory-mapped I/O or signal handlers, neither of which WebAssembly has right now. I'd be happy to read any background material or standard citations you could provide.
I have a talk for you! https://youtu.be/IB57wIf9W1k?t=52m20s
I talk about volatile in a few places, but 52m20s is the main one. Basically I'm advocating for something similar to the weird MSVC x86 volatile behavior because some users write code like a loop which updates a volatile variable which another thread reads, and expect it to stay in the loop (not hoisted out). Lowering to regular load / store breaks the guarantee that volatiles effects touch the variable's bytes once per source event. Such user code should really use atomic, but we don't need to be pedants about it.
I can make a more elaborate argument for it, but it's really about not being assholes to users who just want their updates to be visible. I have a bunch of reference in the talk's GitHub page, some about volatile.
If / when we add support for shared memory, setjmp / longjmp, signals, then having volatile translate to a sequence of always-lock-free load / store (with extra volatile semantics to preserve correctness) won't be a big deal. We can totally break up the wider atomics, and we don't have a choice anyways, so that's fine. RMW volatile operations don't really make sense, but we can do the same thing (seq_cst, and break up larger things).
- You mean, in case they have fence instructions, they translate LLVM IR's fences to their fence instructions but not asm volatile(""::: "memory")? What do other platforms do to ensure the memory barrier if they translate asm volatile(""::: "memory") to a nop?
My current position is that asm volatile(""::: "memory") should translate to zero instructions (specifically, not even a no-op instruction). This is what it currently does, so no change is needed.
Agreed. I meant actual noop, not our other not-actual noop :-)
Your talk mentions -fms-volatile, which looks like the functionality you're asking for with volatile here, right? If so, then we seem to be good here. If that's desirable to enable by default, please bring that up with the clang maintainers.
I don't think we want the same behavior, and we certainly only want it for WebAssembly.
So far, things we decided:
- An LLVM fence can be translated to an idempotent atomicrmw operation
- asm volatile(""::: "memory") is currently translated to 0 operation and we will live it at that
- For volatile, we can make it sequentially consistent atomic operation like MSVC to be user friendy, but some people a wasm CG meeting, and we will follow up in 2 weeks.
While we can continue discussions here if necessary, I'll land this CL, because this CL does not depend on the volatile thing. Or we can move the discussion to a Github issue or something if you want.