This adds support for translation of LLVM IR fence instruction. We
convert a singlethread fence to a pseudo compiler barrier which becomes
0 instructions in final binary, and a thread fence to an idempotent
atomicrmw instruction to a memory address.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 28239 Build 28238: arc lint + arc unit
Event Timeline
What do you expect for relaxed, as well as for signal fences?
Could you also check in with Conrad, since he's been working on the WebAssembly memory model?
What do you expect for relaxed, as well as for signal fences?
Here I actually translated both a signal fence and a thread fence to the same thing. I think this will be conservatively correct, but maybe we should treat them differently. What if we translate a signal fence to a nothing but prevent reordering of instructions across it, and we might do the same thing for asm volatile("" ::: "memory") too. That can be something like making a pseudo instruction that eventually becomes nothing and prevents reordering across the pseudo instruction in the backend passes. What do you think?
Oh and for weaker fences, I added tests for them. They all translate to the same sequentially consistent atomicrmw or instruction.
Could you also check in with Conrad, since he's been working on the WebAssembly memory model?
I don't think Conrad is in the LLVM Phabricator here. Should we move the discussion to a github issue?
- Add support for singlethread fences
- Fixed handling of MachineMemOperand according to changes in rL339740
We discussed how to implement fences here, and I think the general consensus was even though it may not be strictly necessary to emit atomicrmw for some cases, it can be user friendly and also might help support legacy builtins like __sync_synchronize(). And IMHO it is a single instruction anyway and fences are generally expected to be expensive, so no significant harm will be done. Does anyone have more concerns on this? If not, can we land this?
I have concerns which I expressed here: https://github.com/WebAssembly/tool-conventions/issues/59#issuecomment-413620669
What is the status of the discussion here? We've gotten to the point where I have to manually add this patch to my local checkout in order to debug unrelated issues and it looks like the conversation fizzled out a while ago, so it would be great if we could land something.
I've now made an updated post on https://github.com/WebAssembly/tool-conventions/issues/59.
Given the discussion on the github issues, we have scheduled a discussion at the in-person CG meeting that should resolve the outstanding memory model issues, possibly with a change to the spec.
However, the lack of fence lowering is now AFAIK the only thing that holds us back from declaring the wasm backend as the primary/only supported codegen path and staring the process of removing fastcomp.
Given that, I'm going to propose that we apply this patch as-is, to match the current behavior of the existing emscripten pipeline, with the understanding that we can consider the issue still open and change LLVM's behavior based on the outcome of the discussions (and we can in any case ensure that we don't declare any stable ABIs or allow this change to get branched). We can even put it behind the emscripten triple if people feel strongly about that.
Any objections?
I agree; this matches the current behavior, so it does not change anything from users' standpoint, and we ever decide do something else in the CG meeting, we can always land another patch later.
If I follow the discussion correctly, there are subtle ABI issues either way, so making fence a RMW now doesn't give us fully a future-proof ABI anyway. Consequently, I would prefer to make this patch conditional on the Emscripten triple, as wasm32-wasi and wasm32-unknown-unknown don't require Emscripten compatibility.
Yeah I agree with your assessment that this doesn't give us a future-proof ABI; that's why I want to revisit it later. Mostly this is to unblock removing fastcomp.
OK I'll predicate this under emscripten triple. But anyway someone please approve this!
lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp | ||
---|---|---|
114 | This looks over-indented. Am I missing something? |
lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp | ||
---|---|---|
114 | Thanks! How did it happen...?? |
lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp | ||
---|---|---|
112 | My understanding of https://github.com/WebAssembly/tool-conventions/issues/59 is that we can lower fence to zero instructions here, rather than aborting. |
lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp | ||
---|---|---|
112 | Sorry I didn't see this before committing. This is just to ensure the current crashing behavior for non-emscripten OSes. This does not change anything for them. Didn't we decide that we should commit this CL only for emscripten triple and wait for other OSes until we make a final decision on fences? |
lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp | ||
---|---|---|
112 | The behavior before this patch is to emit zero instructions, and not crash. So this patch does change the behavior. I'm asking to restore this previous behavior for non-Emscripten OS's. |
lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp | ||
---|---|---|
112 | Really? Without this patch, it crashes for me. Of course, only when we have -mattr=+atomics; if we don't have atomics enabled all atomic instructions become normal instructions and fences become nothing for all triples. This behavior is still the same. $ llc atomic-fence.ll -mtriple=wasm32-unknown-wasi -mattr=+atomics,+sign-ext LLVM ERROR: Cannot select: t3: ch = AtomicFence t0, Constant:i32<7>, Constant:i32<1> t1: i32 = Constant<7> t2: i32 = Constant<1> In function: multithread_fence |
lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp | ||
---|---|---|
112 | Ah, you're right, I did a quick test without -mattr=+atomics, which I forget is now the thing which determines whether operations like fence are lowered by LowerAtomic. |
My understanding of https://github.com/WebAssembly/tool-conventions/issues/59 is that we can lower fence to zero instructions here, rather than aborting.