Depends on D4959.
Details
Diff Detail
Event Timeline
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
11008 | I just found that the test CodeGen/ARM/swift-atomics suggest that this case should fall-through (i.e. that a DMB ishst is correct in this case for Swift processor). I can change it, but I would love to link to some documentation saying so more clearly first, and I cannot seem to find it. Does anyone know where this is documented (or at least can confirm that dmb ishst is a valid leading fence for seq_cst stores)? |
Is there documentation for ARM that you can point at to explain why this is correct? Maybe the ARM barrier litmus tests and cookboook?
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
10985 | This should be static, and the Domain parameter an ARM_MB::MemBOpt. | |
11035 | Link to documentation that explains this. |
Add comment pointing to documentation per jfb request.
I did not point directly to the ARM documentation or barrier cookbook because
it is expressed in terms of the hardware memory model and the mapping to C++11
is not completely obvious. Instead I have linked the webpage that summarizes
the findings of a research group that spent the last decade formalizing/proving
these kind of things. It itself has links to other more primary sources. Is it
enough?
The documentation you added looks good to me, so does the change.
Waiting on @t.p.northover (or another Applet) for the swift issue you mentioned above.
Hi Robin,
This patch seems to not do anything. Is this needed to something else? Are these functions going to be used somewhere else? If they are, why not implement it on the same patch? Otherwise, this just adds dead code.
It also makes is impossible for you to test this...
cheers,
--renato
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
10992 | www | |
11002 | "Invalid fence" seems less aggressive... :) | |
11030 | idem | |
lib/Target/ARM/ARMISelLowering.h | ||
422–423 | this could be left out |
Thanks for the review,
Indeed, this patch is used in D4961. I can fairly easily merge the two patches if you want, I'm still struggling to find the right way of cutting patches (too big and it is hard to review, too small and it is temporarily dead code like here).
I will fix the other things you point out ASAP.
Merges D4961 into this one based on rengolin suggestion that I make the patch
testable in this way.
Also fixed the typos he found.
Here is the new summary of the commit as a result of the merge:
Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass) Fixes two latent bugs: - There was no fence inserted before expanded seq_cst load (unsound on Power) - There was only a fence release before seq_cst stores (again unsound, in particular on Power) It is not even clear if this is correct on ARM swift processors (where release fences are DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift as it is not clear whether it is incorrect. I would love to get documentation stating whether it is correct or not. These two bugs were not triggered because Power is not (yet) using this pass, and these behaviours happen to be (mostly?) working on ARM (although they completely butchered the semantics of the llvm IR). See: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html for an example of the problems that can be caused by the second of these bugs. I couldn't see a way of fixing these in a completely target-independent way without adding lots of unnecessary fences on ARM, hence the target-dependent parts of this patch. This patch implements the new target-dependent parts only for ARM (the default of not doing anything is enough for AArch64), other architectures will use this infrastructure in later patches.
Adds Jim Grosbach as a reviewer based on his addition by rengolin to the now defunct D4961.
I should defer to Tim Northover on this sort of thing. He's better equipped than I to get into the details of atomics.
Same here. I can't see anything wrong with the code, but I might be missing a lot of the atomic context. Maybe James can also comment on that side?
cheers,
--renato
Hi Robin,
Sorry, this got lost over my holiday. I think it's mostly fine now. Just one very minor style point and a possibly misleading comment.
Cheers.
Tim.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
10985 | Should probably be "makeDMB", for LLVM style. | |
11036–11037 | By my tests, an ISB is even heavier than a DMB, so we probably don't want to encourage this alternative. I'd just skip the comment entirely. |
Thanks for the review !
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
10985 | Ok, I will do it. | |
11036–11037 | Awesome, I was planning on doing benchmarks to evaluate this possibility; thanks for having done them :-) |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
11036–11037 | I'd remove the comment but I think the benchmark is still good to do: different ARM implementations perform very differently. I assume Tim measured on Apple hardware? We can try out the ones we have on hand here to make sure the results are the same. |
I'd remove the comment but I think the benchmark is still good to do: different ARM implementations perform very differently. I assume Tim measured on Apple hardware? We can try out the ones we have on hand here to make sure the results are the same.
Yep, it was just a very quick trial on an iPad I had lying around.
It's definitely worth checking on other hardware (and in more
realistic situations than a tight DMB/ISB loop) before dismissing the
idea completely.
The new patch looks fine to me, though. Those are questions for another day.
Cheers.
Tim.
Sure, though we don't actually care about that as far as I know (the primary record of review is llvm-commits, and I'd quite happily accept one of my own revisions once someone had said OK).
Tim.
this could be left out