This is an archive of the discontinued LLVM Phabricator instance.

Implement emitLeading/TrailingFence in the ARM backend
ClosedPublic

Authored by morisset on Aug 18 2014, 3:08 PM.

Details

Summary

Depends on D4959.

Diff Detail

Event Timeline

morisset updated this revision to Diff 12630.Aug 18 2014, 3:08 PM
morisset retitled this revision from to Implement emitLeading/TrailingFence in the ARM backend.
morisset updated this object.
morisset edited the test plan for this revision. (Show Details)
morisset added a reviewer: jfb.
morisset added subscribers: t.p.northover, Unknown Object (MLST).
morisset added inline comments.Aug 18 2014, 5:29 PM
lib/Target/ARM/ARMISelLowering.cpp
11008

I just found that the test CodeGen/ARM/swift-atomics suggest that this case should fall-through (i.e. that a DMB ishst is correct in this case for Swift processor). I can change it, but I would love to link to some documentation saying so more clearly first, and I cannot seem to find it. Does anyone know where this is documented (or at least can confirm that dmb ishst is a valid leading fence for seq_cst stores)?

jfb edited edge metadata.Aug 20 2014, 10:27 PM

Is there documentation for ARM that you can point at to explain why this is correct? Maybe the ARM barrier litmus tests and cookboook?

lib/Target/ARM/ARMISelLowering.cpp
10985

This should be static, and the Domain parameter an ARM_MB::MemBOpt.

11035

Link to documentation that explains this.

morisset updated this revision to Diff 12807.Aug 21 2014, 2:48 PM
morisset edited edge metadata.

Add comment pointing to documentation per jfb request.

I did not point directly to the ARM documentation or barrier cookbook because
it is expressed in terms of the hardware memory model and the mapping to C++11
is not completely obvious. Instead I have linked the webpage that summarizes
the findings of a research group that spent the last decade formalizing/proving
these kind of things. It itself has links to other more primary sources. Is it
enough?

jfb added a comment.Aug 21 2014, 3:02 PM

The documentation you added looks good to me, so does the change.

Waiting on @t.p.northover (or another Applet) for the swift issue you mentioned above.

Hi Robin,

This patch seems to not do anything. Is this needed to something else? Are these functions going to be used somewhere else? If they are, why not implement it on the same patch? Otherwise, this just adds dead code.

It also makes is impossible for you to test this...

cheers,
--renato

lib/Target/ARM/ARMISelLowering.cpp
10992

www

11002

"Invalid fence" seems less aggressive... :)

11030

idem

lib/Target/ARM/ARMISelLowering.h
422–423

this could be left out

Thanks for the review,

Indeed, this patch is used in D4961. I can fairly easily merge the two patches if you want, I'm still struggling to find the right way of cutting patches (too big and it is hard to review, too small and it is temporarily dead code like here).
I will fix the other things you point out ASAP.

morisset updated this revision to Diff 12869.Aug 22 2014, 5:32 PM

Merges D4961 into this one based on rengolin suggestion that I make the patch
testable in this way.

Also fixed the typos he found.

Here is the new summary of the commit as a result of the merge:

Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass)

Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
It is not even clear if this is correct on ARM swift processors (where release fences are
DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
as it is not clear whether it is incorrect. I would love to get documentation stating
whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).

See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.

I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.

This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64), other architectures will use this
infrastructure in later patches.

Adds Jim Grosbach as a reviewer based on his addition by rengolin to the now defunct D4961.

grosbach edited edge metadata.Sep 2 2014, 4:00 PM

I should defer to Tim Northover on this sort of thing. He's better equipped than I to get into the details of atomics.

rengolin added a reviewer: jmolloy.
rengolin removed a subscriber: t.p.northover.

Same here. I can't see anything wrong with the code, but I might be missing a lot of the atomic context. Maybe James can also comment on that side?

cheers,
--renato

t.p.northover edited edge metadata.Sep 3 2014, 6:53 AM

Hi Robin,

Sorry, this got lost over my holiday. I think it's mostly fine now. Just one very minor style point and a possibly misleading comment.

Cheers.

Tim.

lib/Target/ARM/ARMISelLowering.cpp
10985

Should probably be "makeDMB", for LLVM style.

11036–11037

By my tests, an ISB is even heavier than a DMB, so we probably don't want to encourage this alternative. I'd just skip the comment entirely.

Thanks for the review !

lib/Target/ARM/ARMISelLowering.cpp
10985

Ok, I will do it.

11036–11037

Awesome, I was planning on doing benchmarks to evaluate this possibility; thanks for having done them :-)
I will remove the comment.

jfb added inline comments.Sep 3 2014, 10:46 AM
lib/Target/ARM/ARMISelLowering.cpp
11036–11037

I'd remove the comment but I think the benchmark is still good to do: different ARM implementations perform very differently. I assume Tim measured on Apple hardware? We can try out the ones we have on hand here to make sure the results are the same.

morisset updated this revision to Diff 13221.Sep 3 2014, 11:52 AM
morisset edited edge metadata.

MakeDMB -> makeDMB
Erase comment.

I'd remove the comment but I think the benchmark is still good to do: different ARM implementations perform very differently. I assume Tim measured on Apple hardware? We can try out the ones we have on hand here to make sure the results are the same.

Yep, it was just a very quick trial on an iPad I had lying around.
It's definitely worth checking on other hardware (and in more
realistic situations than a tight DMB/ISB loop) before dismissing the
idea completely.

The new patch looks fine to me, though. Those are questions for another day.

Cheers.

Tim.

Ok, can the revision be accepted then ? Thanks.

t.p.northover accepted this revision.Sep 3 2014, 1:04 PM
t.p.northover edited edge metadata.

Sure, though we don't actually care about that as far as I know (the primary record of review is llvm-commits, and I'd quite happily accept one of my own revisions once someone had said OK).

Tim.

This revision is now accepted and ready to land.Sep 3 2014, 1:04 PM
rengolin accepted this revision.Sep 3 2014, 1:33 PM
rengolin added a reviewer: rengolin.

It's mostly to be able to close the review after commit.

morisset closed this revision.Sep 3 2014, 2:19 PM

r217076
Thanks everyone