This is an archive of the discontinued LLVM Phabricator instance.

Relax atomic restrictions on memory dependence analysis
ClosedPublic

Authored by morisset on Aug 5 2014, 2:43 PM.

Details

Reviewers
jfb
morisset
Summary

Currently, any atomic store/load stronger than unordered kills all information in the
analysis. With this patch, the analysis is only killed by a release store
followed by an acquire load, by an RMW, or by a fence
(see the reference in the comments for why that is correct).
This appears to only impact DSE and GVN; see the tests for examples.
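As a purely illustrative C++ analogue (not part of the patch or its tests; flag, x and consumer are made-up names), this is the kind of pattern the relaxation targets: a relaxed ("monotonic") atomic access to an unrelated location should no longer prevent GVN from forwarding a non-atomic store, or DSE from eliminating a dead one.

#include <atomic>

std::atomic<int> flag{0};   // hypothetical relaxed-atomic flag
int x;                      // hypothetical non-atomic data

int consumer() {
  x = 42;                                        // non-atomic store
  int f = flag.load(std::memory_order_relaxed);  // monotonic load: with this patch it no
                                                 // longer kills dependence information
  return x + f;                                  // GVN may forward 42 from the store above
}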

This fixes the second part of http://llvm.org/bugs/show_bug.cgi?id=17281

Diff Detail

Event Timeline

morisset updated this revision to Diff 12210.Aug 5 2014, 2:43 PM
morisset retitled this revision from to Relax atomic restrictions on memory dependence analysis.
morisset updated this object.
morisset edited the test plan for this revision. (Show Details)
morisset added a reviewer: jfb.
morisset added subscribers: dvyukov, kcc, Unknown Object (MLST).
reames added a subscriber: reames.Aug 6 2014, 4:52 PM

It feels like you're trying to go too far too fast here. Given the sensitivity of the topic, I would *strongly* request that you split this into smaller chunks. I see several separate optimizations here:

  1. "monotonic" does not imply dependence if the addresses are known not alias -- note: your current change doesn't seem to implement the second part of that, which is required for correctness.
  2. "release" does not imply dependence unless there is a following "acquire"
  3. the addition of the atomic ops appears to be addressing a bug? (i.e. do atomics not participate in the ordering at all today?) If so, this should *definitely* be separate.

As a general point, you're clearly thinking in terms of acquire and release fences here. In C++, acquire and release apply to *operations*, and as a result, all ordering is with respect to specific addresses. It is conservatively correct to order with respect to all addresses, but it is not required. Just pointing that out.
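To illustrate that point (a hypothetical snippet, not taken from the review; ready, payload, producer and consumer are made-up names): release and acquire are properties of individual operations on a specific atomic object, and synchronization is established only between a release store and an acquire load that reads from it.

#include <atomic>

std::atomic<bool> ready{false};
int payload;

void producer() {
  payload = 1;                                    // plain store
  ready.store(true, std::memory_order_release);   // release *operation* on ready
}

int consumer() {
  // Ordering is established through ready specifically: only an acquire load
  // that observes the release store synchronizes with it.
  if (ready.load(std::memory_order_acquire))
    return payload;                               // guaranteed to read 1
  return -1;
}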

lib/Analysis/MemoryDependenceAnalysis.cpp
374

A more general statement of this might be: "Ordered (atomic) accesses only need to be preserved if their presence or lack thereof is observable according to the memory model. Based on the results in <paper>, we know that ... are observable while ... are not."

I would suggest spelling out the reasoning about why the implied optimization is correct, not the cases where optimizing wouldn't be. (Well, you can and should state both.) As a reviewer, I need that justification to assess your design.

415

This doesn't look right.

I could understand that a monotonic load wouldn't be a dependence, but shouldn't a seq_cst one still have the clobber behaviour?

Also, why would seeing a "release" trigger HasSeenAcquire?

483

I think this could actually be made stronger. If the ordering on the store is a release, but you haven't seen an acquire, is the following non-atomic load dependent? It doesn't seem like it should be.

morisset updated this revision to Diff 12263.Aug 6 2014, 5:58 PM

Improve patch for atomics in MemoryDependenceAnalysis based on Philip Reames comments

  1. Erased the third part of the patch (dealing with fences/AtomicRMW), as the current code was actually fine after testing (I should have tested this first, mea culpa). Replaced it with a comment explaining why the current code works, as it depends on a non-obvious part of the implementation of AliasAnalysis.
  2. Expanded the original comment to give the intuition of the paper.
  3. Expanded the comments for load/store, and changed the wording of the condition so that it is clearer what is tested.

Thanks a lot for the comments!
I can still try to split the patch in two if you want, but the third part is gone, and with the
clearer condition it is unclear how I should split the rest: monotonic accesses are not really
treated specially.

It looks like my update accidentally lost the changes to the test files; I will look at how to fix it.

morisset updated this revision to Diff 12284.Aug 7 2014, 10:32 AM

Trying to make the tests reappear in Phabricator.

reames added a comment.Aug 7 2014, 3:39 PM

See comment inline. I ask that you do not submit this (even with the required bug fix) until I explicitly okay it. I need time to a) read the paper and b) try to understand it.

In general, I find these simple rules useful (a short sketch follows the list):

  • A release operation can not be reordered w.r.t. any preceding memory operation. It has no limitation w.r.t. following operations.
  • A memory operation can not be reordered w.r.t. a previous acquire. Any memory operation can move after an acquire.
  • Two monotonic or stronger loads or stores can not be reordered unless you can prove doing so doesn't affect the global order for the locations affected. (In addition to normal data dependence rules.)
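As a rough illustration of the first two rules (a hypothetical snippet, not from the review; a, x, y and publisher are made-up names), a release store is a one-way barrier for preceding operations only:

#include <atomic>

std::atomic<int> a{0};
int x, y;

void publisher() {
  x = 1;                                   // may NOT sink below the release store (rule 1)
  a.store(1, std::memory_order_release);   // release operation
  y = 2;                                   // may legally hoist above the release store:
                                           // a release places no constraint on what follows it
}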

I've found these two posts useful for understanding the C++ model:
http://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/
http://preshing.com/20120913/acquire-and-release-semantics/

lib/Analysis/MemoryDependenceAnalysis.cpp
417

This is still not correct. Consider this:
load %addr1, monotonic
load %addr2, monotonic

Unless you *know* that addr2 and addr1 are NoAlias, the second load must depend on the first. Consider two threads with this pattern, one which reorders, one which doesn't. The two threads would observe an inconsistent order of writes from a third thread which wrote a series of increasing integers.

From the LangRef monotonic spec: "If one atomic read happens before another atomic read of the same address, the later read must see the same value or a later value in the address’s modification order. *This disallows reordering of monotonic (or stronger) operations on the same address*. If an address is written monotonic-ally by one thread, and other threads monotonic-ally read that address repeatedly, the other threads must eventually see the write. This corresponds to the C++0x/C1x memory_order_relaxed."
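A hypothetical C++ analogue of that coherence rule (not part of the patch or its tests; a and reader are made-up names): two relaxed ("monotonic") loads of the same atomic object may not be reordered with each other.

#include <atomic>
#include <utility>

std::atomic<int> a{0};   // assume another thread stores increasing values to a

std::pair<int, int> reader() {
  int first  = a.load(std::memory_order_relaxed);   // monotonic load
  int second = a.load(std::memory_order_relaxed);   // monotonic load of the same address
  // Coherence: second must be the same value as first, or a later one in the
  // modification order, so the compiler may not swap the two loads.
  return {first, second};
}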

To be explicit here, the problem I'm pointing out is *not* with your proposed optimization per se, it's with the removal of the early exit without adding the required change in default behaviour w.r.t. aliasing. Getting *this part* right should be its own patch.

464

Specifically, this comment is *wrong* for monotonic loads.

As we discussed yesterday, this patch is indeed wrong as is and will have to be split into several parts even after fixing. I will do so as soon as possible; in the meantime it is probably not useful for other people to review this version of the patch.

morisset updated this revision to Diff 12348.Aug 11 2014, 10:27 AM

As suggested, I have split this revision into three, and added more careful tests.

This is patch 1/3, only adding support for monotonic accesses, and only when
the QueryInst isUnordered().
(There was a bug if the QueryInst was itself atomic, which is not triggered in the
tests because the passes that use MemoryDependenceAnalysis are themselves conservative
about atomic accesses.)

I will post the other two patches as two separate revisions.

Note that this patch alone is enough to fix the main problem in
http://llvm.org/bugs/show_bug.cgi?id=17281

This version looks close to ready. See inline comments. Once you fix those, I'll give one more read through before officially giving an LGTM.

lib/Analysis/MemoryDependenceAnalysis.cpp
414

I originally wrote: "You're still missing the point of my previous example. The problem is that ordered operations have additional *aliasing* requirements. Your code is still not checking for these conditions and thus is still incorrect. "

I believe your code is correct. You're making a fairly subtle point with your checks though. Any two potentially aliased monotonic loads are ordered, but a monotonic and an unordered load are not, even if they alias. Please clarify this in the comments. I nearly missed it (as you can see from the comment I left above).

I would suggest that you change the aliasing-specific comment in the same loop. Keep the comments in sync with the code.

Nitpick: Separate the first "if (!QueryInst || LI->getOrdering() != Monotonic)" into two clauses. Semantically, the checks are unrelated.

423

You need to check whether LI is volatile. LoadInst::isUnordered does this, and you only want to change the memory-ordering behaviour here. I'd suggest having that be its own topmost check for clarity.

test/Transforms/DeadStoreElimination/atomic.ll
131

You could use a couple of other test cases here. In particular, load-value forwarding across monotonic loads and stores would be helpful.

A positive example:
store x = 0
store y monotonic
load x <- can use 0 since doesn't participate in ordering even if x == y at runtime

A negative example:
store x = 0 monotonic
load y monotonic
load x monotonic <-- not safe to forward!

And an ambiguous example:
store x = 0 unordered
load y monotonic
load x monotonic <-- is this correct to forward? I don't know.

morisset updated this revision to Diff 12626.Aug 18 2014, 2:43 PM

Answer to Philip Reames comments

  • add check for volatile (probably unneeded, but I agree that we should be conservative about it).
  • strengthen the condition from isUnordered() to isSimple(), as I don't understand Unordered semantics well enough to be confident in the previous behaviour (and it also matches the comment better this way; thanks a lot for catching that one, I had missed the Monotonic/Unordered case).
  • separate a condition into two.
  • lengthen the comment about aliasing and loads
  • add tests in GVN/atomic.ll

LGTM with minor comment clarification.

lib/Analysis/MemoryDependenceAnalysis.cpp
411

You should explain why. This is the heart of the aliasing confusion.

428

This comment is unclear. "in that way"?

morisset accepted this revision.Aug 18 2014, 3:28 PM
morisset added a reviewer: morisset.

Fixed the comments; the LGTM is in the previous comment.

This revision is now accepted and ready to land.Aug 18 2014, 3:28 PM
morisset closed this revision.Aug 18 2014, 3:29 PM

Committed as r215942 and r215943 (mistake on my side: I was planning to do only one commit, and forgot to rebase before git svn dcommit).