Details

Reviewers

jfb
morisset

Summary

Currently, any atomic store/load above unordered kills all information in the
analysis. With this patch, the analysis is only killed by a release store
followed by an acquire load, or an RMW, or a fence
(see the reference in the comments for why that is correct).
This appears to only impact DSE and GVN; see the tests for examples.

This fixes the second part of http://llvm.org/bugs/show_bug.cgi?id=17281

Diff Detail

Event Timeline

morisset updated this revision to Diff 12210. Aug 5 2014, 2:43 PM

morisset retitled this revision from to Relax atomic restrictions on memory dependence analysis.

morisset updated this object.

morisset edited the test plan for this revision. (Show Details)

morisset added a reviewer: jfb.

morisset added subscribers: dvyukov, kcc, Unknown Object (MLST).

It feels like you're trying to go too far too fast here. Given the sensitivity of the topic, I would *strongly* request that you split this into smaller chunks. I see several separate optimizations here:

"monotonic" does not imply dependence if the addresses are known not to alias -- note: your current change doesn't seem to implement the second part of that, which is required for correctness.
"release" does not imply dependence unless there is a following "acquire"
the addition of the atomic ops appears to be addressing a bug? (i.e. do atomics not participate in the ordering at all today?) If so, this should *definitely* be separate.

As a general point, you're clearly thinking in terms of acquire and release fences here. In C++, acquire and release apply to *operations* and as a result, all ordering is with respect to specific addresses. It is conservatively correct to order with respect to all addresses, but not required. Just pointing that out.
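The point about ordering being attached to operations can be illustrated at the C++ source level. The following is a hypothetical sketch, not code from the patch: the names `runMessagePassing`, `Payload`, and `Ready` are invented for illustration. The release store and the acquire load synchronize only through the single atomic object `Ready`; there is no standalone fence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: ordering is attached to the store and load on
// `Ready`, not to a fence that orders all addresses.
int runMessagePassing(int Value) {
  int Payload = 0;                 // plain, non-atomic data
  std::atomic<bool> Ready{false};  // the only atomic object

  std::thread Producer([&] {
    Payload = Value;                               // ordinary store
    Ready.store(true, std::memory_order_release);  // release operation on Ready
  });

  // The acquire load synchronizes with the release store only once it
  // reads the value written by that store.
  while (!Ready.load(std::memory_order_acquire)) {
  }
  int Seen = Payload;  // ordered after the producer's write to Payload
  Producer.join();
  assert(Seen == Value && "acquire load must make the payload visible");
  return Seen;
}
```

Calling `runMessagePassing(42)` returns 42. A compiler that ordered these accesses with respect to every address would still be correct, only more conservative than required.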

lib/Analysis/MemoryDependenceAnalysis.cpp
374	A more general statement of this might be: "Ordered (atomic) accesses only need to be preserved if their presence or lack thereof is observable according to the memory model. Based on the results in <paper> we know that ... are observable while ... are not." I would suggest spelling out the reasoning about why the implied optimization is correct, not the cases where optimizing wouldn't be. (Well, you can and should state both.) As a reviewer, I need that justification to assess your design.
434	This doesn't look right. I could understand that a monotonic load wouldn't be a dependence, but shouldn't a seq_cst one still have the clobber behaviour? Also, why would seeing a "release" trigger HasSeenAcquire?
494	I think this could actually be made stronger. If the ordering on the store is a release, but you haven't seen an acquire, is the following non-atomic load dependent? It doesn't seem like it should be.

Improve patch for atomics in MemoryDependenceAnalysis based on Philip Reames's comments

Erase the third part of the patch (dealing with fences/AtomicRMW) as the current code was actually fine after testing (I should have tested this first, mea culpa). Replaced with a comment explaining why the current code works, as it depends on a non-obvious part of the implementation of AliasAnalysis.
Expand the original comment to give the intuition of the paper
Expand the comments for load/store, and change the wording of the condition so that it is clearer what is tested.

Thanks a lot for the comments!
I can still try to split the patch in two if you want, but the third part is gone, and with the
clearer condition it is unclear how I should split the rest: Monotonic accesses are not really
treated specially.

It looks like my update accidentally lost the changes to the test files, I will look at how to fix it.

Trying to make the tests reappear in phabricator.

See comment inline. I ask that you do not submit this (even with the required bug fix) until I explicitly okay it. I need time to a) read the paper and b) try to understand it.

In general, I find these simple rules useful:

A release operation cannot be reordered w.r.t. any preceding memory operation. It has no limitation w.r.t. following operations.
A memory operation cannot be reordered w.r.t. a previous acquire. Any memory operation can move after an acquire.
Two monotonic or stronger loads or stores cannot be reordered unless you can prove doing so doesn't affect the global order for the locations affected. (In addition to normal data dependence rules.)
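These three rules can be written down as a small predicate. This is only a sketch of the rules as stated in this comment, not code from the patch or from LLVM: the `Ordering` enum, `mayReorder`, and the `ProvablyIndependent` flag are hypothetical names, seq_cst is conservatively assumed to carry both acquire and release semantics, and ordinary data dependences are assumed to be checked elsewhere.

```cpp
#include <cassert>

// Hypothetical model of the three reordering rules; not an LLVM API.
enum class Ordering {
  NotAtomic, Unordered, Monotonic, Acquire, Release, AcquireRelease, SeqCst
};

// Conservative assumption: seq_cst counts as both acquire and release.
bool hasAcquireSemantics(Ordering O) {
  return O == Ordering::Acquire || O == Ordering::AcquireRelease ||
         O == Ordering::SeqCst;
}

bool hasReleaseSemantics(Ordering O) {
  return O == Ordering::Release || O == Ordering::AcquireRelease ||
         O == Ordering::SeqCst;
}

bool isMonotonicOrStronger(Ordering O) { return O >= Ordering::Monotonic; }

// May two adjacent memory operations (First, then Second, in program order)
// be swapped? Data dependences are assumed to have been ruled out already.
bool mayReorder(Ordering First, Ordering Second, bool ProvablyIndependent) {
  // Rule 1: a release cannot move above a preceding memory operation.
  if (hasReleaseSemantics(Second))
    return false;
  // Rule 2: no memory operation can move above a previous acquire.
  if (hasAcquireSemantics(First))
    return false;
  // Rule 3: two monotonic-or-stronger accesses stay in order unless the
  // caller has proven that swapping them cannot affect the global order.
  if (isMonotonicOrStronger(First) && isMonotonicOrStronger(Second) &&
      !ProvablyIndependent)
    return false;
  return true;
}

// A few spot checks of the rules as modelled here.
inline void checkReorderingRules() {
  assert(mayReorder(Ordering::Release, Ordering::NotAtomic, false));
  assert(!mayReorder(Ordering::NotAtomic, Ordering::Release, true));
  assert(mayReorder(Ordering::NotAtomic, Ordering::Acquire, false));
}
```

Under this model a plain access may sink below a release or hoist above nothing that follows an acquire, which matches the one-directional reading of the rules.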

I've found these two posts useful for understanding the C++ model:
http://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/
http://preshing.com/20120913/acquire-and-release-semantics/

lib/Analysis/MemoryDependenceAnalysis.cpp
436	This is still not correct. Consider this:

load %addr1, monotonic
load %addr2, monotonic

Unless you know that addr2 and addr1 are NoAlias, the second load must depend on the first. Consider two threads with this pattern, one which reorders, one which doesn't. The two threads would observe an inconsistent order of writes from a third thread which wrote a series of increasing integers.

From the LangRef monotonic spec: "If one atomic read happens before another atomic read of the same address, the later read must see the same value or a later value in the address’s modification order. This disallows reordering of monotonic (or stronger) operations on the same address. If an address is written monotonic-ally by one thread, and other threads monotonic-ally read that address repeatedly, the other threads must eventually see the write. This corresponds to the C++0x/C1x memory_order_relaxed."

To be explicit here, the problem I'm pointing out is not with your proposed optimization per se, it's with the removal of the early exit without adding a required change in default behaviour w.r.t. aliasing. Getting this part right should be its own patch.
474	Specifically, this comment is wrong for monotonic loads.
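The LangRef rule quoted above can be observed from C++, where `memory_order_relaxed` corresponds to monotonic. The sketch below is a hypothetical illustration (the function name and variables are invented here) of the case where the two loads do read the same address: a writer stores increasing integers, and a reader checks that a later relaxed load never returns an older value than an earlier one.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration of per-address ordering for relaxed
// (monotonic) loads; not code from the patch.
bool relaxedReadsAreCoherent(int Iterations) {
  assert(Iterations > 0);
  std::atomic<int> Counter{0};
  std::atomic<bool> Done{false};

  // Writer: a series of increasing integers, as in the review comment.
  std::thread Writer([&] {
    for (int I = 1; I <= Iterations; ++I)
      Counter.store(I, std::memory_order_relaxed);
    Done.store(true, std::memory_order_release);
  });

  bool Coherent = true;
  while (!Done.load(std::memory_order_acquire)) {
    // Two relaxed loads of the same address, in program order.
    int First = Counter.load(std::memory_order_relaxed);
    int Second = Counter.load(std::memory_order_relaxed);
    // The later read must see the same or a later value.
    if (Second < First)
      Coherent = false;
  }
  Writer.join();
  return Coherent;
}
```

If a compiler swapped the two loads, `Second < First` could become observable, which is why two monotonic loads that may alias must stay ordered.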

As we discussed yesterday, this patch is indeed wrong as is, and will have to be split into several parts even after fixing. I will do so as soon as possible; in the meantime it is probably not useful for other people to review this version of the patch.

As suggested, I have split this revision in three, and added more careful tests.

This is patch 1/3, only adding support for monotonic accesses, and only when
the QueryInst isUnordered().
(There was a bug if the QueryInst was itself atomic, which is not triggered in the
tests because the passes that use MemoryDependenceAnalysis are themselves conservative
about atomic accesses.)

I will post the other two patches as two separate revisions.

Note that this patch alone is enough to fix the main problem in
http://llvm.org/bugs/show_bug.cgi?id=17281

The two other patches are in
http://reviews.llvm.org/D4844
http://reviews.llvm.org/D4845

This version looks close to ready. See inline comments. Once you fix those, I'll give one more read through before officially giving an LGTM.

lib/Analysis/MemoryDependenceAnalysis.cpp
433	I originally wrote: "You're still missing the point of my previous example. The problem is that ordered operations have additional aliasing requirements. Your code is still not checking for these conditions and thus is still incorrect."

I believe your code is correct. You're making a fairly subtle point with your checks though. Any two potentially aliased monotonic loads are ordered, but a monotonic and an unordered load are not, even if they alias. Please clarify this in comments. I nearly missed it (as you can see from the comment I left in above).

I would suggest that you change the aliasing-specific comment in the same loop. Keep the comments in sync with the code.

Nitpick: Separate the first "if (!QueryInst || LI->getOrdering() != Monotonic)" into two clauses. Semantically, the checks are unrelated.
442	You need to check if LI isVolatile. LoadInst::isUnordered does this, and you only want to change memory ordering here. I'd suggest having that be its own topmost check for clarity.
test/Transforms/DeadStoreElimination/atomic.ll
131	You could use a couple of other test cases here. In particular, load-value forwarding across monotonic loads and stores would be helpful.

A positive example:
store x = 0
store y monotonic
load x <- can use 0 since doesn't participate in ordering even if x == y at runtime

A negative example:
store x = 0 monotonic
load y monotonic
load x monotonic <-- not safe to forward!

And an ambiguous example:
store x = 0 unordered
load y monotonic
load x monotonic <-- is this correct to forward? I don't know.

Answer to Philip Reames's comments

add check for volatile (probably unneeded, but I agree that we should be conservative about it).
strengthen condition from isUnordered() to isSimple(), as I don't understand Unordered semantics well enough to be confident in the previous behaviour, and it also matches the comment better this way (thanks a lot for catching that one, I had missed the Monotonic/Unordered case).
separate a condition in two.
lengthen comment about aliasing and loads
add tests in GVN/atomic.ll

LGTM with minor comment clarification.

lib/Analysis/MemoryDependenceAnalysis.cpp
411	You should explain why. This is the heart of the aliasing confusion.
431	This comment is unclear. "in that way"?

Fixed the comments + LGTM in previous comment

This revision is now accepted and ready to land. Aug 18 2014, 3:28 PM

Committed as r215942 and r215943 (mistake on my side: I was planning to do only one commit, and forgot to rebase before git svn dcommit).

Diff 12626

lib/Analysis/MemoryDependenceAnalysis.cpp

[365 lines not shown, context: getPointerDependencyFrom(const AliasAnalysis::Location &MemLoc, bool isLoad,]
                          BasicBlock::iterator ScanIt, BasicBlock *BB,
                          Instruction *QueryInst) {

   const Value *MemLocBase = nullptr;
   int64_t MemLocOffset = 0;
   unsigned Limit = BlockScanLimit;
   bool isInvariantLoad = false;
   if (isLoad && QueryInst) {
     LoadInst *LI = dyn_cast<LoadInst>(QueryInst);
     if (LI && LI->getMetadata(LLVMContext::MD_invariant_load) != nullptr)
       isInvariantLoad = true;
   }

   // Walk backwards through the basic block, looking for dependencies.
   while (ScanIt != BB->begin()) {
     Instruction *Inst = --ScanIt;

[19 lines not shown, context: if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {]
                             MemLoc))
           return MemDepResult::getDef(II);
         continue;
       }
     }

     // Values depend on loads if the pointers are must aliased. This means that
     // a load depends on another must aliased load from the same value.
+    // One exception is atomic loads: a value can depend on an atomic load that it
+    // does not alias with.
     if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {
       // Atomic loads have complications involved.
+      // A monotonic load is OK if the query inst is itself not atomic.
       // FIXME: This is overly conservative.
-      if (!LI->isUnordered())
+      if (!LI->isUnordered()) {
+        if (!QueryInst)
+          return MemDepResult::getClobber(LI);
+        if (LI->getOrdering() != Monotonic)
+          return MemDepResult::getClobber(LI);
+        if (auto *QueryLI = dyn_cast<LoadInst>(QueryInst))
+          if (!QueryLI->isSimple())
+            return MemDepResult::getClobber(LI);
+        if (auto *QuerySI = dyn_cast<StoreInst>(QueryInst))
+          if (!QuerySI->isSimple())
+            return MemDepResult::getClobber(LI);
+      }
+
+      // FIXME: this is overly conservative.
+      // While volatile access cannot be eliminated, they do not have to kill
+      // optimisation in that way.
+      if (LI->isVolatile())
         return MemDepResult::getClobber(LI);

       AliasAnalysis::Location LoadLoc = AA->getLocation(LI);

       // If we found a pointer, check if it could be the same as our pointer.
       AliasAnalysis::AliasResult R = AA->alias(LoadLoc, MemLoc);

       if (isLoad) {
         if (R == AliasAnalysis::NoAlias) {
           // If this is an over-aligned integer load (for example,
           // "load i8* %P, align 4") see if it would obviously overlap with the
           // queried location if widened to a larger load (e.g. if the queried
           // location is 1 byte at P+1). If so, return it as a load/load
           // clobber result, allowing the client to decide to widen the load if
           // it wants to.
           if (IntegerType *ITy = dyn_cast<IntegerType>(LI->getType()))
             if (LI->getAlignment()*8 > ITy->getPrimitiveSizeInBits() &&
                 isLoadLoadClobberIfExtendedToFullWidth(MemLoc, MemLocBase,
[15 lines not shown, context: #if 0 // FIXME: Temporarily disabled. GVN is cleverly rewriting loads]
         // If we have a partial alias, then return this as a clobber for the
         // client to handle.
         if (R == AliasAnalysis::PartialAlias)
           return MemDepResult::getClobber(Inst);
 #endif

         // Random may-alias loads don't depend on each other without a
         // dependence.
         continue;
       }

       // Stores don't depend on other no-aliased accesses.
       if (R == AliasAnalysis::NoAlias)
         continue;

       // Stores don't alias loads from read-only memory.
       if (AA->pointsToConstantMemory(LoadLoc))
         continue;

       // Stores depend on may/must aliased loads.
       return MemDepResult::getDef(Inst);
     }

     if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
       // Atomic stores have complications involved.
+      // A monotonic store is OK if the query inst is itself not atomic.
       // FIXME: This is overly conservative.
-      if (!SI->isUnordered())
+      if (!SI->isUnordered()) {
+        if (!QueryInst)
+          return MemDepResult::getClobber(SI);
+        if (SI->getOrdering() != Monotonic)
+          return MemDepResult::getClobber(SI);
+        if (auto *QueryLI = dyn_cast<LoadInst>(QueryInst))
+          if (!QueryLI->isSimple())
+            return MemDepResult::getClobber(SI);
+        if (auto *QuerySI = dyn_cast<StoreInst>(QueryInst))
+          if (!QuerySI->isSimple())
+            return MemDepResult::getClobber(SI);
+      }
+
+      // FIXME: this is overly conservative.
+      // While volatile access cannot be eliminated, they do not have to kill
+      // optimisation in that way.
+      if (SI->isVolatile())
         return MemDepResult::getClobber(SI);

       // If alias analysis can tell that this store is guaranteed to not modify
       // the query pointer, ignore it. Use getModRefInfo to handle cases where
       // the query pointer points to constant memory etc.
       if (AA->getModRefInfo(SI, MemLoc) == AliasAnalysis::NoModRef)
         continue;

[1,065 lines not shown]
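The early-exit checks added to the load and store cases in MemoryDependenceAnalysis.cpp follow the same shape, and that shape can be modelled outside LLVM. The sketch below is an assumption-laden stand-in rather than the patch itself: `Access`, `Ordering`, and `mustClobber` are hypothetical names, the query is assumed to be a load or store (or absent, passed as a null pointer), and `isSimple()` and `isUnordered()` are re-implemented here as described in the review (not atomic and not volatile; at most unordered and not volatile).

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types used in the patch.
enum class Ordering {
  NotAtomic, Unordered, Monotonic, Acquire, Release, AcquireRelease, SeqCst
};

struct Access {
  Ordering Ord;
  bool Volatile;
  // Modelled after isSimple(): neither atomic nor volatile.
  bool isSimple() const { return Ord == Ordering::NotAtomic && !Volatile; }
  // Modelled after isUnordered(): at most unordered, and not volatile.
  bool isUnordered() const {
    return (Ord == Ordering::NotAtomic || Ord == Ordering::Unordered) &&
           !Volatile;
  }
};

// True when the scanned access must be reported as a clobber before any
// alias query is made. Query may be null when there is no query instruction.
bool mustClobber(const Access &Scanned, const Access *Query) {
  if (!Scanned.isUnordered()) {
    if (!Query)
      return true;  // nothing known about the query: stay conservative
    if (Scanned.Ord != Ordering::Monotonic)
      return true;  // stronger than monotonic (or volatile): clobber
    if (!Query->isSimple())
      return true;  // an atomic or volatile query keeps its ordering
  }
  // Volatile accesses always clobber, whatever their atomic ordering.
  return Scanned.Volatile;
}

// Spot checks: a monotonic access is skipped only for a simple query.
inline void checkMustClobber() {
  Access Mono{Ordering::Monotonic, false};
  Access Plain{Ordering::NotAtomic, false};
  assert(!mustClobber(Mono, &Plain));
  assert(mustClobber(Mono, &Mono));
}
```

When `mustClobber` returns false, the real code goes on to the alias queries shown in the diff; the model stops at that point.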

test/Transforms/DeadStoreElimination/atomic.ll

[99 lines not shown]
 entry:
   %a = alloca i32
   call void @randomop(i32* %a)
   store i32 0, i32* %a, align 4
   %x = load atomic i32* @x seq_cst, align 4
   ret i32 %x
 }

+; DSE across monotonic load (allowed as long as the eliminated store isUnordered)
+define i32 @test9() nounwind uwtable ssp {
+; CHECK: test9
+; CHECK-NOT: store i32 0
+; CHECK: store i32 1
+entry:
+  store i32 0, i32* @x
+  %x = load atomic i32* @y monotonic, align 4
+  store i32 1, i32* @x
+  ret i32 %x
+}
+
+; DSE across monotonic store (allowed as long as the eliminated store isUnordered)
+define void @test10() nounwind uwtable ssp {
+; CHECK: test10
+; CHECK-NOT: store i32 0
+; CHECK: store i32 1
+entry:
+  store i32 0, i32* @x
+  store atomic i32 42, i32* @y monotonic, align 4
+  store i32 1, i32* @x
+  ret void
+}
+
+; DSE across monotonic load (forbidden since the eliminated store is atomic)
+define i32 @test11() nounwind uwtable ssp {
+; CHECK: test11
+; CHECK: store atomic i32 0
+; CHECK: store atomic i32 1
+entry:
+  store atomic i32 0, i32* @x monotonic, align 4
+  %x = load atomic i32* @y monotonic, align 4
+  store atomic i32 1, i32* @x monotonic, align 4
+  ret i32 %x
+}
+
+; DSE across monotonic store (forbidden since the eliminated store is atomic)
+define void @test12() nounwind uwtable ssp {
+; CHECK: test12
+; CHECK: store atomic i32 0
+; CHECK: store atomic i32 1
+entry:
+  store atomic i32 0, i32* @x monotonic, align 4
+  store atomic i32 42, i32* @y monotonic, align 4
+  store atomic i32 1, i32* @x monotonic, align 4
+  ret void
+}

test/Transforms/GVN/atomic.ll

[72 lines not shown]
 ; CHECK: test6
 ; CHECK: load atomic i32* @x unordered
 entry:
   %x = load i32* @x
   %x2 = load atomic i32* @x unordered, align 4
   %x3 = add i32 %x, %x2
   ret i32 %x3
 }

+; GVN across monotonic store (allowed)
+define i32 @test7() nounwind uwtable ssp {
+; CHECK: test7
+; CHECK: add i32 %x, %x
+entry:
+  %x = load i32* @y
+  store atomic i32 %x, i32* @x monotonic, align 4
+  %y = load i32* @y
+  %z = add i32 %x, %y
+  ret i32 %z
+}
+
+; GVN of an unordered across monotonic load (not allowed)
+define i32 @test8() nounwind uwtable ssp {
+; CHECK: test8
+; CHECK: add i32 %x, %y
+entry:
+  %x = load atomic i32* @y unordered, align 4
+  %clobber = load atomic i32* @x monotonic, align 4
+  %y = load atomic i32* @y monotonic, align 4
+  %z = add i32 %x, %y
+  ret i32 %z
+}

This is an archive of the discontinued LLVM Phabricator instance.

Relax atomic restrictions on memory dependence analysis
ClosedPublic
