This is an archive of the discontinued LLVM Phabricator instance.

Further relax the constraint on atomics in MemoryDependencyAnalysis.cpp
ClosedPublic

Authored by morisset on Aug 11 2014, 10:46 AM.

Details

Reviewers

reames
jfb

Summary

This is patch 3/3 resulting from the split of D4797.

Even loads/stores that have a stronger ordering than monotonic can be safe.
The rule is that there must be no release-acquire pair on the path from the QueryInst,
assuming that the QueryInst is not atomic itself.
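
As a rough illustration of that rule, here is a self-contained toy model of the backward scan. It is a sketch under assumptions, not the code in MemoryDependenceAnalysis.cpp: `Access`, `Kind`, `Ordering`, `isUnordered` and `findOrderingClobber` are hypothetical stand-ins for the LLVM classes, and alias analysis and volatility are ignored entirely.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the LLVM types.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, Release,
                      AcquireRelease, SeqCst };
enum class Kind { Load, Store };

struct Access {
  Kind kind;
  Ordering ordering;
};

bool isAtLeastAcquire(Ordering o) {
  return o == Ordering::Acquire || o == Ordering::AcquireRelease ||
         o == Ordering::SeqCst;
}

bool isAtLeastRelease(Ordering o) {
  return o == Ordering::Release || o == Ordering::AcquireRelease ||
         o == Ordering::SeqCst;
}

bool isUnordered(Ordering o) {
  return o == Ordering::NotAtomic || o == Ordering::Unordered;
}

// Walks backwards over the accesses that precede the query (given in program
// order). Returns the index of the first access that has to be treated as a
// clobber purely because of its ordering, or -1 if there is none.
int findOrderingClobber(const std::vector<Access> &before, Ordering query) {
  bool hasSeenAcquire = false;
  for (int i = static_cast<int>(before.size()) - 1; i >= 0; --i) {
    const Access &a = before[i];
    if (isUnordered(a.ordering))
      continue; // plain accesses never block the scan in this model
    if (!isUnordered(query))
      return i; // atomic query: stay conservative
    if (a.kind == Kind::Load && isAtLeastAcquire(a.ordering))
      hasSeenAcquire = true;
    if (a.kind == Kind::Store && hasSeenAcquire &&
        isAtLeastRelease(a.ordering))
      return i; // release (earlier) paired with an acquire (later)
  }
  return -1;
}
```

Because the scan runs backwards, the acquire is met first and the release second, which is why a single flag is enough to detect the pair.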

I also cleaned up the test DeadStoreElimination/atomic.ll a bit
while adding tests to it.

Depends on D4797
Depends on D4844

Diff Detail

Event Timeline

morisset updated this revision to Diff 12350. Aug 11 2014, 10:46 AM

morisset retitled this revision from to Further relax the constraint on atomics in MemoryDependencyAnalysis.cpp.

morisset updated this object.

morisset edited the test plan for this revision.

morisset added reviewers: jfb, reames.

morisset added parent revisions: D4844: Add two helper functions: isAtLeastAcquire, isAtLeastRelease, D4797: Relax atomic restrictions on memory dependence analysis.

morisset added a subscriber: Unknown Object (MLST).

The signature of the helpers introduced in D4844 changed, so this
revision needs this fix to use the new one.

Just a rebase since D4797 changed.

The change looks good overall, but I'd like to have @reames review it too.

test/Transforms/DeadStoreElimination/atomic.ll
163	Why not also test load-acq followed by store-rel here and in other places (or the reverse store/load)? It seems like a good sanity check.
test/Transforms/GVN/atomic.ll
81	Drop the extra space.

Comments inline. I am not yet convinced of the correctness.

lib/Analysis/MemoryDependenceAnalysis.cpp
388	I'm still not finding this explanation particularly clear. I'm going to ask you not to commit this until you can justify why this approach is correct.

A potentially more intuitive way to explain this is to ask why this reordering is valid:

    store x = 0
    acquire operation
    release operation
    load x <-- okay to use zero

This is valid because the first pair of operations and the second pair can both be reordered, i.e. the intermediate state is:

    acquire operation
    store x = 0
    load x <-- now obviously zero
    release operation

What is "discriminating context"? What does it mean to "clobber"?
438	This is a bit off topic for this review, but I may have spotted a bug here. What if the query instruction is a RMW operation? These can have ordering semantics, but the existing code (from your previous change) would say there's no dependence.
441	I think this check is wrong. (Independent of whether the approach is valid or not.) My reasoning is that a cst operation is "at least" an acquire. However, it's _also_ a release, and reordering these two is not legal:

    load cst y
    store x = 0

This would be a violation of a "release operation" ordering as used by C++11. It's unclear whether that would violate the LLVM spec as written, but I think it probably should. Note that if my reasoning is sound, this is actually a problem with the previous patch as well, not just this one.
test/Transforms/DeadStoreElimination/atomic.ll
149	As currently specified in the LangRef, I don't think this is legal. If I'm reading the spec right, the load must be considered both an Acquire and a Release. (This is possibly not intended, but seems to follow from the wording.) As a result, this is not an acquire/release pair, but both an acquire/release and a release/acquire pair. Finding a case where this is observable would be hard, but it seems possible. If need be, I'll spend some time thinking about it.
test/Transforms/GVN/atomic.ll
46	I don't believe this is valid.
94	This is valid, but not necessarily for the reason you gave. %y can move above the release. %x can move below the acquire. Once this happens, %x & %y can be commoned.
106	CHECK-LABEL please

Thank you very much for the careful reviews!

Inline comments below with answers to your comments.

lib/Analysis/MemoryDependenceAnalysis.cpp
388	The reordering approach is not enough to explain this optimisation, as there is no way of hoisting the store of x back above the acquire operation in the example you gave. I will try to give a more detailed explanation below; if you find it clearer I will put it in the comments.

In the following code:

    store x = 0
    release operation (1)
    acquire operation (4)
    %val = load x

it is not okay to replace %val by 0, because another thread may be doing:

    acquire operation (2)
    store x = 42
    release operation (3)

If the program ensures that (1) synchronizes with (2), and (3) with (4), then this code is correct and %val ends up being 42 and not 0.

A key property of the above program is that if either (1) or (4) is absent, there would be a race between the store x = 42 and either the original store or the subsequent load (so the whole program is undefined). It can be shown (mostly through an excruciatingly boring case analysis) that every program where the optimisation is visible either needs such a release-acquire pair for synchronisation between the threads or is racy.

(Discriminating context means "any number of threads which would make this optimisation visible if they were running concurrently with that one". I should indeed remove it as it seems unclear.)
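
The two-thread scenario can be written out with C++11 atomics. This is only a sketch: the flag names `f` and `g` and the function `runHandoff` are made up for illustration, and the unspecified "release/acquire operations" of the example are assumed to be a release store and an acquire load on a flag.

```cpp
#include <atomic>
#include <thread>

// Returns the value read by the final plain load of x.
int runHandoff() {
  int x = 0;               // plain (non-atomic) location
  std::atomic<int> f{0};   // carries (1) -> (2)
  std::atomic<int> g{0};   // carries (3) -> (4)

  std::thread other([&] {
    while (f.load(std::memory_order_acquire) != 1) // (2) acquire
      std::this_thread::yield();
    x = 42;
    g.store(1, std::memory_order_release);         // (3) release
  });

  x = 0;
  f.store(1, std::memory_order_release);           // (1) release
  while (g.load(std::memory_order_acquire) != 1)   // (4) acquire
    std::this_thread::yield();
  int val = x; // must observe 42; forwarding the earlier 0 would be wrong

  other.join();
  return val;
}
```

Calling `runHandoff()` yields 42 because (1) synchronizes with (2) and (3) with (4), so the other thread's store is ordered between this thread's store and its load. Dropping either (1) or (4) would turn the accesses to `x` into a data race.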
438	Indeed, thank you very much for finding this. It is incredible how hard these things appear to be to get right. I will send another patch shortly fixing it. I would suggest making isAtomic()/isSimple() methods of all instructions (just returning false/true respectively for the instructions that do not override them) and just checking that. This way there would be much less risk in the future of forgetting one such case (there is also CmpXchg for example). Does this sound reasonable?
441	I am pretty sure a cst operation only behaves as a release if it stores something (and as an acquire only if it loads something). For example, section 29.3 starts by defining the memory orders in this way: [..] and memory_order_seq_cst: a store operation performs a release operation on the affected memory location. This is also true in the formalisation of Batty et al.: a synchronizes-with relation can only be from a store or fence to a load or fence [http://www.cl.cam.ac.uk/~pes20/cpp/]. So for this purpose, seq_cst loads do behave mostly like acquire loads (the difference is the extra total order on all seq_cst operations, which is irrelevant for the reasoning in this patch).
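
A small sketch of that distinction, assuming a toy classification that is not part of LLVM: `actsAsAcquire` and `actsAsRelease` are hypothetical helpers that combine the ordering with whether the operation reads or writes. Fences are left out of the model.

```cpp
// Toy stand-ins for illustration only; these are NOT the LLVM types.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, Release,
                      AcquireRelease, SeqCst };
enum class Kind { Load, Store, ReadModifyWrite };

// An operation can only act as an acquire if it reads memory.
bool actsAsAcquire(Kind k, Ordering o) {
  bool reads = (k != Kind::Store);
  return reads && (o == Ordering::Acquire || o == Ordering::AcquireRelease ||
                   o == Ordering::SeqCst);
}

// An operation can only act as a release if it writes memory.
bool actsAsRelease(Kind k, Ordering o) {
  bool writes = (k != Kind::Load);
  return writes && (o == Ordering::Release || o == Ordering::AcquireRelease ||
                    o == Ordering::SeqCst);
}
```

Under this model a seq_cst load is an acquire but not a release, which matches how the patch only consults isAtLeastAcquire on loads and isAtLeastRelease on stores.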
test/Transforms/DeadStoreElimination/atomic.ll
149	In the LangRef, just under the first occurrence of memory_order_seq_cst I see: "In addition to the guarantees of acq_rel (acquire for an operation which only reads, release for an operation which only writes)". It seems pretty clear to me that it follows the C++11 standard, and that a load cannot be considered a release, even if it is seq_cst. If you see a case where this optimisation would be observable, I am extremely interested in it.
163	I agree, that was intended to be covered by the previous test, but it cannot hurt to add another one, I will do it.
test/Transforms/GVN/atomic.ll
46	For this optimisation to be invalid, @y would have to change between the two loads. Such a store to @y would necessarily race with the first load (to %x) and make the whole program undefined. So my argument is that this is correct because it can only be observed by racy programs.
81	Yes, I didn't notice it.
106	Sure, sorry about forgetting that.

comments inline.

lib/Analysis/MemoryDependenceAnalysis.cpp
388	I do find your explanation more clear. Putting that in a comment or documentation would be a good idea. This gives the intuition for the approach. I'd avoid the term "discriminating context". It's extra jargon with no real gain. FYI, when I think about the legality of the optimization in terms of reordering, I tend to think "if I did this, can I do that?" It doesn't necessarily imply that I actually need to perform the reordering. It simply means that I could and thus it is legal to do the optimization. Now, potentially I might get myself in trouble by assuming two conflicting reorderings, but I've never actually run into that. (yet) My experience is that if I can't find some series of valid reorderings that could enable an optimization, it usually isn't actually correct. :)
438	This approach seems reasonable to me. Alternately, you could use mayReadMemory() as your generic fallback check.
441	You're right. I went and checked the spec and my interpretation was incorrect.
test/Transforms/DeadStoreElimination/atomic.ll
149	Thanks for pointing out that wording. I was looking in acq_rel. Any chance you could move that to be under acq_rel? It seems odd to have a later order defining a previous one. For this particular example, it's only valid because we know @x and @y are distinct locations. If we didn't know that, we couldn't remove the first store without changing %x, and thus the observable value.
test/Transforms/GVN/atomic.ll
46	I had to think about this one a bit. :) I can't find an argument to refute your racy program one, but I find that answer disturbing. From a practical software engineering perspective, the answer "oh, we corrupted all of the runtime state in hard to explain ways because you had one race somewhere" is a bit hard to swallow. I will admit it's what the c++11 standard says though. I'm curious, do you believe the answer changes if both loads are monotonic? If not, this would seem to imply that LLVM can't implement the Java memory model correctly. The JMM does specify limited semantics for racy programs.

I will update this patch with the clearer comment/documentation ASAP.

Comments inline about the other points of discussion.

lib/Analysis/MemoryDependenceAnalysis.cpp
438	mayReadOrWriteMemory seems like the perfect fallback indeed, I will send a patch using it.
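
One way such a conservative fallback could look, sketched on a toy instruction model. Everything here is hypothetical: `Inst`, `Opcode`, `queryForcesClobber` and this local `mayReadOrWriteMemory` are illustrative stand-ins, not the LLVM API and not the follow-up patch.

```cpp
// Toy stand-ins for illustration only; these are NOT the LLVM types.
enum class Opcode { Load, Store, AtomicRMW, CmpXchg, Add };

struct Inst {
  Opcode op;
  bool unordered; // only meaningful for Load and Store in this model
};

// In this model every opcode except Add touches memory.
bool mayReadOrWriteMemory(const Inst &I) { return I.op != Opcode::Add; }

// Returns true when the query instruction itself forces the scan to report
// a clobber at the first atomic access it meets.
bool queryForcesClobber(const Inst *Query) {
  if (!Query)
    return true; // no query instruction: stay conservative
  if (Query->op == Opcode::Load || Query->op == Opcode::Store)
    return !Query->unordered; // ordered atomic load/store
  // Generic fallback: any other memory-touching instruction (RMW, CmpXchg,
  // ...) is treated as if it were ordered.
  return mayReadOrWriteMemory(*Query);
}
```

The point of the fallback is that a newly added memory-touching instruction kind is handled conservatively by default instead of silently falling through as "no dependence".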
test/Transforms/DeadStoreElimination/atomic.ll
149	It is not under the acq_rel section, because it is impossible to have an acq_rel load or store, only RMW/CmpXchg operations can be acq_rel. For the non-aliasing condition I agree, but that is a completely unrelated check: it also applies even if y was non-atomic (and is correctly checked by MemoryDependency).
test/Transforms/GVN/atomic.ll
46	About the practical software engineering issue, it could maybe be mitigated by warnings (although I am not sure how exactly to give helpful warnings for this) and by the use of thread sanitizer. Do you think a discussion on LLVM-dev would be warranted about the topic of how aggressively/non-intuitively we may optimize code? I admit I was focusing only on the correctness with regards to the standard and not necessarily with regards to the programmer's expectations. On the other hand, the programmer is not supposed to use atomics if he does not know what he's doing... (and he will probably have a bad time otherwise even if the compiler is not aggressively optimizing)

This specific argument does not work if the loads are monotonic (as races involving relaxed accesses are well-defined in C++11). About the question of whether it is true for another reason, I don't know. We have a paper under review (we = Viktor Vafeiadis and Thibault Balabonski mostly) about what kind of optimizations are possible on atomics themselves (such as would be the case here if the loads are monotonic). I do not remember what it says about this specific situation (and don't have a copy of it with me right now), but we generally found that almost every optimization (even seemingly "obviously true" ones) breaks horrendously in horrifyingly subtle and evil ways as soon as it involves monotonic/relaxed accesses. So I plan to avoid messing with these kinds of accesses, at least until/unless their semantics is modified/fixed by the C++ committee.

Java accesses are "unordered" from what I understand. It is because I have no idea what their exact semantics is (especially in the LLVM framework based on C11) that I picked isSimple() in the previous patch (although I had forgotten them in the beginning).

Mostly added to the big comment, and improved the tests a bit.

LGTM. I'm still not 100% comfortable with this, but I've been unable to find a counter example and arguments given are solid. I don't want to hold this up any further.

reames accepted this revision. Aug 29 2014, 11:45 AM

reames edited edge metadata.

This revision is now accepted and ready to land. Aug 29 2014, 11:45 AM

r216771

Revision Contents

Path	Size
lib/Analysis/MemoryDependenceAnalysis.cpp	32 lines
test/Transforms/DeadStoreElimination/atomic.ll	98 lines
test/Transforms/GVN/atomic.ll	34 lines

Diff 12425

lib/Analysis/MemoryDependenceAnalysis.cpp

getPointerDependencyFrom(const AliasAnalysis::Location &MemLoc, bool isLoad,		getPointerDependencyFrom(const AliasAnalysis::Location &MemLoc, bool isLoad,
BasicBlock::iterator ScanIt, BasicBlock *BB,		BasicBlock::iterator ScanIt, BasicBlock *BB,
Instruction *QueryInst) {		Instruction *QueryInst) {

const Value *MemLocBase = nullptr;		const Value *MemLocBase = nullptr;
int64_t MemLocOffset = 0;		int64_t MemLocOffset = 0;
unsigned Limit = BlockScanLimit;		unsigned Limit = BlockScanLimit;
bool isInvariantLoad = false;		bool isInvariantLoad = false;

		// We must be careful with atomic accesses, as they may allow another thread
		// to touch this location, clobbering it. We are conservative: if the
		// QueryInst is not a simple (non-atomic) memory access, we automatically
		// return getClobber.
		// If it is simple, we know based on the results of
		// "Compiler testing via a theory of sound optimisations in the C11/C++11
		// memory model" in PLDI 2013, that a non-atomic location can only be
		// clobbered between a pair of a release and an acquire action, with no
		// access to the location in between.
		// The general idea is that for a discriminating context (i.e. that accesses
		// MemLoc) to not be racy, it must synchronize with both the access being
		// optimized and the previous one, which requires respectively an acquire
		// and a release on this thread.
		bool HasSeenAcquire = false;

if (isLoad && QueryInst) {		if (isLoad && QueryInst) {
LoadInst *LI = dyn_cast<LoadInst>(QueryInst);		LoadInst *LI = dyn_cast<LoadInst>(QueryInst);
if (LI && LI->getMetadata(LLVMContext::MD_invariant_load) != nullptr)		if (LI && LI->getMetadata(LLVMContext::MD_invariant_load) != nullptr)
isInvariantLoad = true;		isInvariantLoad = true;
}		}

// Walk backwards through the basic block, looking for dependencies.		// Walk backwards through the basic block, looking for dependencies.
while (ScanIt != BB->begin()) {		while (ScanIt != BB->begin()) {
[...]
continue;		continue;
}		}
}		}

// Values depend on loads if the pointers are must aliased. This means that		// Values depend on loads if the pointers are must aliased. This means that
// a load depends on another must aliased load from the same value.		// a load depends on another must aliased load from the same value.
if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {		if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {
// Atomic loads have complications involved.		// Atomic loads have complications involved.
// A monotonic load is OK if the query inst is itself not atomic.		// A Monotonic (or higher) load is OK if the query inst is itself not atomic.
		// An Acquire (or higher) load sets the HasSeenAcquire flag, so that any
		// release store will know to return getClobber.
// FIXME: This is overly conservative.		// FIXME: This is overly conservative.
if (!LI->isUnordered()) {		if (!LI->isUnordered()) {
if (!QueryInst || LI->getOrdering() != Monotonic)		if (!QueryInst)
return MemDepResult::getClobber(LI);		return MemDepResult::getClobber(LI);
if (auto *QueryLI = dyn_cast<LoadInst>(QueryInst))		if (auto *QueryLI = dyn_cast<LoadInst>(QueryInst))
if (!QueryLI->isUnordered())		if (!QueryLI->isUnordered())
return MemDepResult::getClobber(LI);		return MemDepResult::getClobber(LI);
if (auto *QuerySI = dyn_cast<StoreInst>(QueryInst))		if (auto *QuerySI = dyn_cast<StoreInst>(QueryInst))
if (!QuerySI->isUnordered())		if (!QuerySI->isUnordered())
return MemDepResult::getClobber(LI);		return MemDepResult::getClobber(LI);
		if (isAtLeastAcquire(LI->getOrdering()))
		HasSeenAcquire = true;
}		}

AliasAnalysis::Location LoadLoc = AA->getLocation(LI);		AliasAnalysis::Location LoadLoc = AA->getLocation(LI);

// If we found a pointer, check if it could be the same as our pointer.		// If we found a pointer, check if it could be the same as our pointer.
AliasAnalysis::AliasResult R = AA->alias(LoadLoc, MemLoc);		AliasAnalysis::AliasResult R = AA->alias(LoadLoc, MemLoc);

if (isLoad) {		if (isLoad) {
[...]
continue;		continue;

// Stores depend on may/must aliased loads.		// Stores depend on may/must aliased loads.
return MemDepResult::getDef(Inst);		return MemDepResult::getDef(Inst);
}		}

if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {		if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
// Atomic stores have complications involved.		// Atomic stores have complications involved.
// A monotonic store is OK if the query inst is itself not atomic.		// A Monotonic store is OK if the query inst is itself not atomic.
		// A Release (or higher) store further requires that no acquire load
		// has been seen.
// FIXME: This is overly conservative.		// FIXME: This is overly conservative.
if (!SI->isUnordered()) {		if (!SI->isUnordered()) {
if (!QueryInst || SI->getOrdering() != Monotonic)		if (!QueryInst)
return MemDepResult::getClobber(SI);		return MemDepResult::getClobber(SI);
if (auto *QueryLI = dyn_cast<LoadInst>(QueryInst))		if (auto *QueryLI = dyn_cast<LoadInst>(QueryInst))
if (!QueryLI->isUnordered())		if (!QueryLI->isUnordered())
return MemDepResult::getClobber(SI);		return MemDepResult::getClobber(SI);
if (auto *QuerySI = dyn_cast<StoreInst>(QueryInst))		if (auto *QuerySI = dyn_cast<StoreInst>(QueryInst))
if (!QuerySI->isUnordered())		if (!QuerySI->isUnordered())
return MemDepResult::getClobber(SI);		return MemDepResult::getClobber(SI);
		if (HasSeenAcquire && isAtLeastRelease(SI->getOrdering()))
		return MemDepResult::getClobber(SI);
}		}

// If alias analysis can tell that this store is guaranteed to not modify		// If alias analysis can tell that this store is guaranteed to not modify
// the query pointer, ignore it. Use getModRefInfo to handle cases where		// the query pointer, ignore it. Use getModRefInfo to handle cases where
// the query pointer points to constant memory etc.		// the query pointer points to constant memory etc.
if (AA->getModRefInfo(SI, MemLoc) == AliasAnalysis::NoModRef)		if (AA->getModRefInfo(SI, MemLoc) == AliasAnalysis::NoModRef)
continue;		continue;


test/Transforms/DeadStoreElimination/atomic.ll

	; RUN: opt -basicaa -dse -S < %s | FileCheck %s			; RUN: opt -basicaa -dse -S < %s | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
	target triple = "x86_64-apple-macosx10.7.0"			target triple = "x86_64-apple-macosx10.7.0"

	; Sanity tests for atomic stores.			; Sanity tests for atomic stores.
	; Note that it turns out essentially every transformation DSE does is legal on			; Note that it turns out essentially every transformation DSE does is legal on
	; atomic ops, just some transformations are not allowed across them.			; atomic ops, just some transformations are not allowed across release-acquire pairs.

	@x = common global i32 0, align 4			@x = common global i32 0, align 4
	@y = common global i32 0, align 4			@y = common global i32 0, align 4

	declare void @randomop(i32*)			declare void @randomop(i32*)

	; DSE across unordered store (allowed)			; DSE across unordered store (allowed)
	define void @test1() nounwind uwtable ssp {			define void @test1() {
	; CHECK: test1			; CHECK-LABEL: test1
	; CHECK-NOT: store i32 0			; CHECK-NOT: store i32 0
	; CHECK: store i32 1			; CHECK: store i32 1
	entry:
	store i32 0, i32* @x			store i32 0, i32* @x
	store atomic i32 0, i32* @y unordered, align 4			store atomic i32 0, i32* @y unordered, align 4
	store i32 1, i32* @x			store i32 1, i32* @x
	ret void			ret void
	}			}

	; DSE across seq_cst load (allowed in theory; not implemented ATM)			; DSE across seq_cst load (allowed)
	define i32 @test2() nounwind uwtable ssp {			define i32 @test2() {
	; CHECK: test2			; CHECK-LABEL: test2
	; CHECK: store i32 0			; CHECK-NOT: store i32 0
	; CHECK: store i32 1			; CHECK: store i32 1
	entry:
	store i32 0, i32* @x			store i32 0, i32* @x
	%x = load atomic i32* @y seq_cst, align 4			%x = load atomic i32* @y seq_cst, align 4
	store i32 1, i32* @x			store i32 1, i32* @x
	ret i32 %x			ret i32 %x
	}			}

	; DSE across seq_cst store (store before atomic store must not be removed)			; DSE across seq_cst store (allowed)
	define void @test3() nounwind uwtable ssp {			define void @test3() {
	; CHECK: test3			; CHECK-LABEL: test3
	; CHECK: store i32			; CHECK-NOT: store i32 0
	; CHECK: store atomic i32 2			; CHECK: store atomic i32 2
	entry:
	store i32 0, i32* @x			store i32 0, i32* @x
	store atomic i32 2, i32* @y seq_cst, align 4			store atomic i32 2, i32* @y seq_cst, align 4
	store i32 1, i32* @x			store i32 1, i32* @x
	ret void			ret void
	}			}

	; DSE remove unordered store (allowed)			; DSE remove unordered store (allowed)
	define void @test4() nounwind uwtable ssp {			define void @test4() {
	; CHECK: test4			; CHECK-LABEL: test4
	; CHECK-NOT: store atomic			; CHECK-NOT: store atomic
	; CHECK: store i32 1			; CHECK: store i32 1
	entry:
	store atomic i32 0, i32* @x unordered, align 4			store atomic i32 0, i32* @x unordered, align 4
	store i32 1, i32* @x			store i32 1, i32* @x
	ret void			ret void
	}			}

	; DSE unordered store overwriting non-atomic store (allowed)			; DSE unordered store overwriting non-atomic store (allowed)
	define void @test5() nounwind uwtable ssp {			define void @test5() {
	; CHECK: test5			; CHECK-LABEL: test5
	; CHECK: store atomic i32 1			; CHECK: store atomic i32 1
	entry:
	store i32 0, i32* @x			store i32 0, i32* @x
	store atomic i32 1, i32* @x unordered, align 4			store atomic i32 1, i32* @x unordered, align 4
	ret void			ret void
	}			}

	; DSE no-op unordered atomic store (allowed)			; DSE no-op unordered atomic store (allowed)
	define void @test6() nounwind uwtable ssp {			define void @test6() {
	; CHECK: test6			; CHECK-LABEL: test6
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	entry:
	%x = load atomic i32* @x unordered, align 4			%x = load atomic i32* @x unordered, align 4
	store atomic i32 %x, i32* @x unordered, align 4			store atomic i32 %x, i32* @x unordered, align 4
	ret void			ret void
	}			}

	; DSE seq_cst store (be conservative; DSE doesn't have infrastructure			; DSE seq_cst store (be conservative; DSE doesn't have infrastructure
	; to reason about atomic operations).			; to reason about atomic operations).
	define void @test7() nounwind uwtable ssp {			define void @test7() {
	; CHECK: test7			; CHECK-LABEL: test7
	; CHECK: store atomic			; CHECK: store atomic
	entry:
	%a = alloca i32			%a = alloca i32
	store atomic i32 0, i32* %a seq_cst, align 4			store atomic i32 0, i32* %a seq_cst, align 4
	ret void			ret void
	}			}

	; DSE and seq_cst load (be conservative; DSE doesn't have infrastructure			; DSE and seq_cst load (be conservative; DSE doesn't have infrastructure
	; to reason about atomic operations).			; to reason about atomic operations).
	define i32 @test8() nounwind uwtable ssp {			define i32 @test8() {
	; CHECK: test8			; CHECK-LABEL: test8
	; CHECK: store			; CHECK: store
	; CHECK: load atomic			; CHECK: load atomic
	entry:
	%a = alloca i32			%a = alloca i32
	call void @randomop(i32* %a)			call void @randomop(i32* %a)
	store i32 0, i32* %a, align 4			store i32 0, i32* %a, align 4
	%x = load atomic i32* @x seq_cst, align 4			%x = load atomic i32* @x seq_cst, align 4
	ret i32 %x			ret i32 %x
	}			}

	; DSE across monotonic load (allowed as long as the eliminated store isUnordered)			; DSE across monotonic load (allowed as long as the eliminated store isUnordered)
	define i32 @test9() nounwind uwtable ssp {			define i32 @test9() {
	; CHECK: test9			; CHECK-LABEL: test9
	; CHECK-NOT: store i32 0			; CHECK-NOT: store i32 0
	; CHECK: store i32 1			; CHECK: store i32 1
	entry:
	store i32 0, i32* @x			store i32 0, i32* @x
	%x = load atomic i32* @y monotonic, align 4			%x = load atomic i32* @y monotonic, align 4
	store i32 1, i32* @x			store i32 1, i32* @x
	ret i32 %x			ret i32 %x
	}			}

	; DSE across monotonic store (allowed as long as the eliminated store isUnordered)			; DSE across monotonic store (allowed as long as the eliminated store isUnordered)
	define void @test10() nounwind uwtable ssp {			define void @test10() {
	; CHECK: test10			; CHECK-LABEL: test10
	; CHECK-NOT: store i32 0			; CHECK-NOT: store i32 0
	; CHECK: store i32 1			; CHECK: store i32 1
	entry:
	store i32 0, i32* @x			store i32 0, i32* @x
	store atomic i32 42, i32* @y monotonic, align 4			store atomic i32 42, i32* @y monotonic, align 4
	store i32 1, i32* @x			store i32 1, i32* @x
	ret void			ret void
	}			}

	; DSE across monotonic load (forbidden since the eliminated store is atomic)			; DSE across monotonic load (forbidden since the eliminated store is atomic)
	define i32 @test11() nounwind uwtable ssp {			define i32 @test11() {
	; CHECK: test11			; CHECK-LABEL: test11
	; CHECK: store atomic i32 0			; CHECK: store atomic i32 0
	; CHECK: store atomic i32 1			; CHECK: store atomic i32 1
	entry:
	store atomic i32 0, i32* @x monotonic, align 4			store atomic i32 0, i32* @x monotonic, align 4
	%x = load atomic i32* @y monotonic, align 4			%x = load atomic i32* @y monotonic, align 4
	store atomic i32 1, i32* @x monotonic, align 4			store atomic i32 1, i32* @x monotonic, align 4
	ret i32 %x			ret i32 %x
	}			}

	; DSE across monotonic store (forbidden since the eliminated store is atomic)			; DSE across monotonic store (forbidden since the eliminated store is atomic)
	define void @test12() nounwind uwtable ssp {			define void @test12() {
	; CHECK: test12			; CHECK-LABEL: test12
	; CHECK: store atomic i32 0			; CHECK: store atomic i32 0
	; CHECK: store atomic i32 1			; CHECK: store atomic i32 1
	entry:
	store atomic i32 0, i32* @x monotonic, align 4			store atomic i32 0, i32* @x monotonic, align 4
	store atomic i32 42, i32* @y monotonic, align 4			store atomic i32 42, i32* @y monotonic, align 4
	store atomic i32 1, i32* @x monotonic, align 4			store atomic i32 1, i32* @x monotonic, align 4
	ret void			ret void
	}			}

				; DSE is allowed across a pair of an atomic read and then write.
				define i32 @test13() {
				; CHECK-LABEL: test13
				; CHECK-NOT: store i32 0
				; CHECK: store i32 1
				store i32 0, i32* @x
				reames: As currently specified in the LangRef, I don't think this is legal. If I'm reading the spec right, the load must be considered both an Acquire and a Release. (This is possibly not intended, but seems to follow from the wording.) As a result, this is not an acquire/release pair, but both an acquire/release and a release/acquire pair. Finding a case where this is observable would be hard, but it seems possible. If need be, I'll spend some time thinking about it.
				morisset (author): In the LangRef, just under the first occurrence of memory_order_seq_cst I see: "In addition to the guarantees of acq_rel (acquire for an operation which only reads, release for an operation which only writes)". It seems pretty clear to me that it follows the C++11 standard, and that a load cannot be considered a release, even if it is seq_cst. If you see a case where this optimisation would be observable, I am extremely interested in it.
				reames: Thanks for pointing out that wording. I was looking in acq_rel. Any chance you could move that to be under acq_rel? It seems odd to have a later order defining a previous one. For this particular example, it's only valid because we know @x and @y are distinct locations. If we didn't know that, we couldn't remove the first store without changing %x, and thus the observable value.
				morisset (author): It is not under the acq_rel section, because it is impossible to have an acq_rel load or store; only RMW/CmpXchg operations can be acq_rel. For the non-aliasing condition I agree, but that is a completely unrelated check: it would apply even if @y were non-atomic (and is correctly checked by MemoryDependency).
				%x = load atomic i32* @y seq_cst, align 4
				store atomic i32 %x, i32* @y seq_cst, align 4
				store i32 1, i32* @x
				ret i32 %x
				}

				; But DSE is not allowed across a release-acquire pair.
				define i32 @test14() {
				; CHECK-LABEL: test14
				; CHECK: store i32 0
				; CHECK: store i32 1
				store i32 0, i32* @x
				store atomic i32 0, i32* @y release, align 4
				%x = load atomic i32* @y acquire, align 4
				jfb: Why not also test load-acq followed by store-rel here and in other places (or the reverse store/load)? It seems like a good sanity check.
				morisset (author): I agree, that was intended to be covered by the previous test, but it cannot hurt to add another one, I will do it.
				store i32 1, i32* @x
				ret i32 %x
				}
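
The rule that test13 and test14 exercise (no release-acquire pair on the path from the QueryInst) can be pictured as a backward scan over the operations in between. The following is only an illustrative C++ model, not the code in MemoryDependenceAnalysis.cpp: `Op`, `Kind`, `Ordering`, `actsAsAcquire`, `actsAsRelease` and `blockedByReleaseAcquirePair` are hypothetical names, aliasing is ignored entirely, and the query instruction is assumed to be non-atomic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the orderings discussed in this review.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, Release, SeqCst };
enum class Kind { Load, Store };

struct Op {
  Kind kind;
  Ordering ord;
};

// A load is an acquire if it is acquire or seq_cst; a load is never a release.
static bool actsAsAcquire(const Op &op) {
  return op.kind == Kind::Load &&
         (op.ord == Ordering::Acquire || op.ord == Ordering::SeqCst);
}

// A store is a release if it is release or seq_cst; a store is never an acquire.
static bool actsAsRelease(const Op &op) {
  return op.kind == Kind::Store &&
         (op.ord == Ordering::Release || op.ord == Ordering::SeqCst);
}

// `between` lists, in program order, the operations between the candidate
// dependency and the query. The scan runs backward from the query and reports
// true when a release is found after an acquire has already been seen, i.e.
// when a release precedes an acquire in program order.
bool blockedByReleaseAcquirePair(const std::vector<Op> &between) {
  bool seenAcquire = false;
  for (auto it = between.rbegin(); it != between.rend(); ++it) {
    if (actsAsAcquire(*it))
      seenAcquire = true;
    else if (actsAsRelease(*it) && seenAcquire)
      return true;
  }
  return false;
}

// Mirrors the shape of test13 and test14 above.
void checkExamples() {
  const std::vector<Op> test13 = {{Kind::Load, Ordering::SeqCst},
                                  {Kind::Store, Ordering::SeqCst}};
  const std::vector<Op> test14 = {{Kind::Store, Ordering::Release},
                                  {Kind::Load, Ordering::Acquire}};
  assert(!blockedByReleaseAcquirePair(test13)); // acquire then release: allowed
  assert(blockedByReleaseAcquirePair(test14));  // release then acquire: blocked
}
```

In this model monotonic and unordered operations never set or trip the flag, which matches the intent of test9 and test10 for a non-atomic eliminated store.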

test/Transforms/GVN/atomic.ll

	Show All 12 Lines
	entry:			entry:
	%x = load i32* @y			%x = load i32* @y
	store atomic i32 %x, i32* @x unordered, align 4			store atomic i32 %x, i32* @x unordered, align 4
	%y = load i32* @y			%y = load i32* @y
	%z = add i32 %x, %y			%z = add i32 %x, %y
	ret i32 %z			ret i32 %z
	}			}

	; GVN across seq_cst store (allowed in theory; not implemented ATM)			; GVN across seq_cst store (allowed)
	define i32 @test2() nounwind uwtable ssp {			define i32 @test2() nounwind uwtable ssp {
	; CHECK: test2			; CHECK: test2
	; CHECK: add i32 %x, %y			; CHECK: add i32 %x, %x
	entry:			entry:
	%x = load i32* @y			%x = load i32* @y
	store atomic i32 %x, i32* @x seq_cst, align 4			store atomic i32 %x, i32* @x seq_cst, align 4
	%y = load i32* @y			%y = load i32* @y
	%z = add i32 %x, %y			%z = add i32 %x, %y
	ret i32 %z			ret i32 %z
	}			}

	; GVN across unordered load (allowed)			; GVN across unordered load (allowed)
	define i32 @test3() nounwind uwtable ssp {			define i32 @test3() nounwind uwtable ssp {
	; CHECK: test3			; CHECK: test3
	; CHECK: add i32 %x, %x			; CHECK: add i32 %x, %x
	entry:			entry:
	%x = load i32* @y			%x = load i32* @y
	%y = load atomic i32* @x unordered, align 4			%y = load atomic i32* @x unordered, align 4
	%z = load i32* @y			%z = load i32* @y
	%a = add i32 %x, %z			%a = add i32 %x, %z
	%b = add i32 %y, %a			%b = add i32 %y, %a
	ret i32 %b			ret i32 %b
	}			}

	; GVN across acquire load (load after atomic load must not be removed)			; GVN across acquire load (allowed as the original load was not atomic)
				reames: I don't believe this is valid.
				morisset (author): For this optimisation to be invalid, @y would have to change between the two loads. Such a store to @y would necessarily race with the first load (to %x) and make the whole program undefined. So my argument is that this is correct because it can only be observed by racy programs.
				reames: I had to think about this one a bit. :) I can't find an argument to refute your racy program one, but I find that answer disturbing. From a practical software engineering perspective, the answer "oh, we corrupted all of the runtime state in hard to explain ways because you had one race somewhere" is a bit hard to swallow. I will admit it's what the C++11 standard says though. I'm curious, do you believe the answer changes if both loads are monotonic? If not, this would seem to imply that LLVM can't implement the Java memory model correctly. The JMM does specify limited semantics for racy programs.
				morisset (author): About the practical software engineering issue, it could maybe be mitigated by warnings (although I am not sure how exactly to give helpful warnings for this) and by the use of thread sanitizer. Do you think a discussion on LLVM-dev would be warranted about the topic of how aggressively/non-intuitively we may optimize code? I admit I was focusing only on the correctness with regards to the standard and not necessarily with regards to the programmer's expectations. On the other hand, the programmer is not supposed to use atomics if he does not know what he's doing... (and he will probably have a bad time otherwise even if the compiler is not aggressively optimizing). This specific argument does not work if the loads are monotonic (as races involving relaxed accesses are well-defined in C++11). About the question of whether it is true for another reason, I don't know. We have a paper under review (we = Viktor Vafeiadis and Thibault Balabonski mostly) about what kind of optimizations are possible on atomics themselves (such as would be the case here if the loads are monotonic). I do not remember what it says about this specific situation (and don't have a copy of it with me right now), but we generally found that almost every optimization (even seemingly "obviously true" ones) breaks horrendously in horrifyingly subtle and evil ways as soon as it involves monotonic/relaxed accesses. So I plan to avoid messing with these kinds of accesses, at least until/unless their semantics is modified/fixed by the C++ committee. Java accesses are "unordered" from what I understand. It is because I have no idea what their exact semantics is (especially in the LLVM framework based on C11) that I picked isSimple() in the previous patch (although I had forgotten them in the beginning).
	define i32 @test4() nounwind uwtable ssp {			define i32 @test4() nounwind uwtable ssp {
	; CHECK: test4			; CHECK: test4
	; CHECK: load atomic i32* @x			; CHECK: load atomic i32* @x
	; CHECK: load i32* @y			; CHECK-NOT: load i32* @y
	entry:			entry:
	%x = load i32* @y			%x = load i32* @y
	%y = load atomic i32* @x seq_cst, align 4			%y = load atomic i32* @x seq_cst, align 4
	%x2 = load i32* @y			%x2 = load i32* @y
	%x3 = add i32 %x, %x2			%x3 = add i32 %x, %x2
	%y2 = add i32 %y, %x3			%y2 = add i32 %y, %x3
	ret i32 %y2			ret i32 %y2
	}			}
	Show All 14 Lines
	; CHECK: test6			; CHECK: test6
	; CHECK: load atomic i32* @x unordered			; CHECK: load atomic i32* @x unordered
	entry:			entry:
	%x = load i32* @x			%x = load i32* @x
	%x2 = load atomic i32* @x unordered, align 4			%x2 = load atomic i32* @x unordered, align 4
	%x3 = add i32 %x, %x2			%x3 = add i32 %x, %x2
	ret i32 %x3			ret i32 %x3
	}			}

				jfb: Drop the extra space.
				morisset (author): Yes, I didn't notice it.
				; GVN across release-acquire pair (forbidden)
				define i32 @test7() nounwind uwtable ssp {
				; CHECK: test7
				; CHECK: add i32 %x, %y
				entry:
				%x = load i32* @y
				store atomic i32 %x, i32* @x release, align 4
				%w = load atomic i32* @x acquire, align 4
				%y = load i32* @y
				%z = add i32 %x, %y
				ret i32 %z
				}

				reames: This is valid, but not necessarily for the reason you gave. %y can move above the release. %x can move below the acquire. Once this happens, %x & %y can be commoned.
				; GVN across acquire-release pair (allowed)
				define i32 @test8() nounwind uwtable ssp {
				; CHECK: test8
				; CHECK: add i32 %x, %x
				entry:
				%x = load i32* @y
				%w = load atomic i32* @x acquire, align 4
				store atomic i32 %x, i32* @x release, align 4
				%y = load i32* @y
				%z = add i32 %x, %y
				ret i32 %z
				}
				reames: CHECK-LABEL please
				morisset (author): Sure, sorry about forgetting that.
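
As a rough source-level picture of the GVN test4 discussion above, the sketch below is a hypothetical C++ analogue (the names `plainY`, `atomicX` and `sumAroundSeqCstLoad` are made up for illustration and do not come from the patch). The second plain read can only differ from the first if another thread writes `plainY` in between, and such a write would race with the first read, which is undefined behaviour in C++11. That is the argument given for letting the compiler reuse the first value.

```cpp
#include <atomic>

int plainY = 0;               // non-atomic global, stands in for @y
std::atomic<int> atomicX{0};  // atomic global, stands in for @x

int sumAroundSeqCstLoad() {
  int first = plainY;                                       // plain load
  int fromAtomic = atomicX.load(std::memory_order_seq_cst); // seq_cst load
  int second = plainY;  // a compiler may fold this into `first`
  return fromAtomic + first + second;
}
```

As the thread notes, this reasoning relies on the surrounding loads being non-atomic; it does not carry over if both loads of `plainY` were relaxed atomics.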