This is an archive of the discontinued LLVM Phabricator instance.


Differential D15592

[LICM] Make store promotion work in the face of unordered atomics
Closed, Public

Authored by reames on Dec 16 2015, 3:08 PM.


Details

Reviewers

majnemer
sanjoy
jfb
hfinkel

Commits

rGb2bca7e30984: [LICM] Make store promotion work in the face of unordered atomics
rL295015: [LICM] Make store promotion work in the face of unordered atomics

Summary

Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.
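For readers new to LICM's scalar promotion, here is a minimal hand-written C++ analogue of what the transform does to a plain (non-atomic) location. It is only an illustration: it assumes a single thread and no aliasing, the function names are hypothetical, and the real pass rewrites LLVM IR through SSAUpdater rather than C++ source.

```cpp
#include <cassert>

// Before promotion: every iteration loads from and stores to *P.
int accumulateBefore(int *P, int N) {
  assert(P != nullptr);
  for (int I = 0; I < N; ++I)
    *P = *P + I;
  return *P;
}

// After promotion (hand-written analogue): the load is hoisted to the
// preheader, the loop body works on a local, and one store is sunk to the exit.
int accumulateAfter(int *P, int N) {
  assert(P != nullptr);
  int Promoted = *P;         // hoisted load, cf. the ".promoted" load in the pass
  for (int I = 0; I < N; ++I)
    Promoted = Promoted + I; // no memory traffic inside the loop
  *P = Promoted;             // single store in the loop exit block
  return *P;
}
```

With `N == 0` the promoted form still executes the store, writing back the value it just read. That extra store on a path that originally had none is why the pass must know the store is guaranteed to execute or the location is thread-local before inserting it. With this patch, when the original accesses are unordered atomics, the hoisted load and the sunk store are emitted as `unordered` as well.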

Most of the change is straightforward; the only complicated bit is the reasoning around mixing atomic and non-atomic memory accesses. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.

It seems really tempting to just promote all accesses to atomic, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all accesses to atomic. Consider a loop like the following:

while (b) {
  load i128 ...
  if (can lower i128 atomic)
    store atomic i128 ...
  else
    store i128 ...
}

It could be that there is no race on the location, and thus the code is perfectly well defined even if we can't lower an i128 atomically. Promoting the non-atomic accesses to atomic would be incorrect.
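The acceptance rule described above can be summarized in a small standalone C++ sketch. `Ordering`, `Access`, and `classifyForPromotion` are hypothetical stand-ins (the patch itself inspects `LoadInst`/`StoreInst` via `isUnordered()` and `isAtomic()`). The sketch only mirrors the decision logic: refuse volatile or ordered accesses, refuse any mix of unordered-atomic and non-atomic accesses, and otherwise report whether the promoted accesses must be unordered atomics.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical local mirror of llvm::AtomicOrdering, so the sketch compiles alone.
enum class Ordering {
  NotAtomic, Unordered, Monotonic, Acquire, Release, AcquireRelease,
  SequentiallyConsistent
};

struct Access {
  bool IsVolatile;
  Ordering Ord;
};

// Returns std::nullopt if promotion must be refused; otherwise returns true if
// the promoted preheader load and exit-block stores must be unordered atomic.
std::optional<bool> classifyForPromotion(const std::vector<Access> &Uses) {
  bool SawUnorderedAtomic = false;
  bool SawNotAtomic = false;
  for (const Access &A : Uses) {
    // Same test as isUnordered(): non-volatile, and NotAtomic or Unordered.
    if (A.IsVolatile ||
        (A.Ord != Ordering::NotAtomic && A.Ord != Ordering::Unordered))
      return std::nullopt;
    SawUnorderedAtomic |= (A.Ord == Ordering::Unordered);
    SawNotAtomic |= (A.Ord == Ordering::NotAtomic);
  }
  // Conservative rule from the patch: never mix atomic and non-atomic accesses.
  if (SawUnorderedAtomic && SawNotAtomic)
    return std::nullopt;
  return SawUnorderedAtomic;
}
```

Returning "refuse" for the mixed case, rather than upgrading everything to atomic, is exactly the conservative choice argued for with the i128 example.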

Diff Detail

Repository: rL LLVM

Event Timeline

reames updated this revision to Diff 43070. Dec 16 2015, 3:08 PM

reames retitled this revision to [LICM] Make store promotion work in the face of unordered atomics.

reames updated this object.

reames added reviewers: jfb, hfinkel, majnemer, sanjoy.

reames added a subscriber: llvm-commits.

Can you add tests which validate that:

Atomics in a loop aren't moved past a signal fence.
Volatile atomics aren't affected.

Also, parts of the C++ standards committee think that the standard should "discourage non-normatively aggressive optimizations, e.g across large or unbounded loops" in the context of p0062r0. There isn't full consensus on this yet, but I'd be cautious about doing anything too aggressive to relaxed (a.k.a. monotonic) accesses for now. It would be good to also add a test for the example in p0062r0 and make sure it doesn't get optimized for now. I'll ping Hans to see what he thinks.
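To make the concern concrete, here is a hypothetical C++ loop of the kind the p0062r0 discussion is about (it is not quoted from the paper). Each iteration publishes progress with a relaxed store, which LLVM models as `monotonic`. Sinking that store out of the loop would leave the final value unchanged, but an observer in another thread would see no progress until the loop finished. The patch leaves monotonic stores alone, and test9 in the diff checks that such a store stays in the loop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical progress counter polled by another thread (e.g. a UI).
std::atomic<int> Progress{0};

void runSteps(int Steps) {
  assert(Steps >= 0);
  for (int I = 0; I < Steps; ++I) {
    // ... the real work for step I would go here ...

    // Relaxed store (LLVM "monotonic"). Promoting it out of the loop would
    // keep the final value but delay every intermediate update until exit.
    Progress.store(I + 1, std::memory_order_relaxed);
  }
}
```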

lib/Transforms/Scalar/LICM.cpp
939 (On Diff #43070): I'm being a bit paranoid here (since this shouldn't happen because you prevent mixing of atomic / non-atomic), but it would be good to `assert((Alignment == 0 ? !store->isAtomic() : true) && "atomic alignment can't be 0");` just to make sure that future edits don't introduce bugs.
test/Transforms/LICM/atomics.ll
69 (On Diff #43070): Could you add a similar test where the loop does a monotonic load followed by a monotonic store?
77 (On Diff #43070): `CHECK-NOT: store`
102 (On Diff #43070): Could you also add a test which has:
loop:
  store i32 5, i32* %x
  %vala = load atomic i32, i32* %y monotonic, align 4
  store atomic i32 %vala, i32* %z unordered, align 4
  %exitcond = icmp ne i32 %vala, 0
  br i1 %exitcond, label %end, label %loop
Note the unordered store is now to `%z`, which should also be `noalias`.
118 (On Diff #43070): Also check that the unordered atomic is still there.
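The assertion suggested for line 939 reads as "a zero (unspecified) alignment is only acceptable on non-atomic accesses", since atomic loads and stores must carry an explicit alignment in IR of this era (the patch's own comment notes that align 0 is an error for atomics). A minimal sketch of that check in isolation, using a hypothetical `PromotedAccess` struct in place of the real `StoreInst`:

```cpp
#include <cassert>

// Hypothetical stand-in for the store the promoter creates.
struct PromotedAccess {
  unsigned Alignment; // 0 means "no explicit alignment"
  bool IsAtomic;
};

// Equivalent to (Alignment == 0 ? !IsAtomic : true), without the ternary.
bool hasValidAlignment(const PromotedAccess &A) {
  return A.Alignment != 0 || !A.IsAtomic;
}

void checkPromotedAccess(const PromotedAccess &A) {
  assert(hasValidAlignment(A) && "atomic alignment can't be 0");
  (void)A; // avoid an unused-parameter warning when NDEBUG disables assert
}
```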

Hans pointed out another issue worth adding a test for: you can't remove all atomic accesses from loops which aren't proven to terminate (or otherwise have I/O, volatile, synchronization):

[intro.multithread] 24:
The implementation may assume that any thread will eventually do one of 
the following:
— terminate,
— make a call to a library I/O function,
— access or modify a volatile object, or
— perform a synchronization operation or an atomic operation.
[ Note: This is intended to allow compiler transformations such as 
removal of empty loops, even when
termination cannot be proven. —end note ]

Regardless of what the standard says, it's desirable to keep atomic stores inside infinite loops, though IIUC not keeping them there would be OK for "unordered" (a.k.a. Java) atomics, but not for "monotonic" (C++ relaxed) ones? It all boils down to how LLVM's memory model compares to C++'s, and I'm not clear on this (at a minimum it should be as strict as the language it's honoring).

He also points to this article, which he believes would be more profitable to implement (in separate patches). The problem in the past was that LLVM didn't have enough alias information to make this work in common cases, but he offers to put us in contact with the author off-list.

JF,

I'm going to respond to your points in more detail at a future point,
but I did want to quickly say that I do not feel it's appropriate to
keep adding tests for every possible combination of ordered atomics
optimizations under the sun. I've gone along so far because the amount
of work has been less than that required to argue the case, but given
I'm working on supporting only the unordered variety, I really don't see
how the ordered ones are relevant beyond a couple of simple tests to
make sure I don't accidentally include ordered where only unordered was
intended.

Philip

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Hans pointed out another issue worth adding a test for: you can't remove all atomic accesses from loops which aren't proven to terminate (or otherwise have I/O, volatile, synchronization):

...

Also, parts of the C++ standards committee thinks that the standard should "discourage non-normatively aggressive optimizations, e.g across large or unbounded loops" in the context of p0062r0.

Philip, can you comment on the original use cases here? Are there cases where indefinitely postponing the write (because, say, the loop ended up not terminating) would be acceptable?

In D15592#342294, @hfinkel wrote:

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Per the example I gave, we might not be able to lower a 128 bit atomic, and the code might be conditional on that fact. Inserting an unconditional i128 atomic which can't be lowered which wasn't in the original code in that case seems "questionable". I didn't have a need to do this and the semantics seemed a bit unclear, so it seemed better to avoid.

Philip, can you comment on the original use cases here? Are there cases where indefinitely postponing the write (because, say, the loop ended up not terminating) would be acceptable?

As I said previously, I consider most of JF's comments to be wildly off topic. Particularly, the "unordered" ordering is not related to C++ at all. As specifically documented in the LangRef: "This is intended to provide a guarantee strong enough to model Java’s non-volatile shared variables. " This is exactly the purpose I'm using it for. To the best of my knowledge (which admittedly isn't that extensive here), there is no way to generate an "unordered" instruction from C++ code.

On the topic of infinite loops, I could go either way. In practice, any infinite loop will have a safepoint or deoptimization point within it, so the loop promotion couldn't trigger anyways. Given that, I don't see any reason why we'd need to treat potentially infinite loops specially here. (Again, C++'s rules are not relevant.)

(Hoping to get back to this line of work soon. Distracted with higher priority tasks at the moment.)

In D15592#343574, @reames wrote:

In D15592#342294, @hfinkel wrote:

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Per the example I gave, we might not be able to lower a 128 bit atomic, and the code might be conditional on that fact. Inserting an unconditional i128 atomic which can't be lowered which wasn't in the original code in that case seems "questionable". I didn't have a need to do this and the semantics seemed a bit unclear, so it seemed better to avoid.

Fair enough, and I'm fine with not doing this. I don't understand your example, however, because if we can't lower the atomic, then the compilation will fail regardless of whether or not the atomic is conditionally executed. We still need to generate code for it if it is present.

Philip, can you comment on the original use cases here? Are there cases where indefinitely postponing the write (because, say, the loop ended up not terminating) would be acceptable?

As I said previously, I consider most of JF's comments to be wildly off topic. Particularly, the "unordered" ordering is not related to C++ at all. As specifically documented in the LangRef: "This is intended to provide a guarantee strong enough to model Java’s non-volatile shared variables. " This is exactly the purpose I'm using it for. To the best of my knowledge (which admittedly isn't that extensive here), there is no way to generate an "unordered" instruction from C++ code.

I agree, the C++ rules here are not strictly relevant.

On the topic of infinite loops, I could go either way. In practice, any infinite loop will have a safepoint or deoptimization point within it, so the loop promotion couldn't trigger anyways. Given that, I don't see any reason why we'd need to treat potentially infinite loops specially here. (Again, C++'s rules are not relevant.)

I understand, and the fact that this can't come up in your environment is interesting. However, from LLVM's perspective, we still need to decide on the semantics here.

(Hoping to get back to this line of work soon. Distracted with higher priority tasks at the moment.)

In D15592#343586, @hfinkel wrote:

In D15592#343574, @reames wrote:

In D15592#342294, @hfinkel wrote:

Promoting the non-atomic accesses to atomic would be incorrect.

Why?

Per the example I gave, we might not be able to lower a 128 bit atomic, and the code might be conditional on that fact. Inserting an unconditional i128 atomic which can't be lowered which wasn't in the original code in that case seems "questionable". I didn't have a need to do this and the semantics seemed a bit unclear, so it seemed better to avoid.

Fair enough, and I'm fine with not doing this. I don't understand your example, however, because if we can't lower the atomic, then the compilation will fail regardless of whether or not the atomic is conditionally executed. We still need to generate code for it if it is present.

My reasoning was that the very next thing the optimizer might do is constant-fold the "can lower i128 atomically" branch to false. The original program would then contain no atomic i128 (and so no codegen failure), while a program we had promoted to an atomic i128 would fail in codegen. I'm not sure this is strictly speaking legal, since the program is relying on optimizer behaviour for correctness.

On the topic of infinite loops, I could go either way. In practice, any infinite loop will have a safepoint or deoptimization point within it, so the loop promotion couldn't trigger anyways. Given that, I don't see any reason why we'd need to treat potentially infinite loops specially here. (Again, C++'s rules are not relevant.)

I understand, and the fact that this can't come up in your environment is interesting. However, from LLVM's perspective, we still need to decide on the semantics here.

I think the answer should be that "unordered" atomics are treated like normal non-atomic instructions in all ways *except* that the actual memory operation is atomic. Thus, there are no special requirements regarding infinite loops.

I draw a mental distinction between atomicity, ordering, and visibility. All of our "atomics" are atomic, but not all of them are ordered. The visibility rules are essentially unspecified, but in practice, we have to give the "ordered" forms eventual visibility guarantees since that's what the C++ spec requires.

To say this differently, I really don't think we want the optimizer to be in a position where it has to prove a loop is not dynamically infinite before doing store promotion. That seems like a major complication. If we later find we need an unordered, atomic, but eventually visible store, I'd rather add that explicitly to the IR. Having both forms seems potentially useful.

From a code clarity perspective, should we add a predicate like mustBeEventuallyVisible(AtomicOrdering)? This would give a place to describe the result of this discussion - whatever we eventually settle on - in a way future uses might find. :)
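A minimal sketch of such a `mustBeEventuallyVisible(AtomicOrdering)` predicate, encoding the position argued above: unordered behaves like a plain access apart from atomicity, and only monotonic and stronger orderings carry an eventual-visibility expectation. The enum is a local copy of the `llvm::AtomicOrdering` members so the snippet compiles on its own, and the predicate itself is hypothetical, since the thread had not settled the semantics.

```cpp
#include <cassert>

// Local copy of llvm::AtomicOrdering's members (so the sketch is standalone).
enum class AtomicOrdering {
  NotAtomic,
  Unordered,
  Monotonic,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// Hypothetical predicate: under the proposal above, plain and unordered
// accesses carry no eventual-visibility requirement (so a store may be sunk
// out of a loop that never exits); monotonic (C++ relaxed) and stronger do.
constexpr bool mustBeEventuallyVisible(AtomicOrdering AO) {
  return AO != AtomicOrdering::NotAtomic && AO != AtomicOrdering::Unordered;
}

static_assert(!mustBeEventuallyVisible(AtomicOrdering::Unordered),
              "unordered is treated like a plain access");
static_assert(mustBeEventuallyVisible(AtomicOrdering::Monotonic),
              "monotonic must eventually become visible");
```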

p.s. Lest it be confusing, our needs on this changed in the not too distant past. I was at one point arguing that LLVM should allow infinite loops without side effects. I'm no longer of that belief. So long as every infinite loop has a side effect (for us, a potential safepoint, deoptimization point, or call), then we do not need any special handling for the unordered atomic stores.

On the topic of infinite loops, I could go either way. In practice, any infinite loop will have a safepoint or deoptimization point within it, so the loop promotion couldn't trigger anyways. Given that, I don't see any reason why we'd need to treat potentially infinite loops specially here. (Again, C++'s rules are not relevant.)

I understand, and the fact that this can't come up in your environment is interesting. However, from LLVM's perspective, we still need to decide on the semantics here.

I think the answer should be that "unordered" atomics are treated like normal non-atomic instructions in all ways *except* that the actual memory operation is atomic. Thus, there are no special requirements regarding infinite loops.

That's totally fine by me. My concern is that C++ relaxed atomics
inadvertently gain the same semantics. Maybe not in your patch, but through
further refactoring. That's why I ask for negative tests on relaxed
atomics: to catch if they ever start doing the wrong thing.

Add requested tests, address comments, and rebase on ToT.

Herald added a subscriber: mcrosier. Mar 9 2016, 2:02 PM

Last update was 3 months ago. Feel free to add me back if you resurrect this review.

Is there anything left to do on this? If not, you should commit this :)

This LGTM!

lib/Transforms/Scalar/LICM.cpp
803 (On Diff #50193): clang-format?

This revision is now accepted and ready to land. Feb 13 2017, 12:36 PM

Closed by commit rL295015: [LICM] Make store promotion work in the face of unordered atomics (authored by reames). Feb 13 2017, 5:50 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                 Size
llvm/trunk/lib/Transforms/Scalar/LICM.cpp            32 lines
llvm/trunk/test/Transforms/LICM/atomics.ll           146 lines
llvm/trunk/test/Transforms/LICM/scalar-promote.ll    27 lines

Diff 88288

llvm/trunk/lib/Transforms/Scalar/LICM.cpp

@@ ... @@ class LoopPromoter : public LoadAndStorePromoter {
   SmallPtrSetImpl<Value *> &PointerMustAliases;
   SmallVectorImpl<BasicBlock *> &LoopExitBlocks;
   SmallVectorImpl<Instruction *> &LoopInsertPts;
   PredIteratorCache &PredCache;
   AliasSetTracker &AST;
   LoopInfo &LI;
   DebugLoc DL;
   int Alignment;
+  bool UnorderedAtomic;
   AAMDNodes AATags;

   Value *maybeInsertLCSSAPHI(Value *V, BasicBlock *BB) const {
     if (Instruction *I = dyn_cast<Instruction>(V))
       if (Loop *L = LI.getLoopFor(I->getParent()))
         if (!L->contains(BB)) {
           // We need to create an LCSSA PHI node for the incoming value and
           // store that.
           PHINode *PN = PHINode::Create(I->getType(), PredCache.size(BB),
                                         I->getName() + ".lcssa", &BB->front());
           for (BasicBlock *Pred : PredCache.get(BB))
             PN->addIncoming(I, Pred);
           return PN;
         }
     return V;
   }

 public:
   LoopPromoter(Value *SP, ArrayRef<const Instruction *> Insts, SSAUpdater &S,
                SmallPtrSetImpl<Value *> &PMA,
                SmallVectorImpl<BasicBlock *> &LEB,
                SmallVectorImpl<Instruction *> &LIP, PredIteratorCache &PIC,
                AliasSetTracker &ast, LoopInfo &li, DebugLoc dl, int alignment,
-               const AAMDNodes &AATags)
+               bool UnorderedAtomic, const AAMDNodes &AATags)
       : LoadAndStorePromoter(Insts, S), SomePtr(SP), PointerMustAliases(PMA),
         LoopExitBlocks(LEB), LoopInsertPts(LIP), PredCache(PIC), AST(ast),
-        LI(li), DL(std::move(dl)), Alignment(alignment), AATags(AATags) {}
+        LI(li), DL(std::move(dl)), Alignment(alignment),
+        UnorderedAtomic(UnorderedAtomic),AATags(AATags) {}

   bool isInstInList(Instruction *I,
                     const SmallVectorImpl<Instruction *> &) const override {
     Value *Ptr;
     if (LoadInst *LI = dyn_cast<LoadInst>(I))
       Ptr = LI->getOperand(0);
     else
       Ptr = cast<StoreInst>(I)->getPointerOperand();
     return PointerMustAliases.count(Ptr);
   }

   void doExtraRewritesBeforeFinalDeletion() const override {
     // Insert stores after in the loop exit blocks. Each exit block gets a
     // store of the live-out values that feed them. Since we've already told
     // the SSA updater about the defs in the loop and the preheader
     // definition, it is all set and we can start using it.
     for (unsigned i = 0, e = LoopExitBlocks.size(); i != e; ++i) {
       BasicBlock *ExitBlock = LoopExitBlocks[i];
       Value *LiveInValue = SSA.GetValueInMiddleOfBlock(ExitBlock);
       LiveInValue = maybeInsertLCSSAPHI(LiveInValue, ExitBlock);
       Value *Ptr = maybeInsertLCSSAPHI(SomePtr, ExitBlock);
       Instruction *InsertPos = LoopInsertPts[i];
       StoreInst *NewSI = new StoreInst(LiveInValue, Ptr, InsertPos);
+      if (UnorderedAtomic)
+        NewSI->setOrdering(AtomicOrdering::Unordered);
       NewSI->setAlignment(Alignment);
       NewSI->setDebugLoc(DL);
       if (AATags)
         NewSI->setAAMetadata(AATags);
     }
   }

   void replaceLoadWithValue(LoadInst *LI, Value *V) const override {
@@ ... @@ bool llvm::promoteLoopAccessesToScalars(
   bool SafeToInsertStore = false;

   SmallVector<Instruction *, 64> LoopUses;
   SmallPtrSet<Value *, 4> PointerMustAliases;

   // We start with an alignment of one and try to find instructions that allow
   // us to prove better alignment.
   unsigned Alignment = 1;
+  // Keep track of which types of access we see
+  bool SawUnorderedAtomic = false;
+  bool SawNotAtomic = false;
   AAMDNodes AATags;

   const DataLayout &MDL = Preheader->getModule()->getDataLayout();

   // Do we know this object does not escape ?
   bool IsKnownNonEscapingObject = false;
   if (SafetyInfo->MayThrow) {
     // If a loop can throw, we have to insert a store along each unwind edge.
@@ ... @@ for (User *U : ASIV->users()) {
       Instruction *UI = dyn_cast<Instruction>(U);
       if (!UI || !CurLoop->contains(UI))
         continue;

       // If there is an non-load/store instruction in the loop, we can't promote
       // it.
       if (LoadInst *Load = dyn_cast<LoadInst>(UI)) {
         assert(!Load->isVolatile() && "AST broken");
-        if (!Load->isSimple())
+        if (!Load->isUnordered())
           return false;

+        SawUnorderedAtomic |= Load->isAtomic();
+        SawNotAtomic |= !Load->isAtomic();
+
         if (!DereferenceableInPH)
           DereferenceableInPH = isSafeToExecuteUnconditionally(
               *Load, DT, CurLoop, SafetyInfo, ORE, Preheader->getTerminator());
       } else if (const StoreInst *Store = dyn_cast<StoreInst>(UI)) {
         // Stores of the pointer are not interesting, only stores to the
         // pointer.
         if (UI->getOperand(1) != ASIV)
           continue;
         assert(!Store->isVolatile() && "AST broken");
-        if (!Store->isSimple())
+        if (!Store->isUnordered())
           return false;

+        SawUnorderedAtomic |= Store->isAtomic();
+        SawNotAtomic |= !Store->isAtomic();
+
         // If the store is guaranteed to execute, both properties are satisfied.
         // We may want to check if a store is guaranteed to execute even if we
         // already know that promotion is safe, since it may have higher
         // alignment than any other guaranteed stores, in which case we can
         // raise the alignment on the promoted store.
         unsigned InstAlignment = Store->getAlignment();
         if (!InstAlignment)
           InstAlignment =
@@ ... @@ for (User *U : ASIV->users()) {
       } else if (AATags) {
         UI->getAAMetadata(AATags, /* Merge = */ true);
       }

       LoopUses.push_back(UI);
     }
   }

+  // If we found both an unordered atomic instruction and a non-atomic memory
+  // access, bail. We can't blindly promote non-atomic to atomic since we
+  // might not be able to lower the result. We can't downgrade since that
+  // would violate memory model. Also, align 0 is an error for atomics.
+  if (SawUnorderedAtomic && SawNotAtomic)
+    return false;
+
   // If we couldn't prove we can hoist the load, bail.
   if (!DereferenceableInPH)
     return false;

   // We know we can hoist the load, but don't have a guaranteed store.
   // Check whether the location is thread-local. If it is, then we can insert
   // stores along paths which originally didn't have them without violating the
@@ ... @@ bool llvm::promoteLoopAccessesToScalars(
   // this code just arbitrarily picks a location from one, since any debug
   // location is better than none.
   DebugLoc DL = LoopUses[0]->getDebugLoc();

   // We use the SSAUpdater interface to insert phi nodes as required.
   SmallVector<PHINode *, 16> NewPHIs;
   SSAUpdater SSA(&NewPHIs);
   LoopPromoter Promoter(SomePtr, LoopUses, SSA, PointerMustAliases, ExitBlocks,
-                        InsertPts, PIC, CurAST, LI, DL, Alignment, AATags);
+                        InsertPts, PIC, CurAST, LI, DL, Alignment,
+                        SawUnorderedAtomic, AATags);

   // Set up the preheader to have a definition of the value. It is the live-out
   // value from the preheader that uses in the loop will use.
   LoadInst *PreheaderLoad = new LoadInst(
       SomePtr, SomePtr->getName() + ".promoted", Preheader->getTerminator());
+  if (SawUnorderedAtomic)
+    PreheaderLoad->setOrdering(AtomicOrdering::Unordered);
   PreheaderLoad->setAlignment(Alignment);
   PreheaderLoad->setDebugLoc(DL);
   if (AATags)
     PreheaderLoad->setAAMetadata(AATags);
   SSA.AddAvailableValue(Preheader, PreheaderLoad);

   // Rewrite all the loads in the loop and remember all the definitions from
   // stores in the loop.
@@ ... @@

llvm/trunk/test/Transforms/LICM/atomics.ll

@@ ... @@

 end:
   ret i32 %vala
 ; CHECK-LABEL: define i32 @test3(
 ; CHECK: load atomic i32, i32* %x unordered
 ; CHECK-NEXT: br label %loop
 }

-; Don't try to "sink" unordered stores yet; it is legal, but the machinery
-; isn't there.
+; We can sink an unordered store
 define i32 @test4(i32* nocapture noalias %x, i32* nocapture %y) nounwind uwtable ssp {
 entry:
   br label %loop

 loop:
   %vala = load atomic i32, i32* %y monotonic, align 4
   store atomic i32 %vala, i32* %x unordered, align 4
   %exitcond = icmp ne i32 %vala, 0
   br i1 %exitcond, label %end, label %loop

 end:
   ret i32 %vala
 ; CHECK-LABEL: define i32 @test4(
+; CHECK-LABEL: loop:
+; CHECK: load atomic i32, i32* %y monotonic
+; CHECK-NOT: store
+; CHECK-LABEL: end:
+; CHECK-NEXT: %[[LCSSAPHI:.*]] = phi i32 [ %vala
+; CHECK: store atomic i32 %[[LCSSAPHI]], i32* %x unordered, align 4
+}
+
+; We currently don't handle ordered atomics.
+define i32 @test5(i32* nocapture noalias %x, i32* nocapture %y) nounwind uwtable ssp {
+entry:
+  br label %loop
+
+loop:
+  %vala = load atomic i32, i32* %y monotonic, align 4
+  store atomic i32 %vala, i32* %x release, align 4
+  %exitcond = icmp ne i32 %vala, 0
+  br i1 %exitcond, label %end, label %loop
+
+end:
+  ret i32 %vala
+; CHECK-LABEL: define i32 @test5(
 ; CHECK: load atomic i32, i32* %y monotonic
 ; CHECK-NEXT: store atomic
 }
+
+; We currently don't touch volatiles
+define i32 @test6(i32* nocapture noalias %x, i32* nocapture %y) nounwind uwtable ssp {
+entry:
+  br label %loop
+
+loop:
+  %vala = load atomic i32, i32* %y monotonic, align 4
+  store volatile i32 %vala, i32* %x, align 4
+  %exitcond = icmp ne i32 %vala, 0
+  br i1 %exitcond, label %end, label %loop
+
+end:
+  ret i32 %vala
+; CHECK-LABEL: define i32 @test6(
+; CHECK: load atomic i32, i32* %y monotonic
+; CHECK-NEXT: store volatile
+}
+
+; We currently don't touch volatiles
+define i32 @test6b(i32* nocapture noalias %x, i32* nocapture %y) nounwind uwtable ssp {
+entry:
+  br label %loop
+
+loop:
+  %vala = load atomic i32, i32* %y monotonic, align 4
+  store atomic volatile i32 %vala, i32* %x unordered, align 4
+  %exitcond = icmp ne i32 %vala, 0
+  br i1 %exitcond, label %end, label %loop
+
+end:
+  ret i32 %vala
+; CHECK-LABEL: define i32 @test6b(
+; CHECK: load atomic i32, i32* %y monotonic
+; CHECK-NEXT: store atomic volatile
+}
+
+; Mixing unorder atomics and normal loads/stores is
+; current unimplemented
+define i32 @test7(i32* nocapture noalias %x, i32* nocapture %y) nounwind uwtable ssp {
+entry:
+  br label %loop
+
+loop:
+  store i32 5, i32* %x
+  %vala = load atomic i32, i32* %y monotonic, align 4
+  store atomic i32 %vala, i32* %x unordered, align 4
+  %exitcond = icmp ne i32 %vala, 0
+  br i1 %exitcond, label %end, label %loop
+
+end:
+  ret i32 %vala
+; CHECK-LABEL: define i32 @test7(
+; CHECK: store i32 5, i32* %x
+; CHECK-NEXT: load atomic i32, i32* %y
+; CHECK-NEXT: store atomic i32
+}
+
+; Three provably noalias locations - we can sink normal and unordered, but
+; not monotonic
+define i32 @test7b(i32* nocapture noalias %x, i32* nocapture %y, i32* noalias nocapture %z) nounwind uwtable ssp {
+entry:
+  br label %loop
+
+loop:
+  store i32 5, i32* %x
+  %vala = load atomic i32, i32* %y monotonic, align 4
+  store atomic i32 %vala, i32* %z unordered, align 4
+  %exitcond = icmp ne i32 %vala, 0
+  br i1 %exitcond, label %end, label %loop
+
+end:
+  ret i32 %vala
+; CHECK-LABEL: define i32 @test7b(
+; CHECK: load atomic i32, i32* %y monotonic
+
+; CHECK-LABEL: end:
+; CHECK: store i32 5, i32* %x
+; CHECK: store atomic i32 %{{.+}}, i32* %z unordered, align 4
+}
+
+
+define i32 @test8(i32* nocapture noalias %x, i32* nocapture %y) {
+entry:
+  br label %loop
+
+loop:
+  %vala = load atomic i32, i32* %y monotonic, align 4
+  store atomic i32 %vala, i32* %x unordered, align 4
+  fence release
+  %exitcond = icmp ne i32 %vala, 0
+  br i1 %exitcond, label %end, label %loop
+
+end:
+  ret i32 %vala
+; CHECK-LABEL: define i32 @test8(
+; CHECK-LABEL: loop:
+; CHECK: load atomic i32, i32* %y monotonic
+; CHECK-NEXT: store atomic
+; CHECK-NEXT: fence
+}
+
+; Exact semantics of monotonic accesses are a bit vague in the C++ spec,
+; for the moment, be conservative and don't touch them.
+define i32 @test9(i32* nocapture noalias %x, i32* nocapture %y) {
+entry:
+  br label %loop
+
+loop:
+  %vala = load atomic i32, i32* %y monotonic, align 4
+  store atomic i32 %vala, i32* %x monotonic, align 4
+  %exitcond = icmp ne i32 %vala, 0
+  br i1 %exitcond, label %end, label %loop
+
+end:
+  ret i32 %vala
+; CHECK-LABEL: define i32 @test9(
+; CHECK-LABEL: loop:
+; CHECK: load atomic i32, i32* %y monotonic
+; CHECK-NEXT: store atomic i32 %vala, i32* %x monotonic, align 4
+}

llvm/trunk/test/Transforms/LICM/scalar-promote.ll

@@ ... @@ else:
   %cond = icmp eq i32 %next, 0
   br i1 %cond, label %exit, label %loop

 exit:
   %ret = load i32, i32* %notderef
   ret i32 %ret
 }

+define void @test10(i32 %i) {
+Entry:
+  br label %Loop
+; CHECK-LABEL: @test10(
+; CHECK: Entry:
+; CHECK-NEXT: load atomic i32, i32* @X unordered, align 4
+; CHECK-NEXT: br label %Loop
+
+
+Loop:                                   ; preds = %Loop, %0
+  %j = phi i32 [ 0, %Entry ], [ %Next, %Loop ]   ; <i32> [#uses=1]
+  %x = load atomic i32, i32* @X unordered, align 4
+  %x2 = add i32 %x, 1
+  store atomic i32 %x2, i32* @X unordered, align 4
+  %Next = add i32 %j, 1
+  %cond = icmp eq i32 %Next, 0
+  br i1 %cond, label %Out, label %Loop
+
+Out:
+  ret void
+; CHECK: Out:
+; CHECK-NEXT: %[[LCSSAPHI:.*]] = phi i32 [ %x2
+; CHECK-NEXT: store atomic i32 %[[LCSSAPHI]], i32* @X unordered, align 4
+; CHECK-NEXT: ret void
+
+}
+
 !0 = !{!4, !4, i64 0}
 !1 = !{!"omnipotent char", !2}
 !2 = !{!"Simple C/C++ TBAA"}
 !3 = !{!5, !5, i64 0}
 !4 = !{!"int", !1}
 !5 = !{!"float", !1}