This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/ARM/ARMISelLowering.cpp
11024	I just found that the test CodeGen/ARM/swift-atomics suggests that this case should fall through (i.e. that a DMB ishst is correct in this case for the Swift processor). I can change it, but I would love to link to some documentation saying so more clearly first, and I cannot seem to find it. Does anyone know where this is documented (or can at least confirm that dmb ishst is a valid leading fence for seq_cst stores)?

Is there documentation for ARM that you can point at to explain why this is correct? Maybe the ARM barrier litmus tests and cookbook?

lib/Target/ARM/ARMISelLowering.cpp
11001	This should be `static`, and the `Domain` parameter an `ARM_MB::MemBOpt`.
11051	Link to documentation that explains this.

Add comment pointing to documentation per jfb's request.

I did not point directly to the ARM documentation or barrier cookbook because
they are expressed in terms of the hardware memory model, and the mapping to C++11
is not completely obvious. Instead I have linked the webpage that summarizes
the findings of a research group that has spent the last decade formalizing and proving
these kinds of things. It in turn links to more primary sources. Is that
enough?
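For reference, the ARMv7 column of the mapping page cited in this patch (http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html) can be written down as a small lookup function. This is a sketch transcribed for illustration and covers only plain loads and stores. The `Order`/`Access` enums are hypothetical and are not LLVM's `AtomicOrdering`. The lighter "isb after a dependent branch" alternative for acquire, mentioned in the diff's FIXME, is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical orderings and access kinds, for illustration only.
enum class Order { Relaxed, Acquire, Release, SeqCst };
enum class Access { Load, Store };

// ARMv7 sequences for C++11 atomic loads/stores, transcribed from the
// cpp0xmappings summary. Returns "" for combinations C++11 does not allow
// (release loads, acquire stores).
std::string armv7Sequence(Access A, Order O) {
  if (A == Access::Load) {
    switch (O) {
    case Order::Relaxed: return "ldr";
    case Order::Acquire: return "ldr; dmb"; // barrier after the load
    case Order::SeqCst:  return "ldr; dmb"; // same as acquire on ARMv7
    case Order::Release: return "";
    }
  } else {
    switch (O) {
    case Order::Relaxed: return "str";
    case Order::Release: return "dmb; str";      // barrier before the store
    case Order::SeqCst:  return "dmb; str; dmb"; // barriers on both sides
    case Order::Acquire: return "";
    }
  }
  return "";
}
```

The asymmetry matters here. A seq_cst load needs only a trailing barrier, while a seq_cst store needs barriers on both sides. The `IsStore`/`IsLoad` parameters of `emitLeadingFence` let ARM express exactly that.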

The documentation you added looks good to me, so does the change.

Waiting on @t.p.northover (or another Applet) for the Swift issue you mentioned above.

Hi Robin,

This patch doesn't seem to do anything. Is it needed for something else? Are these functions going to be used somewhere else? If they are, why not implement that in the same patch? Otherwise, this just adds dead code.

It also makes it impossible for you to test this...

cheers,
--renato

lib/Target/ARM/ARMISelLowering.cpp
11008	www
11018	"Invalid fence" seems less aggressive... :)
11046	idem
lib/Target/ARM/ARMISelLowering.h
422–423	this could be left out

Thanks for the review,

Indeed, this patch is used in D4961. I can fairly easily merge the two patches if you want; I'm still struggling to find the right way of cutting patches (too big and they are hard to review, too small and they temporarily add dead code, like here).
I will fix the other things you point out ASAP.

Merges D4961 into this one, based on rengolin's suggestion that I make the patch
testable this way.

Also fixed the typos he found.

Here is the new summary of the commit as a result of the merge:

Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass)

Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst loads (unsound on Power).
- There was only a release fence before seq_cst stores (again unsound, in particular on Power).
  It is not even clear whether this is correct on ARM Swift processors (where release fences are
  DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift,
  since it is not clear whether it is incorrect. I would love to find documentation stating
  whether it is correct or not.
These two bugs were not triggered because Power does not (yet) use this pass, and these
behaviours happen to (mostly?) work on ARM
(although they completely butcher the semantics of the LLVM IR).

See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.
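To make the two bugs concrete, the old target-independent fence choice from `insertLeadingFence`/`insertTrailingFence` (removed in the diff) can be reduced to pure functions. Together with them goes the fact that `expandAtomicLoad` never asked for a leading fence. The enums are hypothetical stand-ins for LLVM's `AtomicOrdering` and IR fence kinds, and the sketch assumes `getInsertFencesForAtomic()` returns true. For comparison, the recommended Power mapping on the cpp0xmappings page puts a full `hwsync` before both seq_cst loads and seq_cst stores. Neither old rule provides that: the first gives nothing, and the second gives only a release fence, which Power typically implements with the weaker `lwsync`.

```cpp
#include <cassert>

// Hypothetical stand-ins for LLVM's AtomicOrdering and IR fence kinds.
enum class Ord { Monotonic, Acquire, Release, AcquireRelease, SeqCst };
enum class Fence { None, Release, Acquire, SeqCst };

// Old AtomicExpand::insertLeadingFence, used for atomicrmw/cmpxchg (and thus
// for stores, which were expanded into an atomicrmw xchg).
Fence oldLeadingFence(Ord O) {
  if (O == Ord::Release || O == Ord::AcquireRelease || O == Ord::SeqCst)
    return Fence::Release; // never stronger than release, even for seq_cst
  return Fence::None;
}

// Old AtomicExpand::insertTrailingFence.
Fence oldTrailingFence(Ord O) {
  if (O == Ord::Acquire || O == Ord::AcquireRelease)
    return Fence::Acquire;
  if (O == Ord::SeqCst)
    return Fence::SeqCst;
  return Fence::None;
}

// The old expandAtomicLoad only called insertTrailingFence, so a load never
// got a leading fence, whatever its ordering.
Fence oldLeadingFenceForLoad(Ord) { return Fence::None; }
```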

I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.

This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64); other architectures will use this
infrastructure in later patches.
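The split between a no-op default and a target override can be sketched outside LLVM. `FakeBuilder`, `LoweringHooks` and `ArmLikeHooks` are hypothetical names. The ARM-like override mirrors the `switch` statements in the ARMISelLowering.cpp diff, including the Swift `ISHST` leading barrier. It differs from the real code in two ways: it drops the `getInsertFencesForAtomic()` check, and it returns quietly for NotAtomic/Unordered where the real code calls `llvm_unreachable`.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Ord {
  NotAtomic, Unordered, Monotonic, Acquire, Release, AcquireRelease, SeqCst
};

// Hypothetical stand-in for IRBuilder<>: records the barriers "emitted".
struct FakeBuilder {
  std::vector<std::string> Emitted;
};

// Base hooks: doing nothing is the default (enough for AArch64, per the summary).
struct LoweringHooks {
  virtual ~LoweringHooks() = default;
  virtual void emitLeadingFence(FakeBuilder &, Ord, bool /*IsStore*/,
                                bool /*IsLoad*/) const {}
  virtual void emitTrailingFence(FakeBuilder &, Ord, bool /*IsStore*/,
                                 bool /*IsLoad*/) const {}
};

// ARM-like override mirroring the diff's emitLeading/TrailingFence.
struct ArmLikeHooks : LoweringHooks {
  bool IsSwift;
  explicit ArmLikeHooks(bool Swift) : IsSwift(Swift) {}

  void emitLeadingFence(FakeBuilder &B, Ord O, bool IsStore,
                        bool /*IsLoad*/) const override {
    switch (O) {
    case Ord::NotAtomic:
    case Ord::Unordered: // llvm_unreachable in the real code
    case Ord::Monotonic:
    case Ord::Acquire:
      return; // nothing to do
    case Ord::SeqCst:
      if (!IsStore)
        return; // seq_cst loads need no leading barrier on ARM
      // seq_cst stores and RMWs get the same barrier as release.
      // fall through
    case Ord::Release:
    case Ord::AcquireRelease:
      B.Emitted.push_back(IsSwift ? "dmb ishst" : "dmb ish");
      return;
    }
  }

  void emitTrailingFence(FakeBuilder &B, Ord O, bool /*IsStore*/,
                         bool /*IsLoad*/) const override {
    switch (O) {
    case Ord::Acquire:
    case Ord::AcquireRelease:
    case Ord::SeqCst:
      B.Emitted.push_back("dmb ish");
      return;
    default:
      return; // NotAtomic/Unordered/Monotonic/Release: nothing to do
    }
  }
};
```

Keeping the default a no-op means a target that can encode orderings in the exclusive instructions themselves pays nothing for the hook. Presumably this is why AArch64, with its acquire/release exclusives, needs no override.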

Adds Jim Grosbach as a reviewer, since rengolin had added him to the now-defunct D4961.

ping.

I should defer to Tim Northover on this sort of thing. He's better equipped than I to get into the details of atomics.

Same here. I can't see anything wrong with the code, but I might be missing a lot of the atomic context. Maybe James can also comment on that side?

cheers,
--renato

Hi Robin,

Sorry, this got lost over my holiday. I think it's mostly fine now. Just one very minor style point and a possibly misleading comment.

Cheers.

Tim.

lib/Target/ARM/ARMISelLowering.cpp
11001	Should probably be "makeDMB", for LLVM style.
11052–11053	By my tests, an ISB is even heavier than a DMB, so we probably don't want to encourage this alternative. I'd just skip the comment entirely.

Thanks for the review!

lib/Target/ARM/ARMISelLowering.cpp
11001	Ok, I will do it.
11052–11053	Awesome, I was planning on doing benchmarks to evaluate this possibility; thanks for having done them :-) I will remove the comment.

jfb added inline comments. Sep 3 2014, 10:46 AM

lib/Target/ARM/ARMISelLowering.cpp
11052–11053	I'd remove the comment but I think the benchmark is still good to do: different ARM implementations perform very differently. I assume Tim measured on Apple hardware? We can try out the ones we have on hand here to make sure the results are the same.

MakeDMB -> makeDMB
Erase comment.

> I'd remove the comment but I think the benchmark is still good to do: different ARM implementations perform very differently. I assume Tim measured on Apple hardware? We can try out the ones we have on hand here to make sure the results are the same.

Yep, it was just a very quick trial on an iPad I had lying around.
It's definitely worth checking on other hardware (and in more
realistic situations than a tight DMB/ISB loop) before dismissing the
idea completely.
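A minimal portable harness for the kind of quick tight-loop trial described above might look like the following. This sketch rests on several assumptions. It times `std::atomic_thread_fence(std::memory_order_seq_cst)`, which compilers commonly lower to `dmb ish` on ARMv7, but the lowering is target-dependent. Comparing against ISB would need target-specific inline assembly, which is not shown. The name `nsPerFence` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Average wall-clock nanoseconds per seq_cst fence over `Iters` iterations.
double nsPerFence(std::uint64_t Iters) {
  if (Iters == 0)
    return 0.0;
  auto Start = std::chrono::steady_clock::now();
  for (std::uint64_t I = 0; I < Iters; ++I)
    std::atomic_thread_fence(std::memory_order_seq_cst); // typically dmb ish on ARMv7
  auto End = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::nano> Elapsed = End - Start;
  return Elapsed.count() / static_cast<double>(Iters);
}
```

As Tim points out, a loop like this says little about realistic code. Before drawing conclusions, it should at least be run on the different cores available (jfb's suggestion) and repeated to check variance.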

The new patch looks fine to me, though. Those are questions for another day.

Cheers.

Tim.

OK, can the revision be accepted then? Thanks.

Sure, though we don't actually care about that as far as I know (the primary record of review is llvm-commits, and I'd quite happily accept one of my own revisions once someone had said OK).

Tim.

This revision is now accepted and ready to land. Sep 3 2014, 1:04 PM

It's mostly to be able to close the review after commit.

r217076
Thanks everyone

Revision Contents

Path                                                       Size
lib/CodeGen/AtomicExpandPass.cpp                           92 lines
lib/Target/ARM/ARMISelLowering.h                           5 lines
lib/Target/ARM/ARMISelLowering.cpp                         63 lines
test/Transforms/AtomicExpand/ARM/atomic-expansion-v7.ll    68 lines
test/Transforms/AtomicExpand/ARM/cmpxchg-weak.ll           19 lines

Diff 12869

lib/CodeGen/AtomicExpandPass.cpp

[38 lines not shown]	public:

bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;
bool expandAtomicInsts(Function &F);		bool expandAtomicInsts(Function &F);

bool expandAtomicLoad(LoadInst *LI);		bool expandAtomicLoad(LoadInst *LI);
bool expandAtomicStore(StoreInst *LI);		bool expandAtomicStore(StoreInst *LI);
bool expandAtomicRMW(AtomicRMWInst *AI);		bool expandAtomicRMW(AtomicRMWInst *AI);
bool expandAtomicCmpXchg(AtomicCmpXchgInst *CI);		bool expandAtomicCmpXchg(AtomicCmpXchgInst *CI);

AtomicOrdering insertLeadingFence(IRBuilder<> &Builder, AtomicOrdering Ord);
void insertTrailingFence(IRBuilder<> &Builder, AtomicOrdering Ord);
};		};
}		}

char AtomicExpand::ID = 0;		char AtomicExpand::ID = 0;
char &llvm::AtomicExpandID = AtomicExpand::ID;		char &llvm::AtomicExpandID = AtomicExpand::ID;
INITIALIZE_TM_PASS(AtomicExpand, "atomic-expand",		INITIALIZE_TM_PASS(AtomicExpand, "atomic-expand",
"Expand Atomic calls in terms of either load-linked & store-conditional or cmpxchg",		"Expand Atomic calls in terms of either load-linked & store-conditional or cmpxchg",
false, false)		false, false)
[35 lines not shown]	for (Instruction *Inst : AtomicInsts) {
else		else
llvm_unreachable("Unknown atomic instruction");		llvm_unreachable("Unknown atomic instruction");
}		}

return MadeChange;		return MadeChange;
}		}

bool AtomicExpand::expandAtomicLoad(LoadInst *LI) {		bool AtomicExpand::expandAtomicLoad(LoadInst *LI) {
// Load instructions don't actually need a leading fence, even in the		auto TLI = TM->getSubtargetImpl()->getTargetLowering();
// SequentiallyConsistent case.		// If getInsertFencesForAtomic() returns true, then the target does not want to
		// deal with memory orders, and emitLeading/TrailingFence should take care of
		// everything. Otherwise, emitLeading/TrailingFence are no-op and we should
		// preserve the ordering.
AtomicOrdering MemOpOrder =		AtomicOrdering MemOpOrder =
TM->getSubtargetImpl()->getTargetLowering()->getInsertFencesForAtomic()		TLI->getInsertFencesForAtomic() ? Monotonic : LI->getOrdering();
? Monotonic		IRBuilder<> Builder(LI);
: LI->getOrdering();

// The only 64-bit load guaranteed to be single-copy atomic by the ARM is		// Note that although no fence is required before atomic load on ARM, it is required
		// before SequentiallyConsistent loads for the recommended Power mapping (see
		// http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html).
		// So we let the target choose what to emit.
		TLI->emitLeadingFence(Builder, LI->getOrdering(),
		/*IsStore=*/false, /*IsLoad=*/true);

		// The only 64-bit load guaranteed to be single-copy atomic by ARM is
// an ldrexd (A3.5.3).		// an ldrexd (A3.5.3).
IRBuilder<> Builder(LI);		Value *Val = TLI->emitLoadLinked(Builder, LI->getPointerOperand(), MemOpOrder);
Value *Val = TM->getSubtargetImpl()->getTargetLowering()->emitLoadLinked(
Builder, LI->getPointerOperand(), MemOpOrder);

insertTrailingFence(Builder, LI->getOrdering());		TLI->emitTrailingFence(Builder, LI->getOrdering(),
		/*IsStore=*/false, /*IsLoad=*/true);

LI->replaceAllUsesWith(Val);		LI->replaceAllUsesWith(Val);
LI->eraseFromParent();		LI->eraseFromParent();

return true;		return true;
}		}

bool AtomicExpand::expandAtomicStore(StoreInst *SI) {		bool AtomicExpand::expandAtomicStore(StoreInst *SI) {
// The only atomic 64-bit store on ARM is an strexd that succeeds, which means		// The only atomic 64-bit store on ARM is an strexd that succeeds, which means
// we need a loop and the entire instruction is essentially an "atomicrmw		// we need a loop and the entire instruction is essentially an "atomicrmw
// xchg" that ignores the value loaded.		// xchg" that ignores the value loaded.
IRBuilder<> Builder(SI);		IRBuilder<> Builder(SI);
AtomicRMWInst *AI =		AtomicRMWInst *AI =
Builder.CreateAtomicRMW(AtomicRMWInst::Xchg, SI->getPointerOperand(),		Builder.CreateAtomicRMW(AtomicRMWInst::Xchg, SI->getPointerOperand(),
SI->getValueOperand(), SI->getOrdering());		SI->getValueOperand(), SI->getOrdering());
SI->eraseFromParent();		SI->eraseFromParent();

// Now we have an appropriate swap instruction, lower it as usual.		// Now we have an appropriate swap instruction, lower it as usual.
return expandAtomicRMW(AI);		return expandAtomicRMW(AI);
}		}

bool AtomicExpand::expandAtomicRMW(AtomicRMWInst *AI) {		bool AtomicExpand::expandAtomicRMW(AtomicRMWInst *AI) {
		auto TLI = TM->getSubtargetImpl()->getTargetLowering();
AtomicOrdering Order = AI->getOrdering();		AtomicOrdering Order = AI->getOrdering();
Value *Addr = AI->getPointerOperand();		Value *Addr = AI->getPointerOperand();
BasicBlock *BB = AI->getParent();		BasicBlock *BB = AI->getParent();
Function *F = BB->getParent();		Function *F = BB->getParent();
LLVMContext &Ctx = F->getContext();		LLVMContext &Ctx = F->getContext();
		// If getInsertFencesForAtomic() return true, then the target does not want to
		// deal with memory orders, and emitLeading/TrailingFence should take care of
		// everything. Otherwise, emitLeading/TrailingFence are no-op and we should
		// preserve the ordering.
		AtomicOrdering MemOpOrder =
		TLI->getInsertFencesForAtomic() ? Monotonic : Order;

// Given: atomicrmw some_op iN* %addr, iN %incr ordering		// Given: atomicrmw some_op iN* %addr, iN %incr ordering
//		//
// The standard expansion we produce is:		// The standard expansion we produce is:
// [...]		// [...]
// fence?		// fence?
// atomicrmw.start:		// atomicrmw.start:
// %loaded = @load.linked(%addr)		// %loaded = @load.linked(%addr)
[10 lines not shown]	bool AtomicExpand::expandAtomicRMW(AtomicRMWInst *AI) {
// This grabs the DebugLoc from AI.		// This grabs the DebugLoc from AI.
IRBuilder<> Builder(AI);		IRBuilder<> Builder(AI);

// The split call above "helpfully" added a branch at the end of BB (to the		// The split call above "helpfully" added a branch at the end of BB (to the
// wrong place), but we might want a fence too. It's easiest to just remove		// wrong place), but we might want a fence too. It's easiest to just remove
// the branch entirely.		// the branch entirely.
std::prev(BB->end())->eraseFromParent();		std::prev(BB->end())->eraseFromParent();
Builder.SetInsertPoint(BB);		Builder.SetInsertPoint(BB);
AtomicOrdering MemOpOrder = insertLeadingFence(Builder, Order);		TLI->emitLeadingFence(Builder, Order, /*IsStore=*/true, /*IsLoad=*/true);
Builder.CreateBr(LoopBB);		Builder.CreateBr(LoopBB);

// Start the main loop block now that we've taken care of the preliminaries.		// Start the main loop block now that we've taken care of the preliminaries.
Builder.SetInsertPoint(LoopBB);		Builder.SetInsertPoint(LoopBB);
Value *Loaded = TM->getSubtargetImpl()->getTargetLowering()->emitLoadLinked(		Value *Loaded = TLI->emitLoadLinked(Builder, Addr, MemOpOrder);
Builder, Addr, MemOpOrder);

Value *NewVal;		Value *NewVal;
switch (AI->getOperation()) {		switch (AI->getOperation()) {
case AtomicRMWInst::Xchg:		case AtomicRMWInst::Xchg:
NewVal = AI->getValOperand();		NewVal = AI->getValOperand();
break;		break;
case AtomicRMWInst::Add:		case AtomicRMWInst::Add:
NewVal = Builder.CreateAdd(Loaded, AI->getValOperand(), "new");		NewVal = Builder.CreateAdd(Loaded, AI->getValOperand(), "new");
[30 lines not shown]	case AtomicRMWInst::UMin:
NewVal = Builder.CreateICmpULE(Loaded, AI->getValOperand());		NewVal = Builder.CreateICmpULE(Loaded, AI->getValOperand());
NewVal = Builder.CreateSelect(NewVal, Loaded, AI->getValOperand(), "new");		NewVal = Builder.CreateSelect(NewVal, Loaded, AI->getValOperand(), "new");
break;		break;
default:		default:
llvm_unreachable("Unknown atomic op");		llvm_unreachable("Unknown atomic op");
}		}

Value *StoreSuccess =		Value *StoreSuccess =
TM->getSubtargetImpl()->getTargetLowering()->emitStoreConditional(		TLI->emitStoreConditional(Builder, NewVal, Addr, MemOpOrder);
Builder, NewVal, Addr, MemOpOrder);
Value *TryAgain = Builder.CreateICmpNE(		Value *TryAgain = Builder.CreateICmpNE(
StoreSuccess, ConstantInt::get(IntegerType::get(Ctx, 32), 0), "tryagain");		StoreSuccess, ConstantInt::get(IntegerType::get(Ctx, 32), 0), "tryagain");
Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);		Builder.CreateCondBr(TryAgain, LoopBB, ExitBB);

Builder.SetInsertPoint(ExitBB, ExitBB->begin());		Builder.SetInsertPoint(ExitBB, ExitBB->begin());
insertTrailingFence(Builder, Order);		TLI->emitTrailingFence(Builder, Order, /*IsStore=*/true, /*IsLoad=*/true);

AI->replaceAllUsesWith(Loaded);		AI->replaceAllUsesWith(Loaded);
AI->eraseFromParent();		AI->eraseFromParent();

return true;		return true;
}		}

bool AtomicExpand::expandAtomicCmpXchg(AtomicCmpXchgInst *CI) {		bool AtomicExpand::expandAtomicCmpXchg(AtomicCmpXchgInst *CI) {
		auto TLI = TM->getSubtargetImpl()->getTargetLowering();
AtomicOrdering SuccessOrder = CI->getSuccessOrdering();		AtomicOrdering SuccessOrder = CI->getSuccessOrdering();
AtomicOrdering FailureOrder = CI->getFailureOrdering();		AtomicOrdering FailureOrder = CI->getFailureOrdering();
Value *Addr = CI->getPointerOperand();		Value *Addr = CI->getPointerOperand();
BasicBlock *BB = CI->getParent();		BasicBlock *BB = CI->getParent();
Function *F = BB->getParent();		Function *F = BB->getParent();
LLVMContext &Ctx = F->getContext();		LLVMContext &Ctx = F->getContext();
		// If getInsertFencesForAtomic() return true, then the target does not want to
		// deal with memory orders, and emitLeading/TrailingFence should take care of
		// everything. Otherwise, emitLeading/TrailingFence are no-op and we should
		// preserve the ordering.
		AtomicOrdering MemOpOrder =
		TLI->getInsertFencesForAtomic() ? Monotonic : SuccessOrder;

// Given: cmpxchg some_op iN* %addr, iN %desired, iN %new success_ord fail_ord		// Given: cmpxchg some_op iN* %addr, iN %desired, iN %new success_ord fail_ord
//		//
// The full expansion we produce is:		// The full expansion we produce is:
// [...]		// [...]
// fence?		// fence?
// cmpxchg.start:		// cmpxchg.start:
// %loaded = @load.linked(%addr)		// %loaded = @load.linked(%addr)
[24 lines not shown]	bool AtomicExpand::expandAtomicCmpXchg(AtomicCmpXchgInst *CI) {
// This grabs the DebugLoc from CI		// This grabs the DebugLoc from CI
IRBuilder<> Builder(CI);		IRBuilder<> Builder(CI);

// The split call above "helpfully" added a branch at the end of BB (to the		// The split call above "helpfully" added a branch at the end of BB (to the
// wrong place), but we might want a fence too. It's easiest to just remove		// wrong place), but we might want a fence too. It's easiest to just remove
// the branch entirely.		// the branch entirely.
std::prev(BB->end())->eraseFromParent();		std::prev(BB->end())->eraseFromParent();
Builder.SetInsertPoint(BB);		Builder.SetInsertPoint(BB);
AtomicOrdering MemOpOrder = insertLeadingFence(Builder, SuccessOrder);		TLI->emitLeadingFence(Builder,
		SuccessOrder, /*IsStore=*/true, /*IsLoad=*/true);
Builder.CreateBr(LoopBB);		Builder.CreateBr(LoopBB);

// Start the main loop block now that we've taken care of the preliminaries.		// Start the main loop block now that we've taken care of the preliminaries.
Builder.SetInsertPoint(LoopBB);		Builder.SetInsertPoint(LoopBB);
Value *Loaded = TM->getSubtargetImpl()->getTargetLowering()->emitLoadLinked(		Value *Loaded = TLI->emitLoadLinked(Builder, Addr, MemOpOrder);
Builder, Addr, MemOpOrder);
Value *ShouldStore =		Value *ShouldStore =
Builder.CreateICmpEQ(Loaded, CI->getCompareOperand(), "should_store");		Builder.CreateICmpEQ(Loaded, CI->getCompareOperand(), "should_store");

// If the the cmpxchg doesn't actually need any ordering when it fails, we can		// If the the cmpxchg doesn't actually need any ordering when it fails, we can
// jump straight past that fence instruction (if it exists).		// jump straight past that fence instruction (if it exists).
Builder.CreateCondBr(ShouldStore, TryStoreBB, FailureBB);		Builder.CreateCondBr(ShouldStore, TryStoreBB, FailureBB);

Builder.SetInsertPoint(TryStoreBB);		Builder.SetInsertPoint(TryStoreBB);
Value *StoreSuccess =		Value *StoreSuccess =
TM->getSubtargetImpl()->getTargetLowering()->emitStoreConditional(		TLI->emitStoreConditional(Builder, CI->getNewValOperand(), Addr, MemOpOrder);
Builder, CI->getNewValOperand(), Addr, MemOpOrder);
StoreSuccess = Builder.CreateICmpEQ(		StoreSuccess = Builder.CreateICmpEQ(
StoreSuccess, ConstantInt::get(Type::getInt32Ty(Ctx), 0), "success");		StoreSuccess, ConstantInt::get(Type::getInt32Ty(Ctx), 0), "success");
Builder.CreateCondBr(StoreSuccess, SuccessBB,		Builder.CreateCondBr(StoreSuccess, SuccessBB,
CI->isWeak() ? FailureBB : LoopBB);		CI->isWeak() ? FailureBB : LoopBB);

// Make sure later instructions don't get reordered with a fence if necessary.		// Make sure later instructions don't get reordered with a fence if necessary.
Builder.SetInsertPoint(SuccessBB);		Builder.SetInsertPoint(SuccessBB);
insertTrailingFence(Builder, SuccessOrder);		TLI->emitTrailingFence(Builder, SuccessOrder, /*IsStore=*/true, /*IsLoad=*/true);
Builder.CreateBr(ExitBB);		Builder.CreateBr(ExitBB);

Builder.SetInsertPoint(FailureBB);		Builder.SetInsertPoint(FailureBB);
insertTrailingFence(Builder, FailureOrder);		TLI->emitTrailingFence(Builder, FailureOrder, /*IsStore=*/true, /*IsLoad=*/true);
Builder.CreateBr(ExitBB);		Builder.CreateBr(ExitBB);

// Finally, we have control-flow based knowledge of whether the cmpxchg		// Finally, we have control-flow based knowledge of whether the cmpxchg
// succeeded or not. We expose this to later passes by converting any		// succeeded or not. We expose this to later passes by converting any
// subsequent "icmp eq/ne %loaded, %oldval" into a use of an appropriate PHI.		// subsequent "icmp eq/ne %loaded, %oldval" into a use of an appropriate PHI.

// Setup the builder so we can create any PHIs we need.		// Setup the builder so we can create any PHIs we need.
Builder.SetInsertPoint(ExitBB, ExitBB->begin());		Builder.SetInsertPoint(ExitBB, ExitBB->begin());
[32 lines not shown]	if (!CI->use_empty()) {
Res = Builder.CreateInsertValue(Res, Success, 1);		Res = Builder.CreateInsertValue(Res, Success, 1);

CI->replaceAllUsesWith(Res);		CI->replaceAllUsesWith(Res);
}		}

CI->eraseFromParent();		CI->eraseFromParent();
return true;		return true;
}		}

AtomicOrdering AtomicExpand::insertLeadingFence(IRBuilder<> &Builder,
AtomicOrdering Ord) {
if (!TM->getSubtargetImpl()->getTargetLowering()->getInsertFencesForAtomic())
return Ord;

if (Ord == Release || Ord == AcquireRelease || Ord == SequentiallyConsistent)
Builder.CreateFence(Release);

// The exclusive operations don't need any barrier if we're adding separate
// fences.
return Monotonic;
}

void AtomicExpand::insertTrailingFence(IRBuilder<> &Builder,
AtomicOrdering Ord) {
if (!TM->getSubtargetImpl()->getTargetLowering()->getInsertFencesForAtomic())
return;

if (Ord == Acquire \|\| Ord == AcquireRelease)
Builder.CreateFence(Acquire);
else if (Ord == SequentiallyConsistent)
Builder.CreateFence(SequentiallyConsistent);
}

lib/Target/ARM/ARMISelLowering.h

[391 lines not shown]	public:
bool functionArgumentNeedsConsecutiveRegisters(		bool functionArgumentNeedsConsecutiveRegisters(
Type *Ty, CallingConv::ID CallConv, bool isVarArg) const override;		Type *Ty, CallingConv::ID CallConv, bool isVarArg) const override;

Value *emitLoadLinked(IRBuilder<> &Builder, Value *Addr,		Value *emitLoadLinked(IRBuilder<> &Builder, Value *Addr,
AtomicOrdering Ord) const override;		AtomicOrdering Ord) const override;
Value *emitStoreConditional(IRBuilder<> &Builder, Value *Val,		Value *emitStoreConditional(IRBuilder<> &Builder, Value *Val,
Value *Addr, AtomicOrdering Ord) const override;		Value *Addr, AtomicOrdering Ord) const override;

		void emitLeadingFence(IRBuilder<> &Builder, AtomicOrdering Ord,
		bool IsStore, bool IsLoad) const override;
		void emitTrailingFence(IRBuilder<> &Builder, AtomicOrdering Ord,
		bool IsStore, bool IsLoad) const override;

bool shouldExpandAtomicInIR(Instruction *Inst) const override;		bool shouldExpandAtomicInIR(Instruction *Inst) const override;

bool useLoadStackGuardNode() const override;		bool useLoadStackGuardNode() const override;

protected:		protected:
std::pair<const TargetRegisterClass*, uint8_t>		std::pair<const TargetRegisterClass*, uint8_t>
findRepresentativeClass(MVT VT) const override;		findRepresentativeClass(MVT VT) const override;

private:		private:
/// Subtarget - Keep a pointer to the ARMSubtarget around so that we can		/// Subtarget - Keep a pointer to the ARMSubtarget around so that we can
/// make the right decision when generating code for different targets.		/// make the right decision when generating code for different targets.
const ARMSubtarget *Subtarget;		const ARMSubtarget *Subtarget;

const TargetRegisterInfo *RegInfo;		const TargetRegisterInfo *RegInfo;

const InstrItineraryData *Itins;		const InstrItineraryData *Itins;

/// ARMPCLabelIndex - Keep track of the number of ARM PC labels created.		/// ARMPCLabelIndex - Keep track of the number of ARM PC labels created.
///		///
unsigned ARMPCLabelIndex;		unsigned ARMPCLabelIndex;

void addTypeForNEON(MVT VT, MVT PromotedLdStVT, MVT PromotedBitwiseVT);		void addTypeForNEON(MVT VT, MVT PromotedLdStVT, MVT PromotedBitwiseVT);
void addDRTypeForNEON(MVT VT);		void addDRTypeForNEON(MVT VT);
void addQRTypeForNEON(MVT VT);		void addQRTypeForNEON(MVT VT);
std::pair<SDValue, SDValue> getARMXALUOOp(SDValue Op, SelectionDAG &DAG, SDValue &ARMcc) const;		std::pair<SDValue, SDValue> getARMXALUOOp(SDValue Op, SelectionDAG &DAG, SDValue &ARMcc) const;

typedef SmallVector<std::pair<unsigned, SDValue>, 8> RegsToPassVector;		typedef SmallVector<std::pair<unsigned, SDValue>, 8> RegsToPassVector;
[185 lines not shown]

lib/Target/ARM/ARMISelLowering.cpp


[2,731 lines not shown]	if (!Subtarget->hasDataBarrier()) {
assert(Subtarget->hasV6Ops() && !Subtarget->isThumb() &&		assert(Subtarget->hasV6Ops() && !Subtarget->isThumb() &&
"Unexpected ISD::ATOMIC_FENCE encountered. Should be libcall!");		"Unexpected ISD::ATOMIC_FENCE encountered. Should be libcall!");
return DAG.getNode(ARMISD::MEMBARRIER_MCR, dl, MVT::Other, Op.getOperand(0),		return DAG.getNode(ARMISD::MEMBARRIER_MCR, dl, MVT::Other, Op.getOperand(0),
DAG.getConstant(0, MVT::i32));		DAG.getConstant(0, MVT::i32));
}		}

ConstantSDNode *OrdN = cast<ConstantSDNode>(Op.getOperand(1));		ConstantSDNode *OrdN = cast<ConstantSDNode>(Op.getOperand(1));
AtomicOrdering Ord = static_cast<AtomicOrdering>(OrdN->getZExtValue());		AtomicOrdering Ord = static_cast<AtomicOrdering>(OrdN->getZExtValue());
unsigned Domain = ARM_MB::ISH;		ARM_MB::MemBOpt Domain = ARM_MB::ISH;
if (Subtarget->isMClass()) {		if (Subtarget->isMClass()) {
// Only a full system barrier exists in the M-class architectures.		// Only a full system barrier exists in the M-class architectures.
Domain = ARM_MB::SY;		Domain = ARM_MB::SY;
} else if (Subtarget->isSwift() && Ord == Release) {		} else if (Subtarget->isSwift() && Ord == Release) {
// Swift happens to implement ISHST barriers in a way that's compatible with		// Swift happens to implement ISHST barriers in a way that's compatible with
// Release semantics but weaker than ISH so we'd be fools not to use		// Release semantics but weaker than ISH so we'd be fools not to use
// it. Beware: other processors probably don't!		// it. Beware: other processors probably don't!
Domain = ARM_MB::ISHST;		Domain = ARM_MB::ISHST;
[8,244 lines not shown]	bool ARMTargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,
assert(Ty->isIntegerTy());		assert(Ty->isIntegerTy());

unsigned Bits = Ty->getPrimitiveSizeInBits();		unsigned Bits = Ty->getPrimitiveSizeInBits();
if (Bits == 0 || Bits > 32)		if (Bits == 0 || Bits > 32)
return false;		return false;
return true;		return true;
}		}

		static void MakeDMB(IRBuilder<> &Builder, ARM_MB::MemBOpt Domain) {
		Module *M = Builder.GetInsertBlock()->getParent()->getParent();
		Function *DMB = llvm::Intrinsic::getDeclaration(M, Intrinsic::arm_dmb);
		Constant *CDomain = Builder.getInt32(Domain);
		Builder.CreateCall(DMB, CDomain);
		}

		// Based on http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
		void ARMTargetLowering::emitLeadingFence(IRBuilder<> &Builder,
		AtomicOrdering Ord,
		bool IsStore, bool IsLoad) const {
		if (!getInsertFencesForAtomic())
		return;

		switch (Ord) {
		case NotAtomic:
		case Unordered:
		llvm_unreachable("Invalid fence: unordered/non-atomic");
		case Monotonic:
		case Acquire:
		return; // Nothing to do
		case SequentiallyConsistent:
		if (!IsStore)
		return; // Nothing to do
		/*FALLTHROUGH*/
		case Release:
		case AcquireRelease:
		if (Subtarget->isSwift())
		MakeDMB(Builder, ARM_MB::ISHST);
		//FIXME: add a comment with a link to documentation justifying this.
		else
		MakeDMB(Builder, ARM_MB::ISH);
		return;
		}
		}

		void ARMTargetLowering::emitTrailingFence(IRBuilder<> &Builder,
		AtomicOrdering Ord,
		bool IsStore, bool IsLoad) const {
		if (!getInsertFencesForAtomic())
		return;

		switch (Ord) {
		case NotAtomic:
		case Unordered:
		llvm_unreachable("Invalid fence: unordered/not-atomic");
		case Monotonic:
		case Release:
		return; // Nothing to do
		case Acquire:
		case AcquireRelease:
		// FIXME: Too conservative, an isb after a dependent branch is enough
		// See http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html, confirmed in
		// http://www0.cs.ucl.ac.uk/staff/j.alglave/papers/toplas14.pdf page 36,
		// and hinted at by the ARM documentation section A3.8.3.
		case SequentiallyConsistent:
		MakeDMB(Builder, ARM_MB::ISH);
		return;
		}
		}

bool ARMTargetLowering::shouldExpandAtomicInIR(Instruction *Inst) const {		bool ARMTargetLowering::shouldExpandAtomicInIR(Instruction *Inst) const {
// Loads and stores less than 64-bits are already atomic; ones above that		// Loads and stores less than 64-bits are already atomic; ones above that
// are doomed anyway, so defer to the default libcall and blame the OS when		// are doomed anyway, so defer to the default libcall and blame the OS when
// things go wrong. Cortex M doesn't have ldrexd/strexd though, so don't emit		// things go wrong. Cortex M doesn't have ldrexd/strexd though, so don't emit
// anything for those.		// anything for those.
bool IsMClass = Subtarget->isMClass();		bool IsMClass = Subtarget->isMClass();
if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {		if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
unsigned Size = SI->getValueOperand()->getType()->getPrimitiveSizeInBits();		unsigned Size = SI->getValueOperand()->getType()->getPrimitiveSizeInBits();
[157 lines not shown]

test/Transforms/AtomicExpand/ARM/atomic-expansion-v7.ll

	; RUN: opt -S -o - -mtriple=armv7-apple-ios7.0 -atomic-expand %s | FileCheck %s			; RUN: opt -S -o - -mtriple=armv7-apple-ios7.0 -atomic-expand %s | FileCheck %s

	define i8 @test_atomic_xchg_i8(i8* %ptr, i8 %xchgend) {			define i8 @test_atomic_xchg_i8(i8* %ptr, i8 %xchgend) {
	; CHECK-LABEL: @test_atomic_xchg_i8			; CHECK-LABEL: @test_atomic_xchg_i8
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[NEWVAL32:%.*]] = zext i8 %xchgend to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 %xchgend to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw xchg i8* %ptr, i8 %xchgend monotonic			%res = atomicrmw xchg i8* %ptr, i8 %xchgend monotonic
	ret i8 %res			ret i8 %res
	}			}

	define i16 @test_atomic_add_i16(i16* %ptr, i16 %addend) {			define i16 @test_atomic_add_i16(i16* %ptr, i16 %addend) {
	; CHECK-LABEL: @test_atomic_add_i16			; CHECK-LABEL: @test_atomic_add_i16
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16
	; CHECK: [[NEWVAL:%.*]] = add i16 [[OLDVAL]], %addend			; CHECK: [[NEWVAL:%.*]] = add i16 [[OLDVAL]], %addend
	; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i16 [[OLDVAL]]			; CHECK: ret i16 [[OLDVAL]]
	%res = atomicrmw add i16* %ptr, i16 %addend seq_cst			%res = atomicrmw add i16* %ptr, i16 %addend seq_cst
	ret i16 %res			ret i16 %res
	}			}

	define i32 @test_atomic_sub_i32(i32* %ptr, i32 %subend) {			define i32 @test_atomic_sub_i32(i32* %ptr, i32 %subend) {
	; CHECK-LABEL: @test_atomic_sub_i32			; CHECK-LABEL: @test_atomic_sub_i32
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %ptr)			; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %ptr)
	; CHECK: [[NEWVAL:%.*]] = sub i32 [[OLDVAL]], %subend			; CHECK: [[NEWVAL:%.*]] = sub i32 [[OLDVAL]], %subend
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 [[NEWVAL]], i32* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 [[NEWVAL]], i32* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence acquire			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i32 [[OLDVAL]]			; CHECK: ret i32 [[OLDVAL]]
	%res = atomicrmw sub i32* %ptr, i32 %subend acquire			%res = atomicrmw sub i32* %ptr, i32 %subend acquire
	ret i32 %res			ret i32 %res
	}			}

	define i8 @test_atomic_and_i8(i8* %ptr, i8 %andend) {			define i8 @test_atomic_and_i8(i8* %ptr, i8 %andend) {
	; CHECK-LABEL: @test_atomic_and_i8			; CHECK-LABEL: @test_atomic_and_i8
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[NEWVAL:%.*]] = and i8 [[OLDVAL]], %andend			; CHECK: [[NEWVAL:%.*]] = and i8 [[OLDVAL]], %andend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw and i8* %ptr, i8 %andend release			%res = atomicrmw and i8* %ptr, i8 %andend release
	ret i8 %res			ret i8 %res
	}			}

	define i16 @test_atomic_nand_i16(i16* %ptr, i16 %nandend) {			define i16 @test_atomic_nand_i16(i16* %ptr, i16 %nandend) {
	; CHECK-LABEL: @test_atomic_nand_i16			; CHECK-LABEL: @test_atomic_nand_i16
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16
	; CHECK: [[NEWVAL_TMP:%.*]] = and i16 [[OLDVAL]], %nandend			; CHECK: [[NEWVAL_TMP:%.*]] = and i16 [[OLDVAL]], %nandend
	; CHECK: [[NEWVAL:%.*]] = xor i16 [[NEWVAL_TMP]], -1			; CHECK: [[NEWVAL:%.*]] = xor i16 [[NEWVAL_TMP]], -1
	; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i16 [[OLDVAL]]			; CHECK: ret i16 [[OLDVAL]]
	%res = atomicrmw nand i16* %ptr, i16 %nandend seq_cst			%res = atomicrmw nand i16* %ptr, i16 %nandend seq_cst
	ret i16 %res			ret i16 %res
	}			}

	define i64 @test_atomic_or_i64(i64* %ptr, i64 %orend) {			define i64 @test_atomic_or_i64(i64* %ptr, i64 %orend) {
	; CHECK-LABEL: @test_atomic_or_i64			; CHECK-LABEL: @test_atomic_or_i64
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*			; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*
	; CHECK: [[LOHI:%.*]] = call { i32, i32 } @llvm.arm.ldrexd(i8* [[PTR8]])			; CHECK: [[LOHI:%.*]] = call { i32, i32 } @llvm.arm.ldrexd(i8* [[PTR8]])
	; CHECK: [[LO:%.*]] = extractvalue { i32, i32 } [[LOHI]], 0			; CHECK: [[LO:%.*]] = extractvalue { i32, i32 } [[LOHI]], 0
	; CHECK: [[HI:%.*]] = extractvalue { i32, i32 } [[LOHI]], 1			; CHECK: [[HI:%.*]] = extractvalue { i32, i32 } [[LOHI]], 1
	; CHECK: [[LO64:%.*]] = zext i32 [[LO]] to i64			; CHECK: [[LO64:%.*]] = zext i32 [[LO]] to i64
	; CHECK: [[HI64_TMP:%.*]] = zext i32 [[HI]] to i64			; CHECK: [[HI64_TMP:%.*]] = zext i32 [[HI]] to i64
	; CHECK: [[HI64:%.*]] = shl i64 [[HI64_TMP]], 32			; CHECK: [[HI64:%.*]] = shl i64 [[HI64_TMP]], 32
	; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]			; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]
	; CHECK: [[NEWVAL:%.*]] = or i64 [[OLDVAL]], %orend			; CHECK: [[NEWVAL:%.*]] = or i64 [[OLDVAL]], %orend
	; CHECK: [[NEWLO:%.*]] = trunc i64 [[NEWVAL]] to i32			; CHECK: [[NEWLO:%.*]] = trunc i64 [[NEWVAL]] to i32
	; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 [[NEWVAL]], 32			; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 [[NEWVAL]], 32
	; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32			; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32
	; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*			; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i64 [[OLDVAL]]			; CHECK: ret i64 [[OLDVAL]]
	%res = atomicrmw or i64* %ptr, i64 %orend seq_cst			%res = atomicrmw or i64* %ptr, i64 %orend seq_cst
	ret i64 %res			ret i64 %res
	}			}

	define i8 @test_atomic_xor_i8(i8* %ptr, i8 %xorend) {			define i8 @test_atomic_xor_i8(i8* %ptr, i8 %xorend) {
	; CHECK-LABEL: @test_atomic_xor_i8			; CHECK-LABEL: @test_atomic_xor_i8
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[NEWVAL:%.*]] = xor i8 [[OLDVAL]], %xorend			; CHECK: [[NEWVAL:%.*]] = xor i8 [[OLDVAL]], %xorend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw xor i8* %ptr, i8 %xorend seq_cst			%res = atomicrmw xor i8* %ptr, i8 %xorend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_max_i8(i8* %ptr, i8 %maxend) {			define i8 @test_atomic_max_i8(i8* %ptr, i8 %maxend) {
	; CHECK-LABEL: @test_atomic_max_i8			; CHECK-LABEL: @test_atomic_max_i8
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp sgt i8 [[OLDVAL]], %maxend			; CHECK: [[WANT_OLD:%.*]] = icmp sgt i8 [[OLDVAL]], %maxend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %maxend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %maxend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw max i8* %ptr, i8 %maxend seq_cst			%res = atomicrmw max i8* %ptr, i8 %maxend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_min_i8(i8* %ptr, i8 %minend) {			define i8 @test_atomic_min_i8(i8* %ptr, i8 %minend) {
	; CHECK-LABEL: @test_atomic_min_i8			; CHECK-LABEL: @test_atomic_min_i8
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp sle i8 [[OLDVAL]], %minend			; CHECK: [[WANT_OLD:%.*]] = icmp sle i8 [[OLDVAL]], %minend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %minend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %minend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw min i8* %ptr, i8 %minend seq_cst			%res = atomicrmw min i8* %ptr, i8 %minend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_umax_i8(i8* %ptr, i8 %umaxend) {			define i8 @test_atomic_umax_i8(i8* %ptr, i8 %umaxend) {
	; CHECK-LABEL: @test_atomic_umax_i8			; CHECK-LABEL: @test_atomic_umax_i8
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp ugt i8 [[OLDVAL]], %umaxend			; CHECK: [[WANT_OLD:%.*]] = icmp ugt i8 [[OLDVAL]], %umaxend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %umaxend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %umaxend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw umax i8* %ptr, i8 %umaxend seq_cst			%res = atomicrmw umax i8* %ptr, i8 %umaxend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_umin_i8(i8* %ptr, i8 %uminend) {			define i8 @test_atomic_umin_i8(i8* %ptr, i8 %uminend) {
	; CHECK-LABEL: @test_atomic_umin_i8			; CHECK-LABEL: @test_atomic_umin_i8
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp ule i8 [[OLDVAL]], %uminend			; CHECK: [[WANT_OLD:%.*]] = icmp ule i8 [[OLDVAL]], %uminend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %uminend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %uminend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw umin i8* %ptr, i8 %uminend seq_cst			%res = atomicrmw umin i8* %ptr, i8 %uminend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_cmpxchg_i8_seqcst_seqcst(i8* %ptr, i8 %desired, i8 %newval) {			define i8 @test_cmpxchg_i8_seqcst_seqcst(i8* %ptr, i8 %desired, i8 %newval) {
	; CHECK-LABEL: @test_cmpxchg_i8_seqcst_seqcst			; CHECK-LABEL: @test_cmpxchg_i8_seqcst_seqcst
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]

	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 %1 to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 %1 to i8
	; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i8 [[OLDVAL]], %desired			; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i8 [[OLDVAL]], %desired
	; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[TRY_STORE]]:			; CHECK: [[TRY_STORE]]:
	; CHECK: [[NEWVAL32:%.*]] = zext i8 %newval to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 %newval to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]			; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]

	; CHECK: [[SUCCESS_BB]]:			; CHECK: [[SUCCESS_BB]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[DONE:.*]]			; CHECK: br label %[[DONE:.*]]

	; CHECK: [[FAILURE_BB]]:			; CHECK: [[FAILURE_BB]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[DONE]]			; CHECK: br label %[[DONE]]

	; CHECK: [[DONE]]:			; CHECK: [[DONE]]:
	; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]			; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]

	%pairold = cmpxchg i8* %ptr, i8 %desired, i8 %newval seq_cst seq_cst			%pairold = cmpxchg i8* %ptr, i8 %desired, i8 %newval seq_cst seq_cst
	%old = extractvalue { i8, i1 } %pairold, 0			%old = extractvalue { i8, i1 } %pairold, 0
	ret i8 %old			ret i8 %old
	}			}

	define i16 @test_cmpxchg_i16_seqcst_monotonic(i16* %ptr, i16 %desired, i16 %newval) {			define i16 @test_cmpxchg_i16_seqcst_monotonic(i16* %ptr, i16 %desired, i16 %newval) {
	; CHECK-LABEL: @test_cmpxchg_i16_seqcst_monotonic			; CHECK-LABEL: @test_cmpxchg_i16_seqcst_monotonic
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]

	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 %1 to i16			; CHECK: [[OLDVAL:%.*]] = trunc i32 %1 to i16
	; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i16 [[OLDVAL]], %desired			; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i16 [[OLDVAL]], %desired
	; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[TRY_STORE]]:			; CHECK: [[TRY_STORE]]:
	; CHECK: [[NEWVAL32:%.*]] = zext i16 %newval to i32			; CHECK: [[NEWVAL32:%.*]] = zext i16 %newval to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)
	; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]			; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]

	; CHECK: [[SUCCESS_BB]]:			; CHECK: [[SUCCESS_BB]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[DONE:.*]]			; CHECK: br label %[[DONE:.*]]

	; CHECK: [[FAILURE_BB]]:			; CHECK: [[FAILURE_BB]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[DONE]]			; CHECK: br label %[[DONE]]

	; CHECK: [[DONE]]:			; CHECK: [[DONE]]:
	; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]			; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]
	; CHECK: ret i16 [[OLDVAL]]			; CHECK: ret i16 [[OLDVAL]]

	%pairold = cmpxchg i16* %ptr, i16 %desired, i16 %newval seq_cst monotonic			%pairold = cmpxchg i16* %ptr, i16 %desired, i16 %newval seq_cst monotonic
	%old = extractvalue { i16, i1 } %pairold, 0			%old = extractvalue { i16, i1 } %pairold, 0
	ret i16 %old			ret i16 %old
	}			}

	define i32 @test_cmpxchg_i32_acquire_acquire(i32* %ptr, i32 %desired, i32 %newval) {			define i32 @test_cmpxchg_i32_acquire_acquire(i32* %ptr, i32 %desired, i32 %newval) {
	; CHECK-LABEL: @test_cmpxchg_i32_acquire_acquire			; CHECK-LABEL: @test_cmpxchg_i32_acquire_acquire
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]

	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %ptr)			; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %ptr)
	; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[OLDVAL]], %desired			; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[OLDVAL]], %desired
	; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[TRY_STORE]]:			; CHECK: [[TRY_STORE]]:
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %newval, i32* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %newval, i32* %ptr)
	; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]			; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]

	; CHECK: [[SUCCESS_BB]]:			; CHECK: [[SUCCESS_BB]]:
	; CHECK: fence acquire			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[DONE:.*]]			; CHECK: br label %[[DONE:.*]]

	; CHECK: [[FAILURE_BB]]:			; CHECK: [[FAILURE_BB]]:
	; CHECK: fence acquire			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[DONE]]			; CHECK: br label %[[DONE]]

	; CHECK: [[DONE]]:			; CHECK: [[DONE]]:
	; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]			; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]
	; CHECK: ret i32 [[OLDVAL]]			; CHECK: ret i32 [[OLDVAL]]

	%pairold = cmpxchg i32* %ptr, i32 %desired, i32 %newval acquire acquire			%pairold = cmpxchg i32* %ptr, i32 %desired, i32 %newval acquire acquire
	%old = extractvalue { i32, i1 } %pairold, 0			%old = extractvalue { i32, i1 } %pairold, 0
	ret i32 %old			ret i32 %old
	}			}

	define i64 @test_cmpxchg_i64_monotonic_monotonic(i64* %ptr, i64 %desired, i64 %newval) {			define i64 @test_cmpxchg_i64_monotonic_monotonic(i64* %ptr, i64 %desired, i64 %newval) {
	; CHECK-LABEL: @test_cmpxchg_i64_monotonic_monotonic			; CHECK-LABEL: @test_cmpxchg_i64_monotonic_monotonic
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]

	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*			; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*
	; CHECK: [[LOHI:%.*]] = call { i32, i32 } @llvm.arm.ldrexd(i8* [[PTR8]])			; CHECK: [[LOHI:%.*]] = call { i32, i32 } @llvm.arm.ldrexd(i8* [[PTR8]])
	; CHECK: [[LO:%.*]] = extractvalue { i32, i32 } [[LOHI]], 0			; CHECK: [[LO:%.*]] = extractvalue { i32, i32 } [[LOHI]], 0
	; CHECK: [[HI:%.*]] = extractvalue { i32, i32 } [[LOHI]], 1			; CHECK: [[HI:%.*]] = extractvalue { i32, i32 } [[LOHI]], 1
	; CHECK: [[LO64:%.*]] = zext i32 [[LO]] to i64			; CHECK: [[LO64:%.*]] = zext i32 [[LO]] to i64
	; CHECK: [[HI64_TMP:%.*]] = zext i32 [[HI]] to i64			; CHECK: [[HI64_TMP:%.*]] = zext i32 [[HI]] to i64
	; CHECK: [[HI64:%.*]] = shl i64 [[HI64_TMP]], 32			; CHECK: [[HI64:%.*]] = shl i64 [[HI64_TMP]], 32
	; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]			; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]
	; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i64 [[OLDVAL]], %desired			; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i64 [[OLDVAL]], %desired
	; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[TRY_STORE]]:			; CHECK: [[TRY_STORE]]:
	; CHECK: [[NEWLO:%.*]] = trunc i64 %newval to i32			; CHECK: [[NEWLO:%.*]] = trunc i64 %newval to i32
	; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 %newval, 32			; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 %newval, 32
	; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32			; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32
	; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*			; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])
	; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp eq i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]			; CHECK: br i1 [[TST]], label %[[SUCCESS_BB:.*]], label %[[LOOP]]

	; CHECK: [[SUCCESS_BB]]:			; CHECK: [[SUCCESS_BB]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[DONE:.*]]			; CHECK: br label %[[DONE:.*]]

	; CHECK: [[FAILURE_BB]]:			; CHECK: [[FAILURE_BB]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[DONE]]			; CHECK: br label %[[DONE]]

	; CHECK: [[DONE]]:			; CHECK: [[DONE]]:
	; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]			; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]
	; CHECK: ret i64 [[OLDVAL]]			; CHECK: ret i64 [[OLDVAL]]

	%pairold = cmpxchg i64* %ptr, i64 %desired, i64 %newval monotonic monotonic			%pairold = cmpxchg i64* %ptr, i64 %desired, i64 %newval monotonic monotonic
	%old = extractvalue { i64, i1 } %pairold, 0			%old = extractvalue { i64, i1 } %pairold, 0
	ret i64 %old			ret i64 %old
	}			}
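Read together, the new CHECK lines in this file (armv7, not Swift) and the seq_cst cases in cmpxchg-weak.ll (thumbv7s, Swift) specify which barriers AtomicExpand should place around an ldrex/strex loop. The sketch below restates that table in C++ so it can be checked at a glance. It is reconstructed from the tests, not copied from ARMISelLowering.cpp. The names `Ordering`, `Barrier`, `leadingFence`, `trailingFence` and `cmpxchgFences` are hypothetical. On Swift, the tests only exercise the `dmb ishst` leading barrier for seq_cst; extending it to release is an assumption of this sketch.

```cpp
#include <optional>

// Hypothetical subset of LLVM atomic orderings exercised by these tests.
enum class Ordering { Monotonic, Acquire, Release, SequentiallyConsistent };

// Immediates passed to @llvm.arm.dmb in the CHECK lines.
enum class Barrier : int { ISHST = 10, ISH = 11 };

// Barrier expected before the ldrex/strex loop.
std::optional<Barrier> leadingFence(Ordering ord, bool isSwift) {
  switch (ord) {
  case Ordering::Release:
  case Ordering::SequentiallyConsistent:
    // Only seq_cst is tested on Swift; using ISHST for release there too
    // is an assumption.
    return isSwift ? Barrier::ISHST : Barrier::ISH;
  default:
    return std::nullopt; // monotonic and acquire: no leading barrier
  }
}

// Barrier expected after the loop (on each exit path for cmpxchg).
std::optional<Barrier> trailingFence(Ordering ord) {
  switch (ord) {
  case Ordering::Acquire:
  case Ordering::SequentiallyConsistent:
    return Barrier::ISH;
  default:
    return std::nullopt; // monotonic and release: no trailing barrier
  }
}

// cmpxchg: the leading and SUCCESS_BB barriers follow the success ordering,
// while the FAILURE_BB barrier follows the failure ordering.
struct CmpXchgFences {
  std::optional<Barrier> leading, onSuccess, onFailure;
};

CmpXchgFences cmpxchgFences(Ordering success, Ordering failure, bool isSwift) {
  return {leadingFence(success, isSwift), trailingFence(success),
          trailingFence(failure)};
}
```

For instance, `cmpxchgFences(Ordering::SequentiallyConsistent, Ordering::Monotonic, false)` gives `dmb ish` before the loop and on the success path, and nothing on the failure path. This matches `test_cmpxchg_i16_seqcst_monotonic`.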

test/Transforms/AtomicExpand/ARM/cmpxchg-weak.ll

	; RUN: opt -atomic-expand -S -mtriple=thumbv7s-apple-ios7.0 %s | FileCheck %s			; RUN: opt -atomic-expand -S -mtriple=thumbv7s-apple-ios7.0 %s | FileCheck %s

	define i32 @test_cmpxchg_seq_cst(i32* %addr, i32 %desired, i32 %new) {			define i32 @test_cmpxchg_seq_cst(i32* %addr, i32 %desired, i32 %new) {
	; CHECK-LABEL: @test_cmpxchg_seq_cst			; CHECK-LABEL: @test_cmpxchg_seq_cst
	; CHECK: fence release			; Intrinsic for "dmb ishst" is then expected
				; CHECK: call void @llvm.arm.dmb(i32 10)
	; CHECK: br label %[[START:.*]]			; CHECK: br label %[[START:.*]]

	; CHECK: [[START]]:			; CHECK: [[START]]:
	; CHECK: [[LOADED:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)			; CHECK: [[LOADED:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)
	; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[LOADED]], %desired			; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[LOADED]], %desired
	; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[TRY_STORE]]:			; CHECK: [[TRY_STORE]]:
	; CHECK: [[STREX:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)			; CHECK: [[STREX:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)
	; CHECK: [[SUCCESS:%.*]] = icmp eq i32 [[STREX]], 0			; CHECK: [[SUCCESS:%.*]] = icmp eq i32 [[STREX]], 0
	; CHECK: br i1 [[SUCCESS]], label %[[SUCCESS_BB:.*]], label %[[FAILURE_BB]]			; CHECK: br i1 [[SUCCESS]], label %[[SUCCESS_BB:.*]], label %[[FAILURE_BB]]

	; CHECK: [[SUCCESS_BB]]:			; CHECK: [[SUCCESS_BB]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[END:.*]]			; CHECK: br label %[[END:.*]]

	; CHECK: [[FAILURE_BB]]:			; CHECK: [[FAILURE_BB]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[END]]			; CHECK: br label %[[END]]

	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]			; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]
	; CHECK: ret i32 [[LOADED]]			; CHECK: ret i32 [[LOADED]]

	%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst seq_cst			%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst seq_cst
	%oldval = extractvalue { i32, i1 } %pair, 0			%oldval = extractvalue { i32, i1 } %pair, 0
	ret i32 %oldval			ret i32 %oldval
	}			}

	define i1 @test_cmpxchg_weak_fail(i32* %addr, i32 %desired, i32 %new) {			define i1 @test_cmpxchg_weak_fail(i32* %addr, i32 %desired, i32 %new) {
	; CHECK-LABEL: @test_cmpxchg_weak_fail			; CHECK-LABEL: @test_cmpxchg_weak_fail
	; CHECK: fence release			; CHECK: call void @llvm.arm.dmb(i32 10)
	; CHECK: br label %[[START:.*]]			; CHECK: br label %[[START:.*]]

	; CHECK: [[START]]:			; CHECK: [[START]]:
	; CHECK: [[LOADED:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)			; CHECK: [[LOADED:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)
	; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[LOADED]], %desired			; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[LOADED]], %desired
	; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[TRY_STORE]]:			; CHECK: [[TRY_STORE]]:
	; CHECK: [[STREX:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)			; CHECK: [[STREX:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)
	; CHECK: [[SUCCESS:%.*]] = icmp eq i32 [[STREX]], 0			; CHECK: [[SUCCESS:%.*]] = icmp eq i32 [[STREX]], 0
	; CHECK: br i1 [[SUCCESS]], label %[[SUCCESS_BB:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SUCCESS]], label %[[SUCCESS_BB:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[SUCCESS_BB]]:			; CHECK: [[SUCCESS_BB]]:
	; CHECK: fence seq_cst			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[END:.*]]			; CHECK: br label %[[END:.*]]

	; CHECK: [[FAILURE_BB]]:			; CHECK: [[FAILURE_BB]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[END]]			; CHECK: br label %[[END]]

	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]			; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]
	; CHECK: ret i1 [[SUCCESS]]			; CHECK: ret i1 [[SUCCESS]]

	%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic			%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic
	%oldval = extractvalue { i32, i1 } %pair, 1			%oldval = extractvalue { i32, i1 } %pair, 1
	ret i1 %oldval			ret i1 %oldval
	}			}

	define i32 @test_cmpxchg_monotonic(i32* %addr, i32 %desired, i32 %new) {			define i32 @test_cmpxchg_monotonic(i32* %addr, i32 %desired, i32 %new) {
	; CHECK-LABEL: @test_cmpxchg_monotonic			; CHECK-LABEL: @test_cmpxchg_monotonic
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[START:.*]]			; CHECK: br label %[[START:.*]]

	; CHECK: [[START]]:			; CHECK: [[START]]:
	; CHECK: [[LOADED:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)			; CHECK: [[LOADED:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)
	; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[LOADED]], %desired			; CHECK: [[SHOULD_STORE:%.*]] = icmp eq i32 [[LOADED]], %desired
	; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SHOULD_STORE]], label %[[TRY_STORE:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[TRY_STORE]]:			; CHECK: [[TRY_STORE]]:
	; CHECK: [[STREX:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)			; CHECK: [[STREX:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)
	; CHECK: [[SUCCESS:%.*]] = icmp eq i32 [[STREX]], 0			; CHECK: [[SUCCESS:%.*]] = icmp eq i32 [[STREX]], 0
	; CHECK: br i1 [[SUCCESS]], label %[[SUCCESS_BB:.*]], label %[[FAILURE_BB:.*]]			; CHECK: br i1 [[SUCCESS]], label %[[SUCCESS_BB:.*]], label %[[FAILURE_BB:.*]]

	; CHECK: [[SUCCESS_BB]]:			; CHECK: [[SUCCESS_BB]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[END:.*]]			; CHECK: br label %[[END:.*]]

	; CHECK: [[FAILURE_BB]]:			; CHECK: [[FAILURE_BB]]:
	; CHECK-NOT: fence			; CHECK-NOT: dmb
	; CHECK: br label %[[END]]			; CHECK: br label %[[END]]

	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]			; CHECK: [[SUCCESS:%.*]] = phi i1 [ true, %[[SUCCESS_BB]] ], [ false, %[[FAILURE_BB]] ]
	; CHECK: ret i32 [[LOADED]]			; CHECK: ret i32 [[LOADED]]

	%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new monotonic monotonic			%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new monotonic monotonic
	%oldval = extractvalue { i32, i1 } %pair, 0			%oldval = extractvalue { i32, i1 } %pair, 0
	ret i32 %oldval			ret i32 %oldval
	}			}
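The CHECK lines above fix which `dmb` option each ordering produces. `@llvm.arm.dmb(i32 11)` (ISH) follows a seq_cst success. `@llvm.arm.dmb(i32 10)` (ISHST) is the leading barrier on this test's target. Monotonic paths get no barrier at all.

The following is a minimal, standalone C++ sketch of that mapping, written only to make the test expectations easier to read. It is not the ARMISelLowering.cpp code under review. The `Ordering` enum, `leadingDmb`, `trailingDmb` and the `preferIshSt` flag are hypothetical names. Letting a seq_cst leading fence use ISHST is an assumption taken from the test output, and it is the same Swift question still open in this review.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the IR orderings used in these tests
// (not LLVM's AtomicOrdering enum).
enum class Ordering { Monotonic, Acquire, Release, AcquireRelease, SeqCst };

// DMB option values as they appear in the @llvm.arm.dmb CHECK lines.
constexpr int kDmbIsh = 11;   // dmb ish: full barrier, inner shareable domain
constexpr int kDmbIshSt = 10; // dmb ishst: store-only barrier, inner shareable

// Barrier emitted before an atomic operation that stores (such as the
// cmpxchg loop). std::nullopt means "no dmb".
// preferIshSt is an assumed per-target switch (hypothetical).
std::optional<int> leadingDmb(Ordering ord, bool preferIshSt) {
  switch (ord) {
  case Ordering::Monotonic:
  case Ordering::Acquire:
    return std::nullopt;
  case Ordering::Release:
  case Ordering::AcquireRelease:
  case Ordering::SeqCst:
    return preferIshSt ? kDmbIshSt : kDmbIsh;
  }
  return std::nullopt;
}

// Barrier emitted after the operation on a given path, chosen from the
// ordering that applies to that path (success or failure ordering).
std::optional<int> trailingDmb(Ordering ord) {
  switch (ord) {
  case Ordering::Monotonic:
  case Ordering::Release:
    return std::nullopt;
  case Ordering::Acquire:
  case Ordering::AcquireRelease:
  case Ordering::SeqCst:
    return kDmbIsh;
  }
  return std::nullopt;
}

// Mirrors @test_cmpxchg_weak_fail: seq_cst success, monotonic failure,
// on a target assumed to prefer ISHST for leading barriers.
void checkWeakFailExpectations() {
  assert(leadingDmb(Ordering::SeqCst, /*preferIshSt=*/true) == kDmbIshSt);
  assert(trailingDmb(Ordering::SeqCst) == kDmbIsh); // success path
  assert(!trailingDmb(Ordering::Monotonic));        // failure path: no dmb
}
```

The leading and trailing decisions are kept in separate functions because the trailing barrier is chosen per path. A cmpxchg whose failure ordering is monotonic therefore gets `dmb ish` on its success path and nothing on its failure path, which is what `@test_cmpxchg_weak_fail` checks.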

Implement emitLeading/TrailingFence in the ARM backend
Closed, Public

Diff 12869

lib/CodeGen/AtomicExpandPass.cpp

lib/Target/ARM/ARMISelLowering.h

lib/Target/ARM/ARMISelLowering.cpp

test/Transforms/AtomicExpand/ARM/atomic-expansion-v7.ll

test/Transforms/AtomicExpand/ARM/cmpxchg-weak.ll