This is an archive of the discontinued LLVM Phabricator instance.

Differential D5404

[X86] Make wide loads be managed by AtomicExpand
Closed, Public

Authored by morisset on Sep 18 2014, 3:52 PM.

Details

Reviewers

jfb

Commits

rG6dbbbc28b0fd: [X86] Make wide loads be managed by AtomicExpand
rL218332: [X86] Make wide loads be managed by AtomicExpand

Summary

AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functional change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).
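
As an illustration of the idea in the summary, here is a minimal sketch of reading a value using only a compare-and-exchange, written against `std::atomic` rather than LLVM IR. The helper name `loadViaCmpXchg` is hypothetical and does not appear in the patch. The dummy value 0 mirrors the null `DummyVal` used by the new expansion: if the location holds 0, the exchange writes 0 back (no visible change); otherwise it fails and reports the current value.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not part of LLVM): reads Addr with a single
// compare-exchange, using 0 as both the expected and the new value.
std::uint64_t loadViaCmpXchg(std::atomic<std::uint64_t> &Addr) {
  std::uint64_t Expected = 0; // dummy value
  // On success the location held 0 and still holds 0. On failure,
  // Expected is overwritten with the value currently stored.
  Addr.compare_exchange_strong(Expected, 0, std::memory_order_seq_cst,
                               std::memory_order_seq_cst);
  return Expected; // in both cases, the value that was in memory
}
```

Either way the memory contents are unchanged, which is why a cmpxchg can stand in for a load when no plain load of that width is atomic.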

Diff Detail

Repository: rL LLVM

Event Timeline

morisset updated this revision to Diff 13854. Sep 18 2014, 3:52 PM

morisset retitled this revision from to [X86] Make wide loads be managed by AtomicExpand.

morisset updated this object.

morisset edited the test plan for this revision.

morisset added a reviewer: jfb.

morisset added a subscriber: Unknown Object (MLST).

The tests all stay the same after this change? Were they actually sufficient? In particular, I want to make sure that the lock prefix remains where it should be.

lib/Target/X86/X86ISelLowering.cpp
Line 17360 (On Diff #13854): These FIXMEs should probably be kept. How would they get fixed with this new infrastructure? `shouldExpandAtomicLoadInIR` would change to take that into account, and `ReplaceATOMIC_LOAD` would do the work?

The tests for cmpxchg16b were indeed testing for the lock prefix, but not the tests for the 8b version. I will upload a new patch with an improved test (and preserved FIXME).

lib/Target/X86/X86ISelLowering.cpp
Line 17360 (On Diff #13854): The second comment looks obsolete to me: 16-byte cmpxchg definitely works on x86 in recent versions of LLVM, and there are lots of tests of it. The first comment may indeed be more interesting to preserve, so I will keep it in this file (in shouldExpandAtomicLoadInIR). If someone wants to try it someday, they will have to make shouldExpandAtomicLoadInIR return false for this case, and do the lowering themselves (as it is a completely target-dependent trick).
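
The division of labour described in this exchange can be sketched with a toy model. Everything here (`ToyTarget`, `LoadLowering`, `chooseLoadLowering`, and the "wider than native width" rule) is hypothetical and simplified; the body of `needsCmpXchgNb` is not fully shown in this diff, so the width rule is only an assumption made for illustration.

```cpp
// Toy stand-in for a target's lowering hooks; this is not LLVM's API.
struct ToyTarget {
  unsigned NativeWidthBits;    // assumed widest plain atomic access
  bool HasLLSC;                // load-linked/store-conditional available
  bool HandlesWideLoadsItself; // e.g. a target-specific fild/movq trick

  // Mirrors the role of shouldExpandAtomicLoadInIR: return true to ask
  // the IR-level pass to expand the load, false to keep it for the backend.
  bool shouldExpandAtomicLoadInIR(unsigned LoadWidthBits) const {
    if (HandlesWideLoadsItself)
      return false;
    return LoadWidthBits > NativeWidthBits;
  }
};

enum class LoadLowering { LeftToBackend, LoadLinked, CmpXchg };

// Mirrors the dispatch in expandAtomicLoad: LL/SC targets get a
// load-linked, all other targets get a cmpxchg with a dummy value.
LoadLowering chooseLoadLowering(const ToyTarget &T, unsigned LoadWidthBits) {
  if (!T.shouldExpandAtomicLoadInIR(LoadWidthBits))
    return LoadLowering::LeftToBackend;
  return T.HasLLSC ? LoadLowering::LoadLinked : LoadLowering::CmpXchg;
}
```

In this model, a target that wants to try the fild/movq approach would answer false from the hook and take over the lowering, which is the route described in the reply above.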

Preserve FIXME comment, and refine tests.

jfb added inline comments. Sep 23 2014, 10:17 AM

test/CodeGen/X86/atomic-load-store-wide.ll
Line 1 (On Diff #14000): Is there also a cmpxchg8b test for x86-64?
Line 9 (On Diff #14000): Use `CHECK-LABEL` and `CHECK-NEXT`.

morisset added inline comments. Sep 23 2014, 10:24 AM

test/CodeGen/X86/atomic-load-store-wide.ll
Line 1 (On Diff #14000): I do not understand this point: 64-bit accesses are naturally atomic on x86-64 (provided some alignment constraints are respected). So there is no need for cmpxchg8b, and I do not know of a case where it can be emitted by LLVM on x86-64.
Line 9 (On Diff #14000): Thanks, fixed.

Use CHECK-LABEL and CHECK-NEXT

jfb accepted this revision. Sep 23 2014, 10:34 AM

jfb edited edge metadata.

jfb added inline comments.

test/CodeGen/X86/atomic-load-store-wide.ll
Line 1 (On Diff #14000): Oh right! Ignore :)

This revision is now accepted and ready to land. Sep 23 2014, 10:34 AM

Closed by commit rL218332 (authored by @morisset).

Revision Contents

Path                                                     Size
llvm/trunk/lib/CodeGen/AtomicExpandPass.cpp              29 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp            34 lines
llvm/trunk/test/CodeGen/X86/atomic-load-store-wide.ll    10 lines

Diff 14020

llvm/trunk/lib/CodeGen/AtomicExpandPass.cpp

@@ 38 lines hidden @@ public:
   }

   bool runOnFunction(Function &F) override;

 private:
   bool bracketInstWithFences(Instruction *I, AtomicOrdering Order,
                              bool IsStore, bool IsLoad);
   bool expandAtomicLoad(LoadInst *LI);
+  bool expandAtomicLoadToLL(LoadInst *LI);
+  bool expandAtomicLoadToCmpXchg(LoadInst *LI);
   bool expandAtomicStore(StoreInst *SI);
   bool expandAtomicRMW(AtomicRMWInst *AI);
   bool expandAtomicRMWToLLSC(AtomicRMWInst *AI);
   bool expandAtomicRMWToCmpXchg(AtomicRMWInst *AI);
   bool expandAtomicCmpXchg(AtomicCmpXchgInst *CI);
 };
 }

@@ 100 lines hidden @@ if (TrailingFence) {
     TrailingFence->removeFromParent();
     TrailingFence->insertAfter(I);
   }

   return (LeadingFence || TrailingFence);
 }

 bool AtomicExpand::expandAtomicLoad(LoadInst *LI) {
+  if (TM->getSubtargetImpl()
+          ->getTargetLowering()
+          ->hasLoadLinkedStoreConditional())
+    return expandAtomicLoadToLL(LI);
+  else
+    return expandAtomicLoadToCmpXchg(LI);
+}
+
+bool AtomicExpand::expandAtomicLoadToLL(LoadInst *LI) {
   auto TLI = TM->getSubtargetImpl()->getTargetLowering();
   IRBuilder<> Builder(LI);

   // On some architectures, load-linked instructions are atomic for larger
   // sizes than normal loads. For example, the only 64-bit load guaranteed
   // to be single-copy atomic by ARM is an ldrexd (A3.5.3).
   Value *Val =
       TLI->emitLoadLinked(Builder, LI->getPointerOperand(), LI->getOrdering());

   LI->replaceAllUsesWith(Val);
   LI->eraseFromParent();

   return true;
 }

+bool AtomicExpand::expandAtomicLoadToCmpXchg(LoadInst *LI) {
+  IRBuilder<> Builder(LI);
+  AtomicOrdering Order = LI->getOrdering();
+  Value *Addr = LI->getPointerOperand();
+  Type *Ty = cast<PointerType>(Addr->getType())->getElementType();
+  Constant *DummyVal = Constant::getNullValue(Ty);
+
+  Value *Pair = Builder.CreateAtomicCmpXchg(
+      Addr, DummyVal, DummyVal, Order,
+      AtomicCmpXchgInst::getStrongestFailureOrdering(Order));
+  Value *Loaded = Builder.CreateExtractValue(Pair, 0, "loaded");
+
+  LI->replaceAllUsesWith(Loaded);
+  LI->eraseFromParent();
+
+  return true;
+}
+
 bool AtomicExpand::expandAtomicStore(StoreInst *SI) {
   // This function is only called on atomic stores that are too large to be
   // atomic if implemented as a native store. So we replace them by an
   // atomic swap, that can be implemented for example as a ldrex/strex on ARM
   // or lock cmpxchg8/16b on X86, as these are atomic for larger sizes.
   // It is the responsibility of the target to only return true in
   // shouldExpandAtomicRMW in cases where this is required and possible.
   IRBuilder<> Builder(SI);
@@ 308 lines hidden @@
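
The comment in `expandAtomicStore` describes replacing a wide store by an atomic swap, which on a cmpxchg-only target ends up as a retry loop (the store test in this revision checks for `lock cmpxchg8b` followed by `jne`). A minimal sketch of that loop using `std::atomic` follows; the helper name `storeViaCmpXchgLoop` is hypothetical, and this illustrates the technique rather than the code the pass emits.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not part of LLVM): stores NewVal using only
// compare-exchange, retrying until the exchange succeeds.
void storeViaCmpXchgLoop(std::atomic<std::uint64_t> &Addr,
                         std::uint64_t NewVal) {
  std::uint64_t Old = 0; // any initial guess works
  // Each failed attempt refreshes Old with the value currently stored,
  // so the next attempt compares against up-to-date contents.
  while (!Addr.compare_exchange_weak(Old, NewVal, std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
    // Retry with the refreshed Old.
  }
}
```

The initial guess deliberately avoids a plain load of the location, since the premise is that no plain access of that width is atomic.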

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ 17,471 lines hidden @@ bool X86TargetLowering::needsCmpXchgNb(const Type *MemType) const {
   else
     return false;
 }

 bool X86TargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
   return needsCmpXchgNb(SI->getValueOperand()->getType());
 }

-bool X86TargetLowering::shouldExpandAtomicLoadInIR(LoadInst *SI) const {
-  return false; // FIXME, currently these are expanded separately in this file.
+// Note: this turns large loads into lock cmpxchg8b/16b.
+// FIXME: On 32 bits x86, fild/movq might be faster than lock cmpxchg8b.
+bool X86TargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
+  auto PTy = cast<PointerType>(LI->getPointerOperand()->getType());
+  return needsCmpXchgNb(PTy->getElementType());
 }

 bool X86TargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
   const X86Subtarget &Subtarget =
       getTargetMachine().getSubtarget<X86Subtarget>();
   unsigned NativeWidth = Subtarget.is64Bit() ? 64 : 32;
   const Type *MemType = AI->getType();

@@ 360 lines hidden @@ SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
   case ISD::SUBC:
   case ISD::SUBE: return LowerADDC_ADDE_SUBC_SUBE(Op, DAG);
   case ISD::ADD: return LowerADD(Op, DAG);
   case ISD::SUB: return LowerSUB(Op, DAG);
   case ISD::FSINCOS: return LowerFSINCOS(Op, Subtarget, DAG);
   }
 }

-static void ReplaceATOMIC_LOAD(SDNode *Node,
-                               SmallVectorImpl<SDValue> &Results,
-                               SelectionDAG &DAG) {
-  SDLoc dl(Node);
-  EVT VT = cast<AtomicSDNode>(Node)->getMemoryVT();
-
-  // Convert wide load -> cmpxchg8b/cmpxchg16b
-  // FIXME: On 32-bit, load -> fild or movq would be more efficient
-  // (The only way to get a 16-byte load is cmpxchg16b)
-  // FIXME: 16-byte ATOMIC_CMP_SWAP isn't actually hooked up at the moment.
-  SDValue Zero = DAG.getConstant(0, VT);
-  SDVTList VTs = DAG.getVTList(VT, MVT::i1, MVT::Other);
-  SDValue Swap =
-      DAG.getAtomicCmpSwap(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, dl, VT, VTs,
-                           Node->getOperand(0), Node->getOperand(1), Zero, Zero,
-                           cast<AtomicSDNode>(Node)->getMemOperand(),
-                           cast<AtomicSDNode>(Node)->getOrdering(),
-                           cast<AtomicSDNode>(Node)->getOrdering(),
-                           cast<AtomicSDNode>(Node)->getSynchScope());
-  Results.push_back(Swap.getValue(0));
-  Results.push_back(Swap.getValue(2));
-}
-
 /// ReplaceNodeResults - Replace a node with an illegal result type
 /// with a new node built out of custom code.
 void X86TargetLowering::ReplaceNodeResults(SDNode *N,
                                            SmallVectorImpl<SDValue>&Results,
                                            SelectionDAG &DAG) const {
   SDLoc dl(N);
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   switch (N->getOpcode()) {
@@ 142 lines hidden @@ void X86TargetLowering::ReplaceNodeResults(SDNode *N,
   case ISD::ATOMIC_LOAD_AND:
   case ISD::ATOMIC_LOAD_OR:
   case ISD::ATOMIC_LOAD_XOR:
   case ISD::ATOMIC_LOAD_NAND:
   case ISD::ATOMIC_LOAD_MIN:
   case ISD::ATOMIC_LOAD_MAX:
   case ISD::ATOMIC_LOAD_UMIN:
   case ISD::ATOMIC_LOAD_UMAX:
+  case ISD::ATOMIC_LOAD: {
     // Delegate to generic TypeLegalization. Situations we can really handle
     // should have already been dealt with by AtomicExpandPass.cpp.
     break;
-  case ISD::ATOMIC_LOAD: {
-    ReplaceATOMIC_LOAD(N, Results, DAG);
-    return;
   }
   case ISD::BITCAST: {
     assert(Subtarget->hasSSE2() && "Requires at least SSE2!");
     EVT DstVT = N->getValueType(0);
     EVT SrcVT = N->getOperand(0)->getValueType(0);

     if (SrcVT != MVT::f64 ||
         (DstVT != MVT::v2i32 && DstVT != MVT::v4i16 && DstVT != MVT::v8i8))
@@ 6,653 lines hidden @@

llvm/trunk/test/CodeGen/X86/atomic-load-store-wide.ll

 ; RUN: llc < %s -mcpu=corei7 -march=x86 -verify-machineinstrs | FileCheck %s

 ; 64-bit load/store on x86-32
 ; FIXME: The generated code can be substantially improved.

 define void @test1(i64* %ptr, i64 %val1) {
-; CHECK: test1
-; CHECK: cmpxchg8b
+; CHECK-LABEL: test1
+; CHECK: lock
+; CHECK-NEXT: cmpxchg8b
 ; CHECK-NEXT: jne
   store atomic i64 %val1, i64* %ptr seq_cst, align 8
   ret void
 }

 define i64 @test2(i64* %ptr) {
-; CHECK: test2
-; CHECK: cmpxchg8b
+; CHECK-LABEL: test2
+; CHECK: lock
+; CHECK-NEXT: cmpxchg8b
   %val = load atomic i64* %ptr seq_cst, align 8
   ret i64 %val
 }