This is an archive of the discontinued LLVM Phabricator instance.


Differential D141429

[AArch64] Codegen for FEAT_LRCPC3
Closed · Public

Authored by tmatheson on Jan 10 2023, 2:00 PM.


Details

Reviewers

lenary
pratlucas
stuij

Commits

rG7c84f94eb9f9: [AArch64] Codegen for FEAT_LRCPC3

Summary

Implements support for the following 128-bit atomic operations with +rcpc3:

store-release
store-release volatile
load-acquire
load-acquire volatile
load-acquire const
load-acquire const volatile

D126250 and D137590 added support for emitting LDAPR (Load-Acquire RCPc) rather
than LDAR (Load-Acquire) when +rcpc is available. This patch allows emitting
the 128-bit RCPc instructions LDIAPP/STILP added in FEAT_LRCPC3. The
implementation differs from the LDAPR one because these new instructions have
no non-RCPc equivalents.

Support for the offset variants will be added in a later patch.
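Taken together, the cases above come down to one predicate. The sketch below assumes the conditions checked by the `isOpSuitableForRCPC3` helper added to AArch64ISelLowering.cpp in Diff 487985: the access must be 128 bits wide and at least 16-byte aligned, and a load qualifies only if it is exactly acquire, a store only if it is exactly release. `Ordering`, `Access`, `suitableForRCPC3` and `rcpc3Mnemonic` are hypothetical stand-ins rather than LLVM types, and volatility is not modeled because that helper does not look at it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::AtomicOrdering (not the LLVM type).
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, Release, SeqCst };

// Hypothetical summary of the properties of an IR load/store that matter here.
struct Access {
  bool isLoad;          // true for a load, false for a store
  unsigned sizeBits;    // width of the loaded/stored value
  unsigned alignBytes;  // alignment of the access
  Ordering ordering;    // atomic ordering of the access
};

// Same shape as isOpSuitableForRCPC3 in the diff: 128 bits, 16-byte aligned,
// and exactly acquire (loads) or exactly release (stores).
bool suitableForRCPC3(const Access &a, bool hasRCPC3) {
  if (!hasRCPC3 || a.sizeBits != 128 || a.alignBytes < 16)
    return false;
  return a.isLoad ? a.ordering == Ordering::Acquire
                  : a.ordering == Ordering::Release;
}

// Mnemonic of the RCPC3 pair instruction, or an empty string when the access
// is left to the pre-existing 128-bit lowering.
std::string rcpc3Mnemonic(const Access &a, bool hasRCPC3) {
  if (!suitableForRCPC3(a, hasRCPC3))
    return "";
  return a.isLoad ? "ldiapp" : "stilp";
}
```

In this model a sequentially consistent or under-aligned access gets no RCPC3 mnemonic and is left to the existing 128-bit lowering.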

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,210 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases/Posix::stack-overflow.cpp
	60,110 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases/Posix::stack-use-after-return.cpp

Event Timeline

tmatheson created this revision. Jan 10 2023, 2:00 PM

Herald added a project: Restricted Project. Jan 10 2023, 2:00 PM

Herald added subscribers: hiraditya, kristof.beyls.

tmatheson requested review of this revision. Jan 10 2023, 2:00 PM

Herald added a project: Restricted Project. Jan 10 2023, 2:00 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B206928: Diff 487985. Jan 10 2023, 7:47 PM

You need FEAT_LSE2 if you're going to generate STILP/LDIAPP from a 128-bit access, in order to get single-copy atomicity.
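To make the concern concrete: Diff 487985 custom-lowers i128 `ATOMIC_LOAD`/`ATOMIC_STORE` when `hasLSE2() || hasRCPC3()`, while the comment says a pair access is only single-copy atomic with FEAT_LSE2. The following hypothetical encoding (the `Features` struct and the helper names are illustrative, not LLVM API) compares the two conditions; the only combination it flags is FEAT_LRCPC3 without FEAT_LSE2.

```cpp
#include <cassert>

// Hypothetical flags standing in for Subtarget->hasLSE2() / hasRCPC3().
struct Features {
  bool lse2;
  bool rcpc3;
};

// Condition used in Diff 487985 to mark i128 atomic loads/stores as Custom.
bool customLowersI128Atomics(Features f) { return f.lse2 || f.rcpc3; }

// Requirement from the review comment: a single 128-bit pair access
// (LDP/STP or LDIAPP/STILP) is only single-copy atomic with FEAT_LSE2.
bool pairIsSingleCopyAtomic(Features f) { return f.lse2; }

// True when the diff would emit a pair access lacking single-copy atomicity.
bool unsafeCombination(Features f) {
  return customLowersI128Atomics(f) && !pairIsSingleCopyAtomic(f);
}
```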

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5776	Outdated?
22233	Yes that sounds reasonable
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
1218

Require LSE2

Harbormaster completed remote builds in B209664: Diff 491788. Jan 24 2023, 10:51 AM

lenary accepted this revision. Jan 25 2023, 3:55 AM

This revision is now accepted and ready to land. Jan 25 2023, 3:55 AM

This revision was landed with ongoing or failed builds. Jan 25 2023, 4:28 AM

Closed by commit rG7c84f94eb9f9: [AArch64] Codegen for FEAT_LRCPC3 (authored by tmatheson).

This revision was automatically updated to reflect the committed changes.

tmatheson added a commit: rG7c84f94eb9f9: [AArch64] Codegen for FEAT_LRCPC3.

efriedma added a subscriber: efriedma. Jan 26 2023, 1:50 PM

efriedma added inline comments.

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331	Sort of orthogonal to this change, but can someone at ARM verify if this is the sequence we actually want for sequentially consistent loads with lse2, as opposed to using caspal? (I'm a bit concerned given the issues we ran into with narrower widths on Windows; see D141748.)

efriedma added inline comments. Feb 7 2023, 1:02 PM

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331	Any update here?

tmatheson added inline comments. Feb 8 2023, 2:01 AM

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331	Not yet but I haven't forgotten about it.

tmatheson added a subscriber: LukeGeeson. Mar 17 2023, 6:33 AM

tmatheson added inline comments.

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331	Sorry this has taken a while; I've had to work on other things. The first thing to note is that a CASP implementation is slower, so as long as both are correct we should use the `ldp+dmb`. @LukeGeeson has been working on adding CASP to herd7 so that we can compare `ldp+dmb` with `casp`. Using his Telechat tool, which can compare the C++ semantics with the machine code semantics and identify new behaviours introduced by the machine code, we aren't seeing any differences so far, although admittedly the number of cases we've checked is low because each CASP case has to be manually written. We are still looking for a way to run more extensive testing automatically, but if you have any concerns about specific scenarios we can check them.

efriedma added inline comments. Mar 20 2023, 12:57 PM

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331	I guess my biggest concern here is that we're basically committing to never using sequentially consistent (non-RCPc) load/store instructions for i128, even if the instruction set adds them in the future. If you're fine with that, then I guess I don't have any other specific concerns?

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	3 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	59 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	9 lines
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp	55 lines
llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc3.ll	24 lines
llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-store-rcpc3.ll	16 lines
llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll	24 lines
llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc3.ll	14 lines

Diff 487985

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[469 lines not shown]	enum NodeType : unsigned {
ST4LANEpost,		ST4LANEpost,

STG,		STG,
STZG,		STZG,
ST2G,		ST2G,
STZ2G,		STZ2G,

LDP,		LDP,
		LDIAPP,
LDNP,		LDNP,
STP,		STP,
		STILP,
STNP,		STNP,

// Memory Operations		// Memory Operations
MOPS_MEMSET,		MOPS_MEMSET,
MOPS_MEMSET_TAGGING,		MOPS_MEMSET_TAGGING,
MOPS_MEMCOPY,		MOPS_MEMCOPY,
MOPS_MEMMOVE,		MOPS_MEMMOVE,
};		};
[217 lines not shown]	public:
Value *emitLoadLinked(IRBuilderBase &Builder, Type *ValueTy, Value *Addr,		Value *emitLoadLinked(IRBuilderBase &Builder, Type *ValueTy, Value *Addr,
AtomicOrdering Ord) const override;		AtomicOrdering Ord) const override;
Value *emitStoreConditional(IRBuilderBase &Builder, Value *Val, Value *Addr,		Value *emitStoreConditional(IRBuilderBase &Builder, Value *Val, Value *Addr,
AtomicOrdering Ord) const override;		AtomicOrdering Ord) const override;

void emitAtomicCmpXchgNoStoreLLBalance(IRBuilderBase &Builder) const override;		void emitAtomicCmpXchgNoStoreLLBalance(IRBuilderBase &Builder) const override;

bool isOpSuitableForLDPSTP(const Instruction *I) const;		bool isOpSuitableForLDPSTP(const Instruction *I) const;
		bool isOpSuitableForRCPC3(const Instruction *I) const;
bool shouldInsertFencesForAtomic(const Instruction *I) const override;		bool shouldInsertFencesForAtomic(const Instruction *I) const override;

TargetLoweringBase::AtomicExpansionKind		TargetLoweringBase::AtomicExpansionKind
shouldExpandAtomicLoadInIR(LoadInst *LI) const override;		shouldExpandAtomicLoadInIR(LoadInst *LI) const override;
TargetLoweringBase::AtomicExpansionKind		TargetLoweringBase::AtomicExpansionKind
shouldExpandAtomicStoreInIR(StoreInst *SI) const override;		shouldExpandAtomicStoreInIR(StoreInst *SI) const override;
TargetLoweringBase::AtomicExpansionKind		TargetLoweringBase::AtomicExpansionKind
shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const override;		shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const override;
[505 lines not shown]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[799 lines not shown]
#undef LCALLNAME5		#undef LCALLNAME5
}		}

// 128-bit loads and stores can be done without expanding		// 128-bit loads and stores can be done without expanding
setOperationAction(ISD::LOAD, MVT::i128, Custom);		setOperationAction(ISD::LOAD, MVT::i128, Custom);
setOperationAction(ISD::STORE, MVT::i128, Custom);		setOperationAction(ISD::STORE, MVT::i128, Custom);

// Aligned 128-bit loads and stores are single-copy atomic according to the		// Aligned 128-bit loads and stores are single-copy atomic according to the
// v8.4a spec.		// v8.4a spec. FEAT_LRCPC3 introduces 128-bit STILP/LDIAPP.
if (Subtarget->hasLSE2()) {		if (Subtarget->hasLSE2() || Subtarget->hasRCPC3()) {
setOperationAction(ISD::ATOMIC_LOAD, MVT::i128, Custom);		setOperationAction(ISD::ATOMIC_LOAD, MVT::i128, Custom);
setOperationAction(ISD::ATOMIC_STORE, MVT::i128, Custom);		setOperationAction(ISD::ATOMIC_STORE, MVT::i128, Custom);
}		}

// 256 bit non-temporal stores can be lowered to STNP. Do this as part of the		// 256 bit non-temporal stores can be lowered to STNP. Do this as part of the
// custom lowering, as there are no un-paired non-temporal stores and		// custom lowering, as there are no un-paired non-temporal stores and
// legalization will break up 256 bit inputs.		// legalization will break up 256 bit inputs.
setOperationAction(ISD::STORE, MVT::v32i8, Custom);		setOperationAction(ISD::STORE, MVT::v32i8, Custom);
[1,720 lines not shown]	case AArch64ISD::FIRST_NUMBER:
MAKE_CASE(AArch64ISD::SST1_SXTW_PRED)		MAKE_CASE(AArch64ISD::SST1_SXTW_PRED)
MAKE_CASE(AArch64ISD::SST1_UXTW_PRED)		MAKE_CASE(AArch64ISD::SST1_UXTW_PRED)
MAKE_CASE(AArch64ISD::SST1_SXTW_SCALED_PRED)		MAKE_CASE(AArch64ISD::SST1_SXTW_SCALED_PRED)
MAKE_CASE(AArch64ISD::SST1_UXTW_SCALED_PRED)		MAKE_CASE(AArch64ISD::SST1_UXTW_SCALED_PRED)
MAKE_CASE(AArch64ISD::SST1_IMM_PRED)		MAKE_CASE(AArch64ISD::SST1_IMM_PRED)
MAKE_CASE(AArch64ISD::SSTNT1_PRED)		MAKE_CASE(AArch64ISD::SSTNT1_PRED)
MAKE_CASE(AArch64ISD::SSTNT1_INDEX_PRED)		MAKE_CASE(AArch64ISD::SSTNT1_INDEX_PRED)
MAKE_CASE(AArch64ISD::LDP)		MAKE_CASE(AArch64ISD::LDP)
		MAKE_CASE(AArch64ISD::LDIAPP)
MAKE_CASE(AArch64ISD::LDNP)		MAKE_CASE(AArch64ISD::LDNP)
MAKE_CASE(AArch64ISD::STP)		MAKE_CASE(AArch64ISD::STP)
		MAKE_CASE(AArch64ISD::STILP)
MAKE_CASE(AArch64ISD::STNP)		MAKE_CASE(AArch64ISD::STNP)
MAKE_CASE(AArch64ISD::BITREVERSE_MERGE_PASSTHRU)		MAKE_CASE(AArch64ISD::BITREVERSE_MERGE_PASSTHRU)
MAKE_CASE(AArch64ISD::BSWAP_MERGE_PASSTHRU)		MAKE_CASE(AArch64ISD::BSWAP_MERGE_PASSTHRU)
MAKE_CASE(AArch64ISD::REVH_MERGE_PASSTHRU)		MAKE_CASE(AArch64ISD::REVH_MERGE_PASSTHRU)
MAKE_CASE(AArch64ISD::REVW_MERGE_PASSTHRU)		MAKE_CASE(AArch64ISD::REVW_MERGE_PASSTHRU)
MAKE_CASE(AArch64ISD::REVD_MERGE_PASSTHRU)		MAKE_CASE(AArch64ISD::REVD_MERGE_PASSTHRU)
MAKE_CASE(AArch64ISD::CTLZ_MERGE_PASSTHRU)		MAKE_CASE(AArch64ISD::CTLZ_MERGE_PASSTHRU)
MAKE_CASE(AArch64ISD::CTPOP_MERGE_PASSTHRU)		MAKE_CASE(AArch64ISD::CTPOP_MERGE_PASSTHRU)
[3,202 lines not shown]
}		}

/// Lower atomic or volatile 128-bit stores to a single STP instruction.		/// Lower atomic or volatile 128-bit stores to a single STP instruction.
SDValue AArch64TargetLowering::LowerStore128(SDValue Op,		SDValue AArch64TargetLowering::LowerStore128(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
MemSDNode *StoreNode = cast<MemSDNode>(Op);		MemSDNode *StoreNode = cast<MemSDNode>(Op);
assert(StoreNode->getMemoryVT() == MVT::i128);		assert(StoreNode->getMemoryVT() == MVT::i128);
assert(StoreNode->isVolatile() || StoreNode->isAtomic());		assert(StoreNode->isVolatile() || StoreNode->isAtomic());
assert(!StoreNode->isAtomic() ||
		bool isStoreRelease =
		StoreNode->getMergedOrdering() == AtomicOrdering::Release;
		if (StoreNode->isAtomic())
		assert((Subtarget->hasFeature(AArch64::FeatureRCPC3) && isStoreRelease) ||
StoreNode->getMergedOrdering() == AtomicOrdering::Unordered ||		StoreNode->getMergedOrdering() == AtomicOrdering::Unordered ||
StoreNode->getMergedOrdering() == AtomicOrdering::Monotonic);		StoreNode->getMergedOrdering() == AtomicOrdering::Monotonic);

		// FIXME RCPC3 not actually implemented, will lower to STP
		lenary (inline comment): Outdated?

SDValue Value = StoreNode->getOpcode() == ISD::STORE		SDValue Value = StoreNode->getOpcode() == ISD::STORE
? StoreNode->getOperand(1)		? StoreNode->getOperand(1)
: StoreNode->getOperand(2);		: StoreNode->getOperand(2);
SDLoc DL(Op);		SDLoc DL(Op);
SDValue Lo = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i64, Value,		SDValue Lo = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i64, Value,
DAG.getConstant(0, DL, MVT::i64));		DAG.getConstant(0, DL, MVT::i64));
SDValue Hi = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i64, Value,		SDValue Hi = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i64, Value,
DAG.getConstant(1, DL, MVT::i64));		DAG.getConstant(1, DL, MVT::i64));

		unsigned Opcode = isStoreRelease ? AArch64ISD::STILP : AArch64ISD::STP;
SDValue Result = DAG.getMemIntrinsicNode(		SDValue Result = DAG.getMemIntrinsicNode(
AArch64ISD::STP, DL, DAG.getVTList(MVT::Other),		Opcode, DL, DAG.getVTList(MVT::Other),
{StoreNode->getChain(), Lo, Hi, StoreNode->getBasePtr()},		{StoreNode->getChain(), Lo, Hi, StoreNode->getBasePtr()},
StoreNode->getMemoryVT(), StoreNode->getMemOperand());		StoreNode->getMemoryVT(), StoreNode->getMemOperand());
return Result;		return Result;
}		}

SDValue AArch64TargetLowering::LowerLOAD(SDValue Op,		SDValue AArch64TargetLowering::LowerLOAD(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDLoc DL(Op);		SDLoc DL(Op);
[256 lines not shown]	SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
case ISD::INTRINSIC_W_CHAIN:		case ISD::INTRINSIC_W_CHAIN:
return LowerINTRINSIC_W_CHAIN(Op, DAG);		return LowerINTRINSIC_W_CHAIN(Op, DAG);
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return LowerINTRINSIC_WO_CHAIN(Op, DAG);		return LowerINTRINSIC_WO_CHAIN(Op, DAG);
case ISD::INTRINSIC_VOID:		case ISD::INTRINSIC_VOID:
return LowerINTRINSIC_VOID(Op, DAG);		return LowerINTRINSIC_VOID(Op, DAG);
case ISD::ATOMIC_STORE:		case ISD::ATOMIC_STORE:
if (cast<MemSDNode>(Op)->getMemoryVT() == MVT::i128) {		if (cast<MemSDNode>(Op)->getMemoryVT() == MVT::i128) {
assert(Subtarget->hasLSE2());		assert(Subtarget->hasLSE2() || Subtarget->hasRCPC3());
return LowerStore128(Op, DAG);		return LowerStore128(Op, DAG);
}		}
return SDValue();		return SDValue();
case ISD::STORE:		case ISD::STORE:
return LowerSTORE(Op, DAG);		return LowerSTORE(Op, DAG);
case ISD::MSTORE:		case ISD::MSTORE:
return LowerFixedLengthVectorMStoreToSVE(Op, DAG);		return LowerFixedLengthVectorMStoreToSVE(Op, DAG);
case ISD::MGATHER:		case ISD::MGATHER:
[16,003 lines not shown]	case ISD::LOAD: {
if ((!LoadNode->isVolatile() && !LoadNode->isAtomic()) ||		if ((!LoadNode->isVolatile() && !LoadNode->isAtomic()) ||
LoadNode->getMemoryVT() != MVT::i128) {		LoadNode->getMemoryVT() != MVT::i128) {
// Non-volatile or atomic loads are optimized later in AArch64's load/store		// Non-volatile or atomic loads are optimized later in AArch64's load/store
// optimizer.		// optimizer.
return;		return;
}		}

if (SDValue(N, 0).getValueType() == MVT::i128) {		if (SDValue(N, 0).getValueType() == MVT::i128) {
		auto *AN = dyn_cast<AtomicSDNode>(LoadNode);
		bool isLoadAcquire =
		AN && AN->getSuccessOrdering() == AtomicOrdering::Acquire;
		unsigned Opcode = isLoadAcquire ? AArch64ISD::LDIAPP : AArch64ISD::LDP;

		if (isLoadAcquire)
		assert(Subtarget->hasFeature(AArch64::FeatureRCPC3));

SDValue Result = DAG.getMemIntrinsicNode(		SDValue Result = DAG.getMemIntrinsicNode(
AArch64ISD::LDP, SDLoc(N),		Opcode, SDLoc(N), DAG.getVTList({MVT::i64, MVT::i64, MVT::Other}),
DAG.getVTList({MVT::i64, MVT::i64, MVT::Other}),
{LoadNode->getChain(), LoadNode->getBasePtr()},		{LoadNode->getChain(), LoadNode->getBasePtr()},
LoadNode->getMemoryVT(), LoadNode->getMemOperand());		LoadNode->getMemoryVT(), LoadNode->getMemOperand());

SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, SDLoc(N), MVT::i128,		SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, SDLoc(N), MVT::i128,
Result.getValue(0), Result.getValue(1));		Result.getValue(0), Result.getValue(1));
Results.append({Pair, Result.getValue(2) /* Chain */});		Results.append({Pair, Result.getValue(2) /* Chain */});
}		}
return;		return;
[106 lines not shown]	bool AArch64TargetLowering::isOpSuitableForLDPSTP(const Instruction *I) const {

if (auto SI = dyn_cast<StoreInst>(I))		if (auto SI = dyn_cast<StoreInst>(I))
return SI->getValueOperand()->getType()->getPrimitiveSizeInBits() == 128 &&		return SI->getValueOperand()->getType()->getPrimitiveSizeInBits() == 128 &&
SI->getAlign() >= Align(16);		SI->getAlign() >= Align(16);

return false;		return false;
}		}

		bool AArch64TargetLowering::isOpSuitableForRCPC3(const Instruction *I) const {
		if (!Subtarget->hasRCPC3())
		return false;

		if (auto LI = dyn_cast<LoadInst>(I))
		return LI->getType()->getPrimitiveSizeInBits() == 128 &&
		LI->getAlign() >= Align(16) &&
		LI->getOrdering() == AtomicOrdering::Acquire;

		if (auto SI = dyn_cast<StoreInst>(I))
		return SI->getValueOperand()->getType()->getPrimitiveSizeInBits() == 128 &&
		SI->getAlign() >= Align(16) &&
		SI->getOrdering() == AtomicOrdering::Release;

		return false;
		}

bool AArch64TargetLowering::shouldInsertFencesForAtomic(		bool AArch64TargetLowering::shouldInsertFencesForAtomic(
const Instruction *I) const {		const Instruction *I) const {
		// Inserting fences changes the load/store ordering to monotonic.
		lenary (inline comment): Yes that sounds reasonable
		if (isOpSuitableForRCPC3(I))
		return false;
return isOpSuitableForLDPSTP(I);		return isOpSuitableForLDPSTP(I);
}		}

// Loads and stores less than 128-bits are already atomic; ones above that		// Loads and stores less than 128-bits are already atomic; ones above that
// are doomed anyway, so defer to the default libcall and blame the OS when		// are doomed anyway, so defer to the default libcall and blame the OS when
// things go wrong.		// things go wrong.
TargetLoweringBase::AtomicExpansionKind		TargetLoweringBase::AtomicExpansionKind
AArch64TargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {		AArch64TargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
unsigned Size = SI->getValueOperand()->getType()->getPrimitiveSizeInBits();		unsigned Size = SI->getValueOperand()->getType()->getPrimitiveSizeInBits();
if (Size != 128 || isOpSuitableForLDPSTP(SI))		if (Size != 128 || isOpSuitableForLDPSTP(SI) || isOpSuitableForRCPC3(SI))
return AtomicExpansionKind::None;		return AtomicExpansionKind::None;
return AtomicExpansionKind::Expand;		return AtomicExpansionKind::Expand;
}		}

// Loads and stores less than 128-bits are already atomic; ones above that		// Loads and stores less than 128-bits are already atomic; ones above that
// are doomed anyway, so defer to the default libcall and blame the OS when		// are doomed anyway, so defer to the default libcall and blame the OS when
// things go wrong.		// things go wrong.
TargetLowering::AtomicExpansionKind		TargetLowering::AtomicExpansionKind
AArch64TargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {		AArch64TargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
unsigned Size = LI->getType()->getPrimitiveSizeInBits();		unsigned Size = LI->getType()->getPrimitiveSizeInBits();

if (Size != 128 || isOpSuitableForLDPSTP(LI))		if (Size != 128 || isOpSuitableForLDPSTP(LI) || isOpSuitableForRCPC3(LI))
return AtomicExpansionKind::None;		return AtomicExpansionKind::None;

// At -O0, fast-regalloc cannot cope with the live vregs necessary to		// At -O0, fast-regalloc cannot cope with the live vregs necessary to
// implement atomicrmw without spilling. If the target address is also on the		// implement atomicrmw without spilling. If the target address is also on the
// stack and close enough to the spill slot, this can lead to a situation		// stack and close enough to the spill slot, this can lead to a situation
// where the monitor always gets cleared and the atomic operation can never		// where the monitor always gets cleared and the atomic operation can never
// succeed. So at -O0 lower this operation to a CAS loop.		// succeed. So at -O0 lower this operation to a CAS loop.
if (getTargetMachine().getOptLevel() == CodeGenOpt::None)		if (getTargetMachine().getOptLevel() == CodeGenOpt::None)
[1,695 lines not shown]
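The `shouldInsertFencesForAtomic` change relies on what fence insertion does to an access: as the added comment says, it changes the load/store ordering to monotonic, which would discard the acquire/release semantics that LDIAPP/STILP provide. The following is a simplified, hypothetical model of that rewrite, based on our understanding of the default leading/trailing fence placement in LLVM's AtomicExpand pass rather than on code in this patch; `Ordering`, `FencedAccess` and `lowerWithFences` are illustrative names.

```cpp
#include <cassert>

// Illustrative model only: none of these names exist in LLVM.
enum class Ordering { Monotonic, Acquire, Release, SeqCst };

bool isAcquireOrStronger(Ordering o) {
  return o == Ordering::Acquire || o == Ordering::SeqCst;
}
bool isReleaseOrStronger(Ordering o) {
  return o == Ordering::Release || o == Ordering::SeqCst;
}

struct FencedAccess {
  bool leadingFence;   // fence placed before the access
  Ordering accessOrd;  // ordering that remains on the access itself
  bool trailingFence;  // fence placed after the access
};

// insertFences plays the role of shouldInsertFencesForAtomic(): it is false
// for RCPC3-suitable accesses, which then keep their acquire/release ordering.
FencedAccess lowerWithFences(bool isLoad, Ordering ord, bool insertFences) {
  bool needsFences =
      isLoad ? isAcquireOrStronger(ord) : isReleaseOrStronger(ord);
  if (!insertFences || !needsFences)
    return {false, ord, false};
  // The access is demoted to monotonic and bracketed by fences carrying the
  // original ordering: a leading fence only for (release-or-stronger) stores,
  // a trailing fence for acquire-or-stronger orderings.
  return {!isLoad, Ordering::Monotonic, isAcquireOrStronger(ord)};
}
```

Under these assumptions a sequentially consistent i128 load becomes a monotonic access followed by a single trailing fence, the `ldp+dmb` shape discussed in the review, while a release store with `insertFences` false keeps its release ordering, as the STILP path requires.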

llvm/lib/Target/AArch64/AArch64InstrInfo.td


[356 lines not shown]
def SDT_AArch64ITOF : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisSameAs<0,1>]>;		def SDT_AArch64ITOF : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisSameAs<0,1>]>;

def SDT_AArch64TLSDescCall : SDTypeProfile<0, -2, [SDTCisPtrTy<0>,		def SDT_AArch64TLSDescCall : SDTypeProfile<0, -2, [SDTCisPtrTy<0>,
SDTCisPtrTy<1>]>;		SDTCisPtrTy<1>]>;

def SDT_AArch64uaddlp : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisVec<1>]>;		def SDT_AArch64uaddlp : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisVec<1>]>;

def SDT_AArch64ldp : SDTypeProfile<2, 1, [SDTCisVT<0, i64>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;		def SDT_AArch64ldp : SDTypeProfile<2, 1, [SDTCisVT<0, i64>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;
		def SDT_AArch64ldiapp : SDTypeProfile<2, 1, [SDTCisVT<0, i64>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;
def SDT_AArch64ldnp : SDTypeProfile<2, 1, [SDTCisVT<0, v4i32>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;		def SDT_AArch64ldnp : SDTypeProfile<2, 1, [SDTCisVT<0, v4i32>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;
def SDT_AArch64stp : SDTypeProfile<0, 3, [SDTCisVT<0, i64>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;		def SDT_AArch64stp : SDTypeProfile<0, 3, [SDTCisVT<0, i64>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;
		def SDT_AArch64stilp : SDTypeProfile<0, 3, [SDTCisVT<0, i64>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;
def SDT_AArch64stnp : SDTypeProfile<0, 3, [SDTCisVT<0, v4i32>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;		def SDT_AArch64stnp : SDTypeProfile<0, 3, [SDTCisVT<0, v4i32>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;

// Generates the general dynamic sequences, i.e.		// Generates the general dynamic sequences, i.e.
// adrp x0, :tlsdesc:var		// adrp x0, :tlsdesc:var
// ldr x1, [x0, #:tlsdesc_lo12:var]		// ldr x1, [x0, #:tlsdesc_lo12:var]
// add x0, x0, #:tlsdesc_lo12:var		// add x0, x0, #:tlsdesc_lo12:var
// .tlsdesccall var		// .tlsdesccall var
// blr x1		// blr x1
[406 lines not shown]	def SDT_AArch64unpk : SDTypeProfile<1, 1, [
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>
]>;		]>;
def AArch64sunpkhi : SDNode<"AArch64ISD::SUNPKHI", SDT_AArch64unpk>;		def AArch64sunpkhi : SDNode<"AArch64ISD::SUNPKHI", SDT_AArch64unpk>;
def AArch64sunpklo : SDNode<"AArch64ISD::SUNPKLO", SDT_AArch64unpk>;		def AArch64sunpklo : SDNode<"AArch64ISD::SUNPKLO", SDT_AArch64unpk>;
def AArch64uunpkhi : SDNode<"AArch64ISD::UUNPKHI", SDT_AArch64unpk>;		def AArch64uunpkhi : SDNode<"AArch64ISD::UUNPKHI", SDT_AArch64unpk>;
def AArch64uunpklo : SDNode<"AArch64ISD::UUNPKLO", SDT_AArch64unpk>;		def AArch64uunpklo : SDNode<"AArch64ISD::UUNPKLO", SDT_AArch64unpk>;

def AArch64ldp : SDNode<"AArch64ISD::LDP", SDT_AArch64ldp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		def AArch64ldp : SDNode<"AArch64ISD::LDP", SDT_AArch64ldp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
		def AArch64ldiapp : SDNode<"AArch64ISD::LDIAPP", SDT_AArch64ldiapp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def AArch64ldnp : SDNode<"AArch64ISD::LDNP", SDT_AArch64ldnp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		def AArch64ldnp : SDNode<"AArch64ISD::LDNP", SDT_AArch64ldnp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def AArch64stp : SDNode<"AArch64ISD::STP", SDT_AArch64stp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		def AArch64stp : SDNode<"AArch64ISD::STP", SDT_AArch64stp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
		def AArch64stilp : SDNode<"AArch64ISD::STILP", SDT_AArch64stilp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def AArch64stnp : SDNode<"AArch64ISD::STNP", SDT_AArch64stnp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		def AArch64stnp : SDNode<"AArch64ISD::STNP", SDT_AArch64stnp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

def AArch64tbl : SDNode<"AArch64ISD::TBL", SDT_AArch64TBL>;		def AArch64tbl : SDNode<"AArch64ISD::TBL", SDT_AArch64TBL>;
def AArch64mrs : SDNode<"AArch64ISD::MRS",		def AArch64mrs : SDNode<"AArch64ISD::MRS",
SDTypeProfile<1, 1, [SDTCisVT<0, i64>, SDTCisVT<1, i32>]>,		SDTypeProfile<1, 1, [SDTCisVT<0, i64>, SDTCisVT<1, i32>]>,
[SDNPHasChain, SDNPOutGlue]>;		[SDNPHasChain, SDNPOutGlue]>;

// Match add node and also treat an 'or' node is as an 'add' if the or'ed operands		// Match add node and also treat an 'or' node is as an 'add' if the or'ed operands
[2,604 lines not shown]

// Pair (pre-indexed)		// Pair (pre-indexed)
def STPWpre : StorePairPreIdx<0b00, 0, GPR32z, simm7s4, "stp">;		def STPWpre : StorePairPreIdx<0b00, 0, GPR32z, simm7s4, "stp">;
def STPXpre : StorePairPreIdx<0b10, 0, GPR64z, simm7s8, "stp">;		def STPXpre : StorePairPreIdx<0b10, 0, GPR64z, simm7s8, "stp">;
def STPSpre : StorePairPreIdx<0b00, 1, FPR32Op, simm7s4, "stp">;		def STPSpre : StorePairPreIdx<0b00, 1, FPR32Op, simm7s4, "stp">;
def STPDpre : StorePairPreIdx<0b01, 1, FPR64Op, simm7s8, "stp">;		def STPDpre : StorePairPreIdx<0b01, 1, FPR64Op, simm7s8, "stp">;
def STPQpre : StorePairPreIdx<0b10, 1, FPR128Op, simm7s16, "stp">;		def STPQpre : StorePairPreIdx<0b10, 1, FPR128Op, simm7s16, "stp">;

// Pair (pre-indexed)		// Pair (post-indexed)
def STPWpost : StorePairPostIdx<0b00, 0, GPR32z, simm7s4, "stp">;		def STPWpost : StorePairPostIdx<0b00, 0, GPR32z, simm7s4, "stp">;
def STPXpost : StorePairPostIdx<0b10, 0, GPR64z, simm7s8, "stp">;		def STPXpost : StorePairPostIdx<0b10, 0, GPR64z, simm7s8, "stp">;
def STPSpost : StorePairPostIdx<0b00, 1, FPR32Op, simm7s4, "stp">;		def STPSpost : StorePairPostIdx<0b00, 1, FPR32Op, simm7s4, "stp">;
def STPDpost : StorePairPostIdx<0b01, 1, FPR64Op, simm7s8, "stp">;		def STPDpost : StorePairPostIdx<0b01, 1, FPR64Op, simm7s8, "stp">;
def STPQpost : StorePairPostIdx<0b10, 1, FPR128Op, simm7s16, "stp">;		def STPQpost : StorePairPostIdx<0b10, 1, FPR128Op, simm7s16, "stp">;

// Pair (no allocate)		// Pair (no allocate)
defm STNPW : StorePairNoAlloc<0b00, 0, GPR32z, simm7s4, "stnp">;		defm STNPW : StorePairNoAlloc<0b00, 0, GPR32z, simm7s4, "stnp">;
[5,259 lines not shown]	let Predicates = [HasRCPC3] in {
def STILPXpre: BaseLRCPC3IntegerLoadStorePair<0b11, 0b00, 0b0000, (outs GPR64sp:$wback), (ins GPR64:$Rt, GPR64:$Rt2, GPR64sp:$Rn), "stilp", "\t$Rt, $Rt2, [$Rn, #-16]!", "$Rn = $wback">;		def STILPXpre: BaseLRCPC3IntegerLoadStorePair<0b11, 0b00, 0b0000, (outs GPR64sp:$wback), (ins GPR64:$Rt, GPR64:$Rt2, GPR64sp:$Rn), "stilp", "\t$Rt, $Rt2, [$Rn, #-16]!", "$Rn = $wback">;
def STILPW: BaseLRCPC3IntegerLoadStorePair<0b10, 0b00, 0b0001, (outs), (ins GPR32:$Rt, GPR32:$Rt2, GPR64sp:$Rn), "stilp", "\t$Rt, $Rt2, [$Rn]", "">;		def STILPW: BaseLRCPC3IntegerLoadStorePair<0b10, 0b00, 0b0001, (outs), (ins GPR32:$Rt, GPR32:$Rt2, GPR64sp:$Rn), "stilp", "\t$Rt, $Rt2, [$Rn]", "">;
def STILPX: BaseLRCPC3IntegerLoadStorePair<0b11, 0b00, 0b0001, (outs), (ins GPR64:$Rt, GPR64:$Rt2, GPR64sp:$Rn), "stilp", "\t$Rt, $Rt2, [$Rn]", "">;		def STILPX: BaseLRCPC3IntegerLoadStorePair<0b11, 0b00, 0b0001, (outs), (ins GPR64:$Rt, GPR64:$Rt2, GPR64sp:$Rn), "stilp", "\t$Rt, $Rt2, [$Rn]", "">;
def LDIAPPWpre: BaseLRCPC3IntegerLoadStorePair<0b10, 0b01, 0b0000, (outs GPR64sp:$wback, GPR32:$Rt, GPR32:$Rt2), (ins GPR64sp:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn], #8", "$Rn = $wback">;		def LDIAPPWpre: BaseLRCPC3IntegerLoadStorePair<0b10, 0b01, 0b0000, (outs GPR64sp:$wback, GPR32:$Rt, GPR32:$Rt2), (ins GPR64sp:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn], #8", "$Rn = $wback">;
def LDIAPPXpre: BaseLRCPC3IntegerLoadStorePair<0b11, 0b01, 0b0000, (outs GPR64sp:$wback, GPR64:$Rt, GPR64:$Rt2), (ins GPR64sp:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn], #16", "$Rn = $wback">;		def LDIAPPXpre: BaseLRCPC3IntegerLoadStorePair<0b11, 0b01, 0b0000, (outs GPR64sp:$wback, GPR64:$Rt, GPR64:$Rt2), (ins GPR64sp:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn], #16", "$Rn = $wback">;
def LDIAPPW: BaseLRCPC3IntegerLoadStorePair<0b10, 0b01, 0b0001, (outs GPR32:$Rt, GPR32:$Rt2), (ins GPR64sp0:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn]", "">;		def LDIAPPW: BaseLRCPC3IntegerLoadStorePair<0b10, 0b01, 0b0001, (outs GPR32:$Rt, GPR32:$Rt2), (ins GPR64sp0:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn]", "">;
def LDIAPPX: BaseLRCPC3IntegerLoadStorePair<0b11, 0b01, 0b0001, (outs GPR64:$Rt, GPR64:$Rt2), (ins GPR64sp0:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn]", "">;		def LDIAPPX: BaseLRCPC3IntegerLoadStorePair<0b11, 0b01, 0b0001, (outs GPR64:$Rt, GPR64:$Rt2), (ins GPR64sp0:$Rn), "ldiapp", "\t$Rt, $Rt2, [$Rn]", "">;

		def : Pat<(AArch64ldiapp GPR64sp:$Rn), (LDIAPPX GPR64sp:$Rn)>;
		def : Pat<(AArch64stilp GPR64:$Rt, GPR64:$Rt2, GPR64sp:$Rn), (STILPX GPR64:$Rt, GPR64:$Rt2, GPR64sp:$Rn)>;

// Aliases for when offset=0		// Aliases for when offset=0
def : InstAlias<"stilp\t$Rt, $Rt2, [$Rn, #0]", (STILPW GPR32: $Rt, GPR32: $Rt2, GPR64sp:$Rn)>;		def : InstAlias<"stilp\t$Rt, $Rt2, [$Rn, #0]", (STILPW GPR32: $Rt, GPR32: $Rt2, GPR64sp:$Rn)>;
def : InstAlias<"stilp\t$Rt, $Rt2, [$Rn, #0]", (STILPX GPR64: $Rt, GPR64: $Rt2, GPR64sp:$Rn)>;		def : InstAlias<"stilp\t$Rt, $Rt2, [$Rn, #0]", (STILPX GPR64: $Rt, GPR64: $Rt2, GPR64sp:$Rn)>;

// size opc		// size opc
def STLRWpre: BaseLRCPC3IntegerLoadStore<0b10, 0b10, (outs GPR64sp:$wback), (ins GPR32:$Rt, GPR64sp:$Rn), "stlr", "\t$Rt, [$Rn, #-4]!", "$Rn = $wback">;		def STLRWpre: BaseLRCPC3IntegerLoadStore<0b10, 0b10, (outs GPR64sp:$wback), (ins GPR32:$Rt, GPR64sp:$Rn), "stlr", "\t$Rt, [$Rn, #-4]!", "$Rn = $wback">;
def STLRXpre: BaseLRCPC3IntegerLoadStore<0b11, 0b10, (outs GPR64sp:$wback), (ins GPR64:$Rt, GPR64sp:$Rn), "stlr", "\t$Rt, [$Rn, #-8]!", "$Rn = $wback">;		def STLRXpre: BaseLRCPC3IntegerLoadStore<0b11, 0b10, (outs GPR64sp:$wback), (ins GPR64:$Rt, GPR64sp:$Rn), "stlr", "\t$Rt, [$Rn, #-8]!", "$Rn = $wback">;
def LDAPRWpre: BaseLRCPC3IntegerLoadStore<0b10, 0b11, (outs GPR64sp:$wback, GPR32:$Rt), (ins GPR64sp:$Rn), "ldapr", "\t$Rt, [$Rn], #4", "$Rn = $wback">;		def LDAPRWpre: BaseLRCPC3IntegerLoadStore<0b10, 0b11, (outs GPR64sp:$wback, GPR32:$Rt), (ins GPR64sp:$Rn), "ldapr", "\t$Rt, [$Rn], #4", "$Rn = $wback">;
[101 lines not shown]
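The two new SelectionDAG nodes carry the 128-bit value as a pair of i64 values: `SDT_AArch64ldiapp` has two i64 results and `SDT_AArch64stilp` two i64 operands, and the new patterns map them onto `LDIAPPX`/`STILPX` with two `GPR64` registers. On the C++ side, `LowerStore128` splits the value with `ISD::EXTRACT_ELEMENT` indices 0 (low) and 1 (high), and the load path rejoins results 0 and 1 with `ISD::BUILD_PAIR`. The sketch below reproduces only that value-level split and join, using the GCC/Clang `unsigned __int128` extension as a stand-in for `MVT::i128`; `splitI128` and `buildPair` are hypothetical helpers, and the sketch says nothing about which half lands at which address, which depends on endianness (the `aarch64_be` tests cover big-endian).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// GCC/Clang extension used here as a stand-in for MVT::i128.
using u128 = unsigned __int128;

// Value-level analogue of EXTRACT_ELEMENT with index 0 (Lo) and 1 (Hi).
std::pair<std::uint64_t, std::uint64_t> splitI128(u128 v) {
  std::uint64_t lo = static_cast<std::uint64_t>(v);        // index 0
  std::uint64_t hi = static_cast<std::uint64_t>(v >> 64);  // index 1
  return {lo, hi};
}

// Value-level analogue of BUILD_PAIR(Lo, Hi): first operand is the low half.
u128 buildPair(std::uint64_t lo, std::uint64_t hi) {
  return (static_cast<u128>(hi) << 64) | lo;
}
```

Splitting and rebuilding round-trips any value, which is the property the store and load lowering rely on.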

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

Lint: clang-format suggested style edits found.

//===- AArch64LegalizerInfo.cpp ----------------------------------*- C++ -*-==//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//
/// \file
/// This file implements the targeting of the Machinelegalizer class for

[294 lines not shown]

auto IsPtrVecPred = [=](const LegalityQuery &Query) {
const LLT &ValTy = Query.Types[0];
if (!ValTy.isVector())
return false;
const LLT EltTy = ValTy.getElementType();
return EltTy.isPointer() && EltTy.getAddressSpace() == 0;
};

getActionDefinitionsBuilder(G_LOAD)
.customIf([&](const LegalityQuery &Query) {
return ST.hasRCPC3() && Query.Types[0] == s128 &&
Query.MMODescrs[0].Ordering == AtomicOrdering::Acquire;
})
.customIf([=](const LegalityQuery &Query) {
return Query.Types[0] == s128 &&
Query.MMODescrs[0].Ordering != AtomicOrdering::NotAtomic;
})
.legalForTypesWithMemDesc({{s8, p0, s8, 8},
{s16, p0, s16, 8},
{s32, p0, s32, 8},
{s64, p0, s64, 8},
[22 lines not shown]	getActionDefinitionsBuilder(G_LOAD)

.clampMaxNumElements(0, s16, 8)
.clampMaxNumElements(0, s32, 4)
.clampMaxNumElements(0, s64, 2)
.clampMaxNumElements(0, p0, 2)
.customIf(IsPtrVecPred)
.scalarizeIf(typeIs(0, v2s16), 0);

getActionDefinitionsBuilder(G_STORE)

.customIf([&](const LegalityQuery &Query) {

return ST.hasRCPC3() && Query.Types[0] == s128 &&

Query.MMODescrs[0].Ordering == AtomicOrdering::Release;

})

.customIf([=](const LegalityQuery &Query) {

return Query.Types[0] == s128 &&

Query.MMODescrs[0].Ordering != AtomicOrdering::NotAtomic;

})

.legalForTypesWithMemDesc({{s8, p0, s8, 8},

{s16, p0, s8, 8}, // truncstorei8 from s16

{s32, p0, s8, 8}, // truncstorei8 from s32

{s64, p0, s8, 8}, // truncstorei8 from s64

▲ Show 20 Lines • Show All 825 Lines • ▼ Show 20 Lines
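A minimal C++ sketch of the two kinds of `customIf` predicates in the G_LOAD/G_STORE rules, assuming simplified stand-ins for `LegalityQuery` and `AtomicOrdering` (`Ordering`, `Query` and the helper names are hypothetical, not LLVM API). It illustrates that both rules select the same Custom action and that every query matched by the new +rcpc3 rule is already matched by the existing `!= NotAtomic` rule, so under this model the choice between LDIAPP/STILP and LDP/STP is made later, in `legalizeLoadStore`, rather than by the rule set itself.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::AtomicOrdering.
enum class Ordering {
  NotAtomic, Unordered, Monotonic, Acquire, Release, AcquireRelease,
  SequentiallyConsistent
};

// Hypothetical stand-in for the parts of LegalityQuery the predicates read:
// the bit width of Types[0] and MMODescrs[0].Ordering.
struct Query {
  unsigned ValueBits;
  Ordering Order;
};

// Models the new rules: a 128-bit acquire load (G_LOAD) or release store
// (G_STORE) when the subtarget has +rcpc3.
bool matchesRcpc3Rule(bool IsLoad, bool HasRCPC3, const Query &Q) {
  Ordering Wanted = IsLoad ? Ordering::Acquire : Ordering::Release;
  return HasRCPC3 && Q.ValueBits == 128 && Q.Order == Wanted;
}

// Models the pre-existing rule: any atomic 128-bit access is custom.
bool matchesAtomic128Rule(const Query &Q) {
  return Q.ValueBits == 128 && Q.Order != Ordering::NotAtomic;
}

// Both rules map to the Custom action, so an access is custom-legalized
// if either of them matches.
bool isCustom(bool IsLoad, bool HasRCPC3, const Query &Q) {
  return matchesRcpc3Rule(IsLoad, HasRCPC3, Q) || matchesAtomic128Rule(Q);
}
```

Under this model `isCustom` agrees with `matchesAtomic128Rule` for every ordering and feature combination; the separate +rcpc3 rule documents intent but does not change which action is chosen.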

bool AArch64LegalizerInfo::legalizeLoadStore(
  // ...
  // Custom legalization requires the instruction, if not deleted, must be fully
  // legalized. In order to allow further legalization of the inst, we create
  // a new instruction and erase the existing one.
  const LLT ValTy = MRI.getType(ValReg);
  if (ValTy == LLT::scalar(128)) {
-   assert((*MI.memoperands_begin())->getSuccessOrdering() ==
-              AtomicOrdering::Monotonic ||
-          (*MI.memoperands_begin())->getSuccessOrdering() ==
-              AtomicOrdering::Unordered);
-   assert(ST->hasLSE2() && "ldp/stp not single copy atomic without +lse2");
+   AtomicOrdering Ordering = (*MI.memoperands_begin())->getSuccessOrdering();
+   bool isLoad = MI.getOpcode() == TargetOpcode::G_LOAD;
+   bool isLoadAcquire = isLoad && Ordering == AtomicOrdering::Acquire;
+   bool isStoreRelease = !isLoad && Ordering == AtomicOrdering::Release;
+   bool isRCPC3 = ST->hasRCPC3() && (isLoadAcquire || isStoreRelease);
    LLT s64 = LLT::scalar(64);
+   unsigned Opcode;
+   if (isRCPC3) {
+     Opcode = isLoad ? AArch64::LDIAPPX : AArch64::STILPX;
+   } else {
+     // For LSE2, loads/stores should have been converted to monotonic and had
+     // a fence inserted after them.
+     assert(Ordering == AtomicOrdering::Monotonic ||
+            Ordering == AtomicOrdering::Unordered);
+     assert(ST->hasLSE2() && "ldp/stp not single copy atomic without +lse2");
+     Opcode = isLoad ? AArch64::LDPXi : Opcode = AArch64::STPXi;

lenary (inline suggestion, marked Done):
      assert(ST->hasLSE2() && "ldp/stp not single copy atomic without +lse2");
-     Opcode = isLoad ? AArch64::LDPXi : Opcode = AArch64::STPXi;
+     Opcode = isLoad ? AArch64::LDPXi : AArch64::STPXi;
    }
    MachineInstrBuilder NewI;

+   }
    MachineInstrBuilder NewI;
-   if (MI.getOpcode() == TargetOpcode::G_LOAD) {
+   if (isLoad) {
-     NewI = MIRBuilder.buildInstr(AArch64::LDPXi, {s64, s64}, {});
+     NewI = MIRBuilder.buildInstr(Opcode, {s64, s64}, {});
      MIRBuilder.buildMerge(ValReg, {NewI->getOperand(0), NewI->getOperand(1)});
    } else {
      auto Split = MIRBuilder.buildUnmerge(s64, MI.getOperand(0));
      NewI = MIRBuilder.buildInstr(
-         AArch64::STPXi, {}, {Split->getOperand(0), Split->getOperand(1)});
+         Opcode, {}, {Split->getOperand(0), Split->getOperand(1)});
    }
+   if (isRCPC3) {
+     NewI.addUse(MI.getOperand(1).getReg());
+   } else {
      int Offset;
      matchLDPSTPAddrMode(MI.getOperand(1).getReg(), Base, Offset, MRI);
      NewI.addUse(Base);
      NewI.addImm(Offset / 8);
+   }
    NewI.cloneMemRefs(MI);
    constrainSelectedInstRegOperands(*NewI, *ST->getInstrInfo(),
                                     *MRI.getTargetRegisterInfo(),
                                     *ST->getRegBankInfo());
    MI.eraseFromParent();
    return true;
  }
  // ...
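The opcode choice in `legalizeLoadStore` can be summarised as a small decision function. This is a minimal C++ sketch, assuming hypothetical `Ordering` and `Opc` enums in place of `AtomicOrdering` and the `AArch64::` opcodes; it models only which instruction is picked and whether a scaled immediate offset is appended, not MIR construction. It follows this revision's condition (`ST->hasRCPC3() && (isLoadAcquire || isStoreRelease)`), and the RCPC3 path adding only the base register matches the summary's note that the offset variants are left for a later patch.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::AtomicOrdering.
enum class Ordering {
  NotAtomic, Unordered, Monotonic, Acquire, Release, AcquireRelease,
  SequentiallyConsistent
};

// Hypothetical stand-ins for the AArch64:: opcodes chosen in the diff.
enum class Opc { LDIAPPX, STILPX, LDPXi, STPXi };

struct Selection {
  Opc Opcode;
  // True when the instruction also takes a scaled immediate offset
  // (the LDP/STP path adds Offset / 8); the RCPC3 path only adds the base.
  bool HasImmOffset;
};

// Mirrors the opcode choice for a 128-bit G_LOAD/G_STORE in this revision.
Selection select128(bool IsLoad, Ordering O, bool HasRCPC3, bool HasLSE2) {
  bool IsLoadAcquire = IsLoad && O == Ordering::Acquire;
  bool IsStoreRelease = !IsLoad && O == Ordering::Release;
  if (HasRCPC3 && (IsLoadAcquire || IsStoreRelease))
    return {IsLoad ? Opc::LDIAPPX : Opc::STILPX, false};

  // Otherwise the access is expected to be monotonic or unordered (any
  // stronger ordering having been handled with a separate fence), and LSE2
  // is needed for LDP/STP to be single-copy atomic.
  assert((O == Ordering::Monotonic || O == Ordering::Unordered) &&
         "unexpected ordering for LDP/STP");
  assert(HasLSE2 && "ldp/stp not single copy atomic without +lse2");
  (void)HasLSE2; // silences unused-parameter warnings when asserts are off
  return {IsLoad ? Opc::LDPXi : Opc::STPXi, true};
}
```

For example, `select128(true, Ordering::Acquire, true, false)` picks `LDIAPPX` without an immediate, while a monotonic store with +lse2 falls through to `STPXi` with one.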

llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc3.ll

; ...
; -O1-LABEL: load_atomic_i128_aligned_monotonic_const:
; -O1: ldxp x0, x1, [x8]
; -O1: stxp w9, x0, x1, [x8]
%r = load atomic i128, ptr %ptr monotonic, align 16
ret i128 %r
}

define dso_local i128 @load_atomic_i128_aligned_acquire(ptr %ptr) {
- ; -O0-LABEL: load_atomic_i128_aligned_acquire:
- ; -O0: ldaxp x0, x1, [x9]
- ; -O0: cmp x0, x10
- ; -O0: cmp x1, x10
- ; -O0: stxp w8, x10, x10, [x9]
- ; -O0: stxp w8, x0, x1, [x9]
- ;
- ; -O1-LABEL: load_atomic_i128_aligned_acquire:
- ; -O1: ldaxp x0, x1, [x8]
- ; -O1: stxp w9, x0, x1, [x8]
+ ; CHECK-LABEL: load_atomic_i128_aligned_acquire:
+ ; CHECK: ldiapp x0, x1, [x0]
%r = load atomic i128, ptr %ptr acquire, align 16
ret i128 %r
}

define dso_local i128 @load_atomic_i128_aligned_acquire_const(ptr readonly %ptr) {
- ; -O0-LABEL: load_atomic_i128_aligned_acquire_const:
- ; -O0: ldaxp x0, x1, [x9]
- ; -O0: cmp x0, x10
- ; -O0: cmp x1, x10
- ; -O0: stxp w8, x10, x10, [x9]
- ; -O0: stxp w8, x0, x1, [x9]
- ;
- ; -O1-LABEL: load_atomic_i128_aligned_acquire_const:
- ; -O1: ldaxp x0, x1, [x8]
- ; -O1: stxp w9, x0, x1, [x8]
+ ; CHECK-LABEL: load_atomic_i128_aligned_acquire_const:
+ ; CHECK: ldiapp x0, x1, [x0]
%r = load atomic i128, ptr %ptr acquire, align 16
ret i128 %r
}

define dso_local i128 @load_atomic_i128_aligned_seq_cst(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_seq_cst:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
; ...

llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-store-rcpc3.ll

; ...
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
; -O1: stxp w8, x0, x1, [x2]
store atomic i128 %value, ptr %ptr monotonic, align 16
ret void
}

define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
- ; -O0-LABEL: store_atomic_i128_aligned_release:
- ; -O0: ldxp x10, x9, [x11]
- ; -O0: cmp x10, x12
- ; -O0: cmp x9, x13
- ; -O0: stlxp w8, x14, x15, [x11]
- ; -O0: stlxp w8, x10, x9, [x11]
- ; -O0: eor x8, x10, x8
- ; -O0: eor x11, x9, x11
- ; -O0: orr x8, x8, x11
- ; -O0: subs x8, x8, #0
- ;
- ; -O1-LABEL: store_atomic_i128_aligned_release:
- ; -O1: ldxp xzr, x8, [x2]
- ; -O1: stlxp w8, x0, x1, [x2]
+ ; CHECK-LABEL: store_atomic_i128_aligned_release:
+ ; CHECK: stilp x0, x1, [x2]
store atomic i128 %value, ptr %ptr release, align 16
ret void
}

define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
; -O0: ldaxp x10, x9, [x11]
; -O0: cmp x10, x12
; ...

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll

; ...
; -O1-LABEL: load_atomic_i128_aligned_monotonic_const:
; -O1: ldxp x1, x0, [x8]
; -O1: stxp w9, x1, x0, [x8]
%r = load atomic i128, ptr %ptr monotonic, align 16
ret i128 %r
}

define dso_local i128 @load_atomic_i128_aligned_acquire(ptr %ptr) {
- ; -O0-LABEL: load_atomic_i128_aligned_acquire:
- ; -O0: ldaxp x1, x0, [x9]
- ; -O0: cmp x1, x10
- ; -O0: cmp x0, x10
- ; -O0: stxp w8, x10, x10, [x9]
- ; -O0: stxp w8, x1, x0, [x9]
- ;
- ; -O1-LABEL: load_atomic_i128_aligned_acquire:
- ; -O1: ldaxp x1, x0, [x8]
- ; -O1: stxp w9, x1, x0, [x8]
+ ; CHECK-LABEL: load_atomic_i128_aligned_acquire:
+ ; CHECK: ldiapp x1, x0, [x0]
%r = load atomic i128, ptr %ptr acquire, align 16
ret i128 %r
}

define dso_local i128 @load_atomic_i128_aligned_acquire_const(ptr readonly %ptr) {
- ; -O0-LABEL: load_atomic_i128_aligned_acquire_const:
- ; -O0: ldaxp x1, x0, [x9]
- ; -O0: cmp x1, x10
- ; -O0: cmp x0, x10
- ; -O0: stxp w8, x10, x10, [x9]
- ; -O0: stxp w8, x1, x0, [x9]
- ;
- ; -O1-LABEL: load_atomic_i128_aligned_acquire_const:
- ; -O1: ldaxp x1, x0, [x8]
- ; -O1: stxp w9, x1, x0, [x8]
+ ; CHECK-LABEL: load_atomic_i128_aligned_acquire_const:
+ ; CHECK: ldiapp x1, x0, [x0]
%r = load atomic i128, ptr %ptr acquire, align 16
ret i128 %r
}

define dso_local i128 @load_atomic_i128_aligned_seq_cst(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_seq_cst:
; -O0: ldaxp x1, x0, [x9]
; -O0: cmp x1, x10
; ...
; -O0: cmp x0, x10
; -O0: stlxp w8, x10, x10, [x9]
; -O0: stlxp w8, x1, x0, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_seq_cst_const:
; -O1: ldaxp x1, x0, [x8]
; -O1: stlxp w9, x1, x0, [x8]
%r = load atomic i128, ptr %ptr seq_cst, align 16
ret i128 %r

efriedma: Sort of orthogonal to this change, but can someone at ARM verify if this is the sequence we actually want for sequentially consistent loads with lse2, as opposed to using caspal? (I'm a bit concerned given the issues we ran into with narrower widths on Windows; see D141748.)
efriedma: Any update here?
tmatheson (Author): Not yet, but I haven't forgotten about it.
tmatheson (Author): Sorry this has taken a while; I've had to work on other things. The first thing to note is that a CASP implementation is slower, so as long as both are correct we should use `ldp+dmb`. @LukeGeeson has been working on adding CASP to herd7 so that we can compare `ldp+dmb` with `casp`. Using his Telechat tool, which compares the C++ semantics with the machine-code semantics and identifies new behaviours introduced by the machine code, we aren't seeing any differences so far, although admittedly the number of cases we've checked is low because each CASP case has to be written manually. We are still looking for a way to run more extensive testing automatically, but if you have any concerns about specific scenarios we can check them.
efriedma: I guess my biggest concern here is that we're basically committing to never using sequentially consistent (non-RCPc) load/store instructions for i128, even if the instruction set adds them in the future. If you're fine with that, then I guess I don't have any other specific concerns?

}

define dso_local i8 @load_atomic_i8_unaligned_unordered(ptr %ptr) {
; CHECK-LABEL: load_atomic_i8_unaligned_unordered:
; CHECK: ldrb w0, [x0]
%r = load atomic i8, ptr %ptr unordered, align 1
ret i8 %r
}
; ...
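The big-endian test files differ from the little-endian ones only in the order of the register pair (`ldiapp x1, x0, [x0]` and `stilp x1, x0, [x2]` versus `ldiapp x0, x1, [x0]` and `stilp x0, x1, [x2]`). A minimal C++ sketch of why, assuming a hypothetical `U128` type and helper names: it lays out a 128-bit value in memory in either byte order and reads back the doubleword at each offset, as each register of a pair access would. The doubleword at the lower address is the low half on little-endian but the high half on big-endian. Assuming, as with LDP/STP, that the first register of the pair maps to the lower address, this appears consistent with the i128 value staying in x0 (low) and x1 (high) in both test files.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value split into its low and high 64-bit halves.
struct U128 {
  uint64_t Lo;
  uint64_t Hi;
};

using Bytes16 = std::array<uint8_t, 16>;

// Lay out V in memory in the given byte order. Sig is the significance of
// the byte within the 128-bit value (0 = least significant).
Bytes16 storeBytes(U128 V, bool BigEndian) {
  Bytes16 B{};
  for (int I = 0; I < 16; ++I) {
    int Sig = BigEndian ? 15 - I : I;
    uint64_t Half = Sig < 8 ? V.Lo : V.Hi;
    B[I] = static_cast<uint8_t>(Half >> (8 * (Sig % 8)));
  }
  return B;
}

// Read the 64-bit doubleword at byte offset Off (0 or 8) in the same byte
// order, as one register of a pair load would.
uint64_t loadDword(const Bytes16 &B, int Off, bool BigEndian) {
  uint64_t R = 0;
  for (int I = 0; I < 8; ++I) {
    int Sig = BigEndian ? 7 - I : I;
    R |= static_cast<uint64_t>(B[Off + I]) << (8 * Sig);
  }
  return R;
}
```

With these helpers, reading offset 0 of a big-endian layout returns `V.Hi`, so the register that must end up holding the low half is the one loaded from offset 8.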

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc3.ll

; ...
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
; -O1: stxp w8, x1, x0, [x2]
store atomic i128 %value, ptr %ptr monotonic, align 16
ret void
}

define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
- ; -O0-LABEL: store_atomic_i128_aligned_release:
- ; -O0: ldxp x10, x12, [x9]
- ; -O0: cmp x10, x11
- ; -O0: cmp x12, x13
- ; -O0: stlxp w8, x14, x15, [x9]
- ; -O0: stlxp w8, x10, x12, [x9]
- ; -O0: subs x12, x12, x13
- ; -O0: ccmp x10, x11, #0, eq
- ;
- ; -O1-LABEL: store_atomic_i128_aligned_release:
- ; -O1: ldxp xzr, x8, [x2]
- ; -O1: stlxp w8, x1, x0, [x2]
+ ; CHECK-LABEL: store_atomic_i128_aligned_release:
+ ; CHECK: stilp x1, x0, [x2]
store atomic i128 %value, ptr %ptr release, align 16
ret void
}

define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
; -O0: ldaxp x10, x12, [x9]
; -O0: cmp x10, x11
; ...


Revision Contents

Diff 487985

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/lib/Target/AArch64/AArch64InstrInfo.td

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc3.ll

llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-store-rcpc3.ll

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll

llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc3.ll
