This is an archive of the discontinued LLVM Phabricator instance.

Sorry @wwei, I was hoping to review both patches today but that hasn't really worked out. I'll take a proper look tomorrow. One observation: The patch could do with some floating point tests to verify the BITCAST logic for those types.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19555–19556	Calling `LowerToPredicatedOp` feels like overkill here compared to `DAG.getNode(RevOp, DL, NewVT, ..., DAG.getUNDEF(NewVT))`.

Add some float test cases

Harbormaster completed remote builds in B138112: Diff 392701. Dec 8 2021, 4:04 AM

wwei added inline comments. Dec 8 2021, 4:05 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19555–19556	Since there is no unpredicated `revb/revh/revw` for SVE, `LowerToPredicatedOp` helps by constructing the predicate operand, and it also handles merge-passthru opcodes. With `DAG.getNode(RevOp, DL, NewVT, ..., DAG.getUNDEF(NewVT))` we would need extra code to create the predicate operand and pass the correct merge-passthru operands.
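
To illustrate why both a predicate and a passthru operand are involved, here is a minimal standalone model of a merging predicated byte swap on 16-bit lanes. It does not use LLVM; `predicatedRevB16` and its parameter names are hypothetical. The operand order (leading predicate, then source, then passthru) follows the `####_MERGE_PASSTHRU` convention described in the source comment further down this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a merging predicated "revb" on 16-bit lanes.
// Operand order: predicate, source, passthru.
std::vector<uint16_t> predicatedRevB16(const std::vector<bool> &pred,
                                       const std::vector<uint16_t> &src,
                                       const std::vector<uint16_t> &passthru) {
  assert(pred.size() == src.size() && src.size() == passthru.size());
  std::vector<uint16_t> result(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    // Swap the two bytes of the lane.
    uint16_t swapped = static_cast<uint16_t>((src[i] << 8) | (src[i] >> 8));
    // Active lanes take the swapped value; inactive lanes keep the passthru.
    result[i] = pred[i] ? swapped : passthru[i];
  }
  return result;
}
```

In this model the passthru value is never read when every predicate lane is true, which is consistent with passing an undef passthru alongside an all-true predicate.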

paulwalker-arm added inline comments. Dec 8 2021, 10:46 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19555–19556	That's a good point, fair enough.
19565	Is it valid to use `class Instruction` methods this far into code generation? That said, `ShuffleVectorInst::isReverseMask` looks potentially unsafe in this context anyway: given a 4-element mask it returns true for both `3, 2, 1, 0` and `7, 6, 5, 4`, but as mentioned below `ISD::VECTOR_REVERSE` only supports a single operand, so you need to know specifically which of these scenarios applies. Do you care about the `7, 6, 5, 4` case? I can see `test_revv8i32v8i32` is a test for this, but I think that is being simplified to a `3, 2, 1, 0` based shuffle before you get here. I'll note that `ARMISelLowering.cpp` has the `isReverseMask` helper function that could be useful.
19566	This looks wrong because `ISD::VECTOR_REVERSE` only takes a single operand. I presume `Op2` is just being ignored and there's nothing in `getNode` to ensure only a single operand.
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll
203–204	I originally wrote a comment asking why the i32 case emitted worse code than the f32 one but then spotted the attribute difference, so this seems more like a negative test. It's worth adding a function comment to make this clear to the user, which also goes for the other "negative" tests. Perhaps it's also worth adding `_vl256` to functions where `vscale_range` is set just to drive the point home?
271	Is there a need for `fadd` here? I'm assuming that without it the float shuffles are converted to integer ones and you lose test coverage, but then I can see `test_revv8f32` doesn't need it, so I figured I'd ask.
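
The ambiguity raised in the 19565 comment can be sketched with a small standalone model. This is not LLVM's implementation; `classifyReverseMask` and `ReverseSource` are hypothetical names. For a mask of N elements over two N-element operands, indices 0..N-1 select from the first operand and N..2N-1 from the second, so `3, 2, 1, 0` and `7, 6, 5, 4` are both reversals, but of different operands. Undef mask elements are modelled as -1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ReverseSource { None, FirstOperand, SecondOperand };

// Classifies a shuffle mask as a full reversal of exactly one operand.
// Negative entries stand for undef lanes and match either operand, so an
// all-undef mask is reported as FirstOperand.
ReverseSource classifyReverseMask(const std::vector<int> &mask) {
  assert(!mask.empty());
  const int n = static_cast<int>(mask.size());
  bool fitsFirst = true, fitsSecond = true;
  for (int i = 0; i < n; ++i) {
    if (mask[i] < 0)
      continue; // undef lane: no constraint
    fitsFirst = fitsFirst && mask[i] == n - 1 - i;
    fitsSecond = fitsSecond && mask[i] == 2 * n - 1 - i;
  }
  if (fitsFirst)
    return ReverseSource::FirstOperand;
  if (fitsSecond)
    return ReverseSource::SecondOperand;
  return ReverseSource::None;
}
```

A lowering that emits a single-operand reverse of the first input would only be correct for the `FirstOperand` result; a check that merely answers "is this a reversal?" cannot tell the two cases apart.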

wwei updated this revision to Diff 393171. Dec 9 2021, 8:16 AM

wwei added inline comments. Dec 9 2021, 8:38 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19565	Maybe using `ShuffleVectorInst::isReverseMask` here is better, because it calls `isSingleSourceMask` to ensure the shuffle mask chooses elements from exactly one source vector (`isReverseMask` in `ARMISelLowering.cpp` does not check this). I tried various test case forms, but I couldn't construct a case of this kind (`7, 6, 5, 4`); it was always converted to the `3, 2, 1, 0` form. To be on the safe side, I still added an `Op2.isUndef()` check, so `Op1` will be the only valid operand for `VECTOR_REVERSE`.
19566	Yeah, you're right. I removed `Op2` since it will always be `Undef`.
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll
203–204	I rearranged the order of the test cases, sorted according to attributes, and added some comments for the `rev` instructions.
271	`fadd` is not needed, so I removed it.

Harbormaster completed remote builds in B138457: Diff 393171. Dec 9 2021, 9:28 AM

wwei added inline comments. Dec 10 2021, 4:25 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19565	For the `REVB/REVH/REVW` cases, I added an `Op2.isUndef()` check too.

Functionally this looks good, but I've added a few stylistic comments below to consider before committing. I found the tests harder to follow than necessary as it wasn't always clear what the intent of a specific test was, especially for those where rev/b/h/w instructions are not emitted. More comments would help, and I also think having two separate test files, one for rev (permute-rev) and another for revb/revh/revw (permute-rev-elts), would be advantageous.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19544	In this instance I don't believe this is required because `isREVMask` does what you need. It's only the uses of `ShuffleVectorInst::isReverseMask` that need extra protection, which I see you've covered below.
19555–19557	Personally I think the following looks better:
	Op = DAG.getNode(ISD::BITCAST, DL, NewVT, Op1);
	Op = LowerToPredicatedOp(Op, DAG, RevOp);
	Op = DAG.getNode(ISD::BITCAST, DL, ContainerVT, Op);
	but then I'm also not a fan of how the function's input parameter `Op` is overwritten.
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll
98–101	This test either looks misplaced (belongs with the other `rev` tests towards the bottom of this file?) or is perhaps redundant?
223	Can this be moved to the bottom with the other attributes?
284–286	This test looks like it belongs in the top half after `test_revhv32i16` (i.e. so it is part of the other element shuffle tests, as opposed to this block of tests, which is testing whole-vector reversals).

This revision is now accepted and ready to land. Dec 14 2021, 7:20 AM

Closed by commit rGdc7b672f969b: [AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw (authored by wwei). Dec 15 2021, 5:54 AM

This revision was automatically updated to reflect the committed changes.

wwei added a commit: rGdc7b672f969b: [AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw.

@paulwalker-arm Thanks for your comments. I have updated the code based on your review and also updated the test file, adding the necessary comments.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	38 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	6 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	6 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll	470 lines

Diff 394540

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ enum NodeType : unsigned {
   FMINNMV_PRED,

   INSR,
   PTEST,
   PTRUE,

   BITREVERSE_MERGE_PASSTHRU,
   BSWAP_MERGE_PASSTHRU,
+  REVH_MERGE_PASSTHRU,
+  REVW_MERGE_PASSTHRU,
   CTLZ_MERGE_PASSTHRU,
   CTPOP_MERGE_PASSTHRU,
   DUP_MERGE_PASSTHRU,
   INDEX_VECTOR,

   // Cast between vectors of the same element type but differ in length.
   REINTERPRET_CAST,

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@
 // Returns true for ####_MERGE_PASSTHRU opcodes, whose operands have a leading
 // predicate and end with a passthru value matching the result type.
 static bool isMergePassthruOpcode(unsigned Opc) {
   switch (Opc) {
   default:
     return false;
   case AArch64ISD::BITREVERSE_MERGE_PASSTHRU:
   case AArch64ISD::BSWAP_MERGE_PASSTHRU:
+  case AArch64ISD::REVH_MERGE_PASSTHRU:
+  case AArch64ISD::REVW_MERGE_PASSTHRU:
   case AArch64ISD::CTLZ_MERGE_PASSTHRU:
   case AArch64ISD::CTPOP_MERGE_PASSTHRU:
   case AArch64ISD::DUP_MERGE_PASSTHRU:
   case AArch64ISD::ABS_MERGE_PASSTHRU:
   case AArch64ISD::NEG_MERGE_PASSTHRU:
   case AArch64ISD::FNEG_MERGE_PASSTHRU:
   case AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU:
   case AArch64ISD::ZERO_EXTEND_INREG_MERGE_PASSTHRU:
@@ ... @@ case AArch64ISD::FIRST_NUMBER:
     MAKE_CASE(AArch64ISD::SST1_IMM_PRED)
     MAKE_CASE(AArch64ISD::SSTNT1_PRED)
     MAKE_CASE(AArch64ISD::SSTNT1_INDEX_PRED)
     MAKE_CASE(AArch64ISD::LDP)
     MAKE_CASE(AArch64ISD::STP)
     MAKE_CASE(AArch64ISD::STNP)
     MAKE_CASE(AArch64ISD::BITREVERSE_MERGE_PASSTHRU)
     MAKE_CASE(AArch64ISD::BSWAP_MERGE_PASSTHRU)
+    MAKE_CASE(AArch64ISD::REVH_MERGE_PASSTHRU)
+    MAKE_CASE(AArch64ISD::REVW_MERGE_PASSTHRU)
     MAKE_CASE(AArch64ISD::CTLZ_MERGE_PASSTHRU)
     MAKE_CASE(AArch64ISD::CTPOP_MERGE_PASSTHRU)
     MAKE_CASE(AArch64ISD::DUP_MERGE_PASSTHRU)
     MAKE_CASE(AArch64ISD::INDEX_VECTOR)
     MAKE_CASE(AArch64ISD::UADDLP)
     MAKE_CASE(AArch64ISD::CALL_RVMARKER)
     MAKE_CASE(AArch64ISD::ASSERT_ZEXT_BOOL)
   }
@@ ... @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   }
   case Intrinsic::aarch64_sve_rbit:
     return DAG.getNode(AArch64ISD::BITREVERSE_MERGE_PASSTHRU, dl,
                        Op.getValueType(), Op.getOperand(2), Op.getOperand(3),
                        Op.getOperand(1));
   case Intrinsic::aarch64_sve_revb:
     return DAG.getNode(AArch64ISD::BSWAP_MERGE_PASSTHRU, dl, Op.getValueType(),
                        Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
+  case Intrinsic::aarch64_sve_revh:
+    return DAG.getNode(AArch64ISD::REVH_MERGE_PASSTHRU, dl, Op.getValueType(),
+                       Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
+  case Intrinsic::aarch64_sve_revw:
+    return DAG.getNode(AArch64ISD::REVW_MERGE_PASSTHRU, dl, Op.getValueType(),
+                       Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
   case Intrinsic::aarch64_sve_sxtb:
     return DAG.getNode(
         AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU, dl, Op.getValueType(),
         Op.getOperand(2), Op.getOperand(3),
         DAG.getValueType(Op.getValueType().changeVectorElementType(MVT::i8)),
         Op.getOperand(1));
   case Intrinsic::aarch64_sve_sxth:
     return DAG.getNode(
@@ ... @@ if ((ScalarTy == MVT::i8) || (ScalarTy == MVT::i16))
       ScalarTy = MVT::i32;
     SDValue Scalar = DAG.getNode(
         ISD::EXTRACT_VECTOR_ELT, DL, ScalarTy, Op1,
         DAG.getConstant(VT.getVectorNumElements() - 1, DL, MVT::i64));
     Op = DAG.getNode(AArch64ISD::INSR, DL, ContainerVT, Op2, Scalar);
     return convertFromScalableVector(DAG, VT, Op);
   }

+  for (unsigned LaneSize : {64U, 32U, 16U}) {
+    if (isREVMask(ShuffleMask, VT, LaneSize)) {
+      EVT NewVT =
+          getPackedSVEVectorVT(EVT::getIntegerVT(*DAG.getContext(), LaneSize));
+      unsigned RevOp;
+      unsigned EltSz = VT.getScalarSizeInBits();
+      if (EltSz == 8)
+        RevOp = AArch64ISD::BSWAP_MERGE_PASSTHRU;
+      else if (EltSz == 16)
+        RevOp = AArch64ISD::REVH_MERGE_PASSTHRU;
+      else
+        RevOp = AArch64ISD::REVW_MERGE_PASSTHRU;
+
+      Op = DAG.getNode(ISD::BITCAST, DL, NewVT, Op1);
+      Op = LowerToPredicatedOp(Op, DAG, RevOp);
+      Op = DAG.getNode(ISD::BITCAST, DL, ContainerVT, Op);
+      return convertFromScalableVector(DAG, VT, Op);
+    }
+  }
+
+  unsigned MinSVESize = Subtarget->getMinSVEVectorSizeInBits();
+  unsigned MaxSVESize = Subtarget->getMaxSVEVectorSizeInBits();
+  if (MinSVESize == MaxSVESize && MaxSVESize == VT.getSizeInBits() &&
+      ShuffleVectorInst::isReverseMask(ShuffleMask) && Op2.isUndef()) {
+    Op = DAG.getNode(ISD::VECTOR_REVERSE, DL, ContainerVT, Op1);
+    return convertFromScalableVector(DAG, VT, Op);
+  }
+
   return SDValue();
 }

 SDValue AArch64TargetLowering::getSVESafeBitCast(EVT VT, SDValue Op,
                                                  SelectionDAG &DAG) const {
   SDLoc DL(Op);
   EVT InVT = Op.getValueType();
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
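
As a rough illustration of what the `LaneSize` loop in this file is matching, the following standalone sketch checks whether a mask reverses the elements inside each block of `blockBits` bits. It is a simplified model written for this note, not LLVM's `isREVMask`; the name `reversesWithinBlocks` and its parameters are hypothetical, and undef mask elements are modelled as -1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if `mask` reverses element order inside every block of
// `blockBits` bits, for elements of `eltBits` bits. Negative entries model
// undef lanes and are accepted anywhere.
bool reversesWithinBlocks(const std::vector<int> &mask, unsigned eltBits,
                          unsigned blockBits) {
  assert(eltBits != 0);
  // A block must hold at least two whole elements to have anything to reverse.
  if (blockBits <= eltBits || blockBits % eltBits != 0)
    return false;
  const std::size_t blockElts = blockBits / eltBits;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] < 0)
      continue; // undef lane: no constraint
    // Lane i must read the mirrored position inside its own block.
    const std::size_t blockStart = i - i % blockElts;
    const std::size_t expected = blockStart + (blockElts - 1 - i % blockElts);
    if (static_cast<std::size_t>(mask[i]) != expected)
      return false;
  }
  return true;
}
```

With the mask from `test_revwv4i64` (`1, 0, 3, 2, 5, 4, 7, 6` on 32-bit elements) the check passes for 64-bit blocks, and with the mask from `test_revhv4i64` (`3, 2, 1, 0, 7, 6, 5, 4, ...` on 16-bit elements) it passes for 64-bit blocks but not for 32-bit ones. The patch tries block sizes of 64, 32 and 16 bits in that order.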

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

@@ ... @@
 def AArch64frintx_mt : SDNode<"AArch64ISD::FRINT_MERGE_PASSTHRU", SDT_AArch64Arith>;
 def AArch64frinta_mt : SDNode<"AArch64ISD::FROUND_MERGE_PASSTHRU", SDT_AArch64Arith>;
 def AArch64frintn_mt : SDNode<"AArch64ISD::FROUNDEVEN_MERGE_PASSTHRU", SDT_AArch64Arith>;
 def AArch64frintz_mt : SDNode<"AArch64ISD::FTRUNC_MERGE_PASSTHRU", SDT_AArch64Arith>;
 def AArch64fsqrt_mt : SDNode<"AArch64ISD::FSQRT_MERGE_PASSTHRU", SDT_AArch64Arith>;
 def AArch64frecpx_mt : SDNode<"AArch64ISD::FRECPX_MERGE_PASSTHRU", SDT_AArch64Arith>;
 def AArch64rbit_mt : SDNode<"AArch64ISD::BITREVERSE_MERGE_PASSTHRU", SDT_AArch64Arith>;
 def AArch64revb_mt : SDNode<"AArch64ISD::BSWAP_MERGE_PASSTHRU", SDT_AArch64Arith>;
+def AArch64revh_mt : SDNode<"AArch64ISD::REVH_MERGE_PASSTHRU", SDT_AArch64Arith>;
+def AArch64revw_mt : SDNode<"AArch64ISD::REVW_MERGE_PASSTHRU", SDT_AArch64Arith>;

 // These are like the above but we don't yet have need for ISD nodes. They allow
 // a single pattern to match intrinsic and ISD operand layouts.
 def AArch64cls_mt : PatFrags<(ops node:$pg, node:$op, node:$pt), [(int_aarch64_sve_cls node:$pt, node:$pg, node:$op)]>;
 def AArch64cnot_mt : PatFrags<(ops node:$pg, node:$op, node:$pt), [(int_aarch64_sve_cnot node:$pt, node:$pg, node:$op)]>;
 def AArch64not_mt : PatFrags<(ops node:$pg, node:$op, node:$pt), [(int_aarch64_sve_not node:$pt, node:$pg, node:$op)]>;

 def SDT_AArch64FCVT : SDTypeProfile<1, 3, [
@@ ... @@

 let Predicates = [HasSVEorStreamingSVE] in {
   defm INSR_ZR : sve_int_perm_insrs<"insr", AArch64insr>;
   defm INSR_ZV : sve_int_perm_insrv<"insr", AArch64insr>;
   defm EXT_ZZI : sve_int_perm_extract_i<"ext", AArch64ext>;

   defm RBIT_ZPmZ : sve_int_perm_rev_rbit<"rbit", AArch64rbit_mt>;
   defm REVB_ZPmZ : sve_int_perm_rev_revb<"revb", AArch64revb_mt>;
-  defm REVH_ZPmZ : sve_int_perm_rev_revh<"revh", int_aarch64_sve_revh>;
-  defm REVW_ZPmZ : sve_int_perm_rev_revw<"revw", int_aarch64_sve_revw>;
+  defm REVH_ZPmZ : sve_int_perm_rev_revh<"revh", AArch64revh_mt>;
+  defm REVW_ZPmZ : sve_int_perm_rev_revw<"revw", AArch64revw_mt>;

   defm REV_PP : sve_int_perm_reverse_p<"rev", vector_reverse>;
   defm REV_ZZ : sve_int_perm_reverse_z<"rev", vector_reverse>;

   defm SUNPKLO_ZZ : sve_int_perm_unpk<0b00, "sunpklo", AArch64sunpklo>;
   defm SUNPKHI_ZZ : sve_int_perm_unpk<0b01, "sunpkhi", AArch64sunpkhi>;
   defm UUNPKLO_ZZ : sve_int_perm_unpk<0b10, "uunpklo", AArch64uunpklo>;
   defm UUNPKHI_ZZ : sve_int_perm_unpk<0b11, "uunpkhi", AArch64uunpkhi>;

llvm/lib/Target/AArch64/SVEInstrFormats.td

@@ ... @@ multiclass sve_int_perm_rev_revb<string asm, SDPatternOperator op> {
   def : SVE_1_Op_Passthru_Pat<nxv4i32, op, nxv4i1, nxv4i32, !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Passthru_Pat<nxv2i64, op, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D)>;
 }

 multiclass sve_int_perm_rev_revh<string asm, SDPatternOperator op> {
   def _S : sve_int_perm_rev<0b10, 0b01, asm, ZPR32>;
   def _D : sve_int_perm_rev<0b11, 0b01, asm, ZPR64>;

-  def : SVE_3_Op_Pat<nxv4i32, op, nxv4i32, nxv4i1, nxv4i32, !cast<Instruction>(NAME # _S)>;
-  def : SVE_3_Op_Pat<nxv2i64, op, nxv2i64, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D)>;
+  def : SVE_1_Op_Passthru_Pat<nxv4i32, op, nxv4i1, nxv4i32, !cast<Instruction>(NAME # _S)>;
+  def : SVE_1_Op_Passthru_Pat<nxv2i64, op, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D)>;
 }

 multiclass sve_int_perm_rev_revw<string asm, SDPatternOperator op> {
   def _D : sve_int_perm_rev<0b11, 0b10, asm, ZPR64>;

-  def : SVE_3_Op_Pat<nxv2i64, op, nxv2i64, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D)>;
+  def : SVE_1_Op_Passthru_Pat<nxv2i64, op, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D)>;
 }

 class sve_int_perm_cpy_r<bits<2> sz8_64, string asm, ZPRRegOp zprty,
                          RegisterClass srcRegType>
 : I<(outs zprty:$Zd), (ins zprty:$_Zd, PPR3bAny:$Pg, srcRegType:$Rn),
   asm, "\t$Zd, $Pg/m, $Rn",
   "",
   []>, Sched<[]> {

llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_EQ_256
				; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256

				target triple = "aarch64-unknown-linux-gnu"

				; REVB pattern for shuffle v32i8 -> v16i16
				define void @test_revbv16i16(<32 x i8>* %a) #0 {
				; CHECK-LABEL: test_revbv16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b, vl32
				; CHECK-NEXT: ptrue p1.h
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: revb z0.h, p1/m, z0.h
				; CHECK-NEXT: st1b { z0.b }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <32 x i8>, <32 x i8>* %a
				%tmp2 = shufflevector <32 x i8> %tmp1, <32 x i8> undef, <32 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14, i32 17, i32 16, i32 19, i32 18, i32 21, i32 20, i32 23, i32 22, i32 undef, i32 24, i32 27, i32 undef, i32 29, i32 28, i32 undef, i32 undef>
				store <32 x i8> %tmp2, <32 x i8>* %a
				ret void
				}

				; REVB pattern for shuffle v32i8 -> v8i32
				define void @test_revbv8i32(<32 x i8>* %a) #0 {
				; CHECK-LABEL: test_revbv8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b, vl32
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: revb z0.s, p1/m, z0.s
				; CHECK-NEXT: st1b { z0.b }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <32 x i8>, <32 x i8>* %a
				%tmp2 = shufflevector <32 x i8> %tmp1, <32 x i8> undef, <32 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12, i32 19, i32 18, i32 17, i32 16, i32 23, i32 22, i32 21, i32 20, i32 27, i32 undef, i32 undef, i32 undef, i32 31, i32 30, i32 29, i32 undef>
				store <32 x i8> %tmp2, <32 x i8>* %a
				ret void
				}

				; REVB pattern for shuffle v32i8 -> v4i64
				define void @test_revbv4i64(<32 x i8>* %a) #0 {
				; CHECK-LABEL: test_revbv4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b, vl32
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: revb z0.d, p1/m, z0.d
				; CHECK-NEXT: st1b { z0.b }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <32 x i8>, <32 x i8>* %a
				%tmp2 = shufflevector <32 x i8> %tmp1, <32 x i8> undef, <32 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 31, i32 30, i32 29, i32 undef, i32 27, i32 undef, i32 undef, i32 undef>
				store <32 x i8> %tmp2, <32 x i8>* %a
				ret void
				}

				; REVH pattern for shuffle v16i16 -> v8i32
				define void @test_revhv8i32(<16 x i16>* %a) #0 {
				; CHECK-LABEL: test_revhv8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl16
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: revh z0.s, p1/m, z0.s
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <16 x i16>, <16 x i16>* %a
				%tmp2 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
				store <16 x i16> %tmp2, <16 x i16>* %a
				ret void
				}

				; REVH pattern for shuffle v16f16 -> v8f32
				define void @test_revhv8f32(<16 x half>* %a) #0 {
				; CHECK-LABEL: test_revhv8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl16
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: revh z0.s, p1/m, z0.s
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <16 x half>, <16 x half>* %a
				%tmp2 = shufflevector <16 x half> %tmp1, <16 x half> undef, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
				store <16 x half> %tmp2, <16 x half>* %a
				ret void
				}

				; REVH pattern for shuffle v16i16 -> v4i64
				define void @test_revhv4i64(<16 x i16>* %a) #0 {
				; CHECK-LABEL: test_revhv4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl16
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: revh z0.d, p1/m, z0.d
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <16 x i16>, <16 x i16>* %a
				%tmp2 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <16 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>
				store <16 x i16> %tmp2, <16 x i16>* %a
				ret void
				}

				; REVW pattern for shuffle v8i32 -> v4i64
				define void @test_revwv4i64(<8 x i32>* %a) #0 {
				; CHECK-LABEL: test_revwv4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: revw z0.d, p1/m, z0.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <8 x i32>, <8 x i32>* %a
				%tmp2 = shufflevector <8 x i32> %tmp1, <8 x i32> undef, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>
				store <8 x i32> %tmp2, <8 x i32>* %a
				ret void
				}

				; REVW pattern for shuffle v8f32 -> v4f64
				define void @test_revwv4f64(<8 x float>* %a) #0 {
				; CHECK-LABEL: test_revwv4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: revw z0.d, p1/m, z0.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <8 x float>, <8 x float>* %a
				%tmp2 = shufflevector <8 x float> %tmp1, <8 x float> undef, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>
				store <8 x float> %tmp2, <8 x float>* %a
				ret void
				}

				; Don't use SVE for 128-bit vectors
				define <16 x i8> @test_revv16i8(<16 x i8>* %a) #0 {
				; CHECK-LABEL: test_revv16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: rev64 v0.16b, v0.16b
				; CHECK-NEXT: ret
				%tmp1 = load <16 x i8>, <16 x i8>* %a
				%tmp2 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <16 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8>
				ret <16 x i8> %tmp2
				}

				; REVW pattern for shuffle two v8i32 inputs with the second input available.
				define void @test_revwv8i32v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: test_revwv8i32v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x1]
				; CHECK-NEXT: revw z0.d, p1/m, z0.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <8 x i32>, <8 x i32>* %a
				%tmp2 = load <8 x i32>, <8 x i32>* %b
				%tmp3 = shufflevector <8 x i32> %tmp1, <8 x i32> %tmp2, <8 x i32> <i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
				store <8 x i32> %tmp3, <8 x i32>* %a
				ret void
				}

				; REVH pattern for shuffle v32i16 with 256 bits and 512 bits SVE.
				define void @test_revhv32i16(<32 x i16>* %a) #0 {
				; VBITS_EQ_256-LABEL: test_revhv32i16:
				; VBITS_EQ_256: // %bb.0:
				; VBITS_EQ_256-NEXT: mov x8, #16
				; VBITS_EQ_256-NEXT: ptrue p0.h, vl16
				; VBITS_EQ_256-NEXT: ptrue p1.d
				; VBITS_EQ_256-NEXT: ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
				; VBITS_EQ_256-NEXT: ld1h { z1.h }, p0/z, [x0]
				; VBITS_EQ_256-NEXT: revh z0.d, p1/m, z0.d
				; VBITS_EQ_256-NEXT: revh z1.d, p1/m, z1.d
				; VBITS_EQ_256-NEXT: st1h { z0.h }, p0, [x0, x8, lsl #1]
				; VBITS_EQ_256-NEXT: st1h { z1.h }, p0, [x0]
				; VBITS_EQ_256-NEXT: ret
				;
				; VBITS_GE_256-LABEL: test_revhv32i16:
				; VBITS_GE_256: // %bb.0:
				; VBITS_GE_256-NEXT: ptrue p0.h, vl32
				; VBITS_GE_256-NEXT: ptrue p1.d
				; VBITS_GE_256-NEXT: ld1h { z0.h }, p0/z, [x0]
				; VBITS_GE_256-NEXT: revh z0.d, p1/m, z0.d
				; VBITS_GE_256-NEXT: st1h { z0.h }, p0, [x0]
				; VBITS_GE_256-NEXT: ret
				%tmp1 = load <32 x i16>, <32 x i16>* %a
				%tmp2 = shufflevector <32 x i16> %tmp1, <32 x i16> undef, <32 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12, i32 19, i32 18, i32 17, i32 16, i32 23, i32 22, i32 21, i32 20, i32 27, i32 undef, i32 undef, i32 undef, i32 31, i32 30, i32 29, i32 undef>
				store <32 x i16> %tmp2, <32 x i16>* %a
				ret void
				}

				; Only support to reverse bytes / halfwords / words within elements
				define void @test_rev_elts_fail(<4 x i64>* %a) #1 {
				; CHECK-LABEL: test_rev_elts_fail:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #48
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: and sp, x9, #0xffffffffffffffe0
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w30, -8
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.d
				paulwalker-arm: I originally wrote a comment asking why the i32 case emitted worse code than the f32 one but then spotted the attribute difference, so this seems more like a negative test. It's worth adding a function comment to make this clear to the user, which also goes for the other "negative" tests. Perhaps it's also worth adding `_vl256` to functions where `vscale_range` is set just to drive the point home?
				wwei (author): I rearranged the order of the test cases, sorted according to attributes, and added some comments for the `rev` instructions.
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: mov z1.d, z0.d[2]
				; CHECK-NEXT: mov z2.d, z0.d[3]
				; CHECK-NEXT: mov x10, v0.d[1]
				; CHECK-NEXT: fmov x8, d1
				; CHECK-NEXT: fmov x9, d2
				; CHECK-NEXT: fmov x11, d0
				; CHECK-NEXT: stp x9, x8, [sp, #16]
				; CHECK-NEXT: stp x10, x11, [sp]
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [sp]
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: mov sp, x29
				; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
				; CHECK-NEXT: ret
				%tmp1 = load <4 x i64>, <4 x i64>* %a
				%tmp2 = shufflevector <4 x i64> %tmp1, <4 x i64> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
				store <4 x i64> %tmp2, <4 x i64>* %a
				ret void
				}
				paulwalker-arm: Can this be moved to the bottom with the other attributes?

				; REV instruction will reverse the order of all elements in the vector.
				; When the vector length and the target register size are inconsistent,
				; the correctness of generated REV instruction for shuffle pattern cannot be guaranteed.

				; sve-vector-bits-min=256, sve-vector-bits-max is not set, REV inst can't be generated.
				define void @test_revv8i32(<8 x i32>* %a) #0 {
				; CHECK-LABEL: test_revv8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #48
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: and sp, x9, #0xffffffffffffffe0
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w30, -8
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: mov w8, v0.s[1]
				; CHECK-NEXT: mov w9, v0.s[2]
				; CHECK-NEXT: mov w11, v0.s[3]
				; CHECK-NEXT: fmov w10, s0
				; CHECK-NEXT: mov z1.s, z0.s[4]
				; CHECK-NEXT: mov z2.s, z0.s[5]
				; CHECK-NEXT: mov z3.s, z0.s[6]
				; CHECK-NEXT: mov z0.s, z0.s[7]
				; CHECK-NEXT: stp w8, w10, [sp, #24]
				; CHECK-NEXT: fmov w10, s1
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: stp w11, w9, [sp, #16]
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: fmov w11, s0
				; CHECK-NEXT: stp w8, w10, [sp, #8]
				; CHECK-NEXT: stp w11, w9, [sp]
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [sp]
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: mov sp, x29
				; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
				; CHECK-NEXT: ret
				%tmp1 = load <8 x i32>, <8 x i32>* %a
				%tmp2 = shufflevector <8 x i32> %tmp1, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
				store <8 x i32> %tmp2, <8 x i32>* %a
				ret void
				}
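
The comment before `test_revv8i32` says that REV reverses every element of the register, so the result is only right when the fixed vector fills the register exactly. The sketch below illustrates that point with a deliberately simplified model: a register is a flat array of lanes, and lanes outside the fixed-length vector hold `-1` as a stand-in for "not part of the vector". The names `reverseRegister` and `reverseViaWholeRegister` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative model only: a whole-register reverse flips all lanes.
std::vector<int> reverseRegister(std::vector<int> reg) {
  std::reverse(reg.begin(), reg.end());
  return reg;
}

// Places `fixed` in the low lanes of a register with `regLanes` lanes,
// reverses the whole register, and returns the low `fixed.size()` lanes,
// i.e. the lanes a store of the fixed-length vector would write back.
// Returns an empty vector if the register is too small for the input.
std::vector<int> reverseViaWholeRegister(const std::vector<int> &fixed,
                                         std::size_t regLanes) {
  if (regLanes < fixed.size())
    return {};
  std::vector<int> reg(regLanes, -1); // -1: lane outside the fixed vector
  std::copy(fixed.begin(), fixed.end(), reg.begin());
  reg = reverseRegister(reg);
  return std::vector<int>(reg.begin(), reg.begin() + fixed.size());
}
```

With eight `i32` lanes in an eight-lane (256-bit) register the low lanes come back as the reversed input. With a sixteen-lane (512-bit) register the reversed data lands in the high lanes and the low lanes hold none of it, which is why the tests that do emit `rev` pin the size with `vscale_range(2,2)`.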

				; REV pattern for v32i8 shuffle with vscale_range(2,2)
				define void @test_revv32i8_vl256(<32 x i8>* %a) #1 {
				; CHECK-LABEL: test_revv32i8_vl256:
				paulwalker-arm: Is there a need for `fadd` here? I'm kind of assuming that without it the float shuffles are being converted to integer ones and thus you lost test coverage? but then I can see `test_revv8f32` doesn't need it so figured I'd ask.
				wwei (author): `fadd` is not needed, I removed it.
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: rev z0.b, z0.b
				; CHECK-NEXT: st1b { z0.b }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <32 x i8>, <32 x i8>* %a
				%tmp2 = shufflevector <32 x i8> %tmp1, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
				store <32 x i8> %tmp2, <32 x i8>* %a
				ret void
				}

				; REV pattern for v16i16 shuffle with vscale_range(2,2)
				define void @test_revv16i16_vl256(<16 x i16>* %a) #1 {
				; CHECK-LABEL: test_revv16i16_vl256:
				paulwalker-arm: This test looks like it belongs in the top half after `test_revhv32i16`? (i.e. so it is part of the other element shuffle tests as opposed to this block of tests, which is testing whole vector reversals).
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: rev z0.h, z0.h
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <16 x i16>, <16 x i16>* %a
				%tmp2 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <16 x i32> <i32 undef, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
				store <16 x i16> %tmp2, <16 x i16>* %a
				ret void
				}

				; REV pattern for v8f32 shuffle with vscale_range(2,2)
				define void @test_revv8f32_vl256(<8 x float>* %a) #1 {
				; CHECK-LABEL: test_revv8f32_vl256:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: rev z0.s, z0.s
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <8 x float>, <8 x float>* %a
				%tmp2 = shufflevector <8 x float> %tmp1, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
				store <8 x float> %tmp2, <8 x float>* %a
				ret void
				}

				; REV pattern for v4f64 shuffle with vscale_range(2,2)
				define void @test_revv4f64_vl256(<4 x double>* %a) #1 {
				; CHECK-LABEL: test_revv4f64_vl256:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: rev z0.d, z0.d
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <4 x double>, <4 x double>* %a
				%tmp2 = shufflevector <4 x double> %tmp1, <4 x double> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				store <4 x double> %tmp2, <4 x double>* %a
				ret void
				}

				; REV pattern for shuffle two v8i32 inputs with the second input available, vscale_range(2,2).
				define void @test_revv8i32v8i32(<8 x i32>* %a, <8 x i32>* %b) #1 {
				; CHECK-LABEL: test_revv8i32v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x1]
				; CHECK-NEXT: rev z0.s, z0.s
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%tmp1 = load <8 x i32>, <8 x i32>* %a
				%tmp2 = load <8 x i32>, <8 x i32>* %b
				%tmp3 = shufflevector <8 x i32> %tmp1, <8 x i32> %tmp2, <8 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8>
				store <8 x i32> %tmp3, <8 x i32>* %a
				ret void
				}
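
The review discussion points out that a reverse-mask check accepting both `3, 2, 1, 0` and `7, 6, 5, 4` is not sufficient on its own, because a single-operand reverse needs to know which input is being reversed. A minimal sketch of a check that reports the operand is shown below. `reversedOperand` is a hypothetical name, not an LLVM API, and undef mask entries are deliberately not handled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not an LLVM API.
// For a two-input shuffle producing N = mask.size() lanes, indices 0..N-1
// select from the first input and N..2N-1 from the second. Returns 0 or 1
// for the input that is fully reversed, or -1 if the mask is not a
// whole-vector reverse of a single input.
int reversedOperand(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  if (n == 0)
    return -1;
  for (int operand = 0; operand < 2; ++operand) {
    bool matches = true;
    for (int i = 0; i < n && matches; ++i) {
      const int expected = operand * n + (n - 1 - i);
      matches = (mask[static_cast<std::size_t>(i)] == expected);
    }
    if (matches)
      return operand;
  }
  return -1;
}
```

For the mask in `test_revv8i32v8i32` (`15, 14, ..., 8`) this sketch returns 1, matching the expected output above, which loads from `x1` (the second input) before the `rev`.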

				; Illegal REV pattern.
				define void @test_rev_fail(<16 x i16>* %a) #1 {
				; CHECK-LABEL: test_rev_fail:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #48
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: and sp, x9, #0xffffffffffffffe0
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w30, -8
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: mov z1.h, z0.h[8]
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: mov z5.h, z0.h[12]
				; CHECK-NEXT: mov z2.h, z0.h[9]
				; CHECK-NEXT: mov z3.h, z0.h[10]
				; CHECK-NEXT: mov z4.h, z0.h[11]
				; CHECK-NEXT: fmov w11, s2
				; CHECK-NEXT: strh w9, [sp, #30]
				; CHECK-NEXT: fmov w9, s5
				; CHECK-NEXT: fmov w12, s3
				; CHECK-NEXT: strh w8, [sp, #14]
				; CHECK-NEXT: fmov w8, s4
				; CHECK-NEXT: mov z6.h, z0.h[13]
				; CHECK-NEXT: mov z7.h, z0.h[14]
				; CHECK-NEXT: mov z16.h, z0.h[15]
				; CHECK-NEXT: umov w10, v0.h[1]
				; CHECK-NEXT: strh w9, [sp, #22]
				; CHECK-NEXT: umov w9, v0.h[2]
				; CHECK-NEXT: strh w11, [sp, #28]
				; CHECK-NEXT: fmov w11, s6
				; CHECK-NEXT: strh w12, [sp, #26]
				; CHECK-NEXT: fmov w12, s7
				; CHECK-NEXT: strh w8, [sp, #24]
				; CHECK-NEXT: fmov w8, s16
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w11, [sp, #20]
				; CHECK-NEXT: umov w11, v0.h[3]
				; CHECK-NEXT: strh w12, [sp, #18]
				; CHECK-NEXT: umov w12, v0.h[4]
				; CHECK-NEXT: strh w8, [sp, #16]
				; CHECK-NEXT: umov w8, v0.h[5]
				; CHECK-NEXT: umov w10, v0.h[6]
				; CHECK-NEXT: strh w9, [sp, #10]
				; CHECK-NEXT: umov w9, v0.h[7]
				; CHECK-NEXT: strh w11, [sp, #8]
				; CHECK-NEXT: strh w12, [sp, #6]
				; CHECK-NEXT: strh w8, [sp, #4]
				; CHECK-NEXT: strh w10, [sp, #2]
				; CHECK-NEXT: strh w9, [sp]
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [sp]
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: mov sp, x29
				; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
				; CHECK-NEXT: ret
				%tmp1 = load <16 x i16>, <16 x i16>* %a
				%tmp2 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <16 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8>
				store <16 x i16> %tmp2, <16 x i16>* %a
				ret void
				}

				; Don't use SVE for 128-bit shuffle with two inputs
				define void @test_revv8i16v8i16(<8 x i16>* %a, <8 x i16>* %b, <16 x i16>* %c) #1 {
				; CHECK-LABEL: test_revv8i16v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #48
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: and sp, x9, #0xffffffffffffffe0
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w30, -8
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: ldr q0, [x1]
				; CHECK-NEXT: orr x9, x8, #0x1e
				; CHECK-NEXT: orr x10, x8, #0x1c
				; CHECK-NEXT: ldr q1, [x0]
				; CHECK-NEXT: orr x12, x8, #0x10
				; CHECK-NEXT: orr x11, x8, #0x18
				; CHECK-NEXT: str h0, [sp, #22]
				; CHECK-NEXT: st1 { v0.h }[4], [x9]
				; CHECK-NEXT: orr x9, x8, #0xe
				; CHECK-NEXT: st1 { v0.h }[5], [x10]
				; CHECK-NEXT: orr x10, x8, #0xc
				; CHECK-NEXT: st1 { v0.h }[3], [x12]
				; CHECK-NEXT: mov w12, #26
				; CHECK-NEXT: st1 { v1.h }[4], [x9]
				; CHECK-NEXT: orr x9, x8, #0x8
				; CHECK-NEXT: st1 { v0.h }[7], [x11]
				; CHECK-NEXT: orr x11, x8, #0x2
				; CHECK-NEXT: st1 { v1.h }[5], [x10]
				; CHECK-NEXT: orr x10, x8, #0x4
				; CHECK-NEXT: st1 { v1.h }[7], [x9]
				; CHECK-NEXT: orr x9, x8, x12
				; CHECK-NEXT: st1 { v1.h }[2], [x11]
				; CHECK-NEXT: mov w11, #10
				; CHECK-NEXT: st1 { v1.h }[1], [x10]
				; CHECK-NEXT: mov w10, #18
				; CHECK-NEXT: st1 { v0.h }[6], [x9]
				; CHECK-NEXT: mov w9, #20
				; CHECK-NEXT: orr x9, x8, x9
				; CHECK-NEXT: orr x10, x8, x10
				; CHECK-NEXT: st1 { v1.h }[3], [x8]
				; CHECK-NEXT: orr x8, x8, x11
				; CHECK-NEXT: str h1, [sp, #6]
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: st1 { v0.h }[1], [x9]
				; CHECK-NEXT: st1 { v0.h }[2], [x10]
				; CHECK-NEXT: st1 { v1.h }[6], [x8]
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [sp]
				; CHECK-NEXT: st1h { z0.h }, p0, [x2]
				; CHECK-NEXT: mov sp, x29
				; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
				; CHECK-NEXT: ret
				%tmp1 = load <8 x i16>, <8 x i16>* %a
				%tmp2 = load <8 x i16>, <8 x i16>* %b
				%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <16 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>
				store <16 x i16> %tmp3, <16 x i16>* %c
				ret void
				}

				attributes #0 = { "target-features"="+sve" }
				attributes #1 = { "target-features"="+sve" vscale_range(2,2) }


[AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw

Closed, Public

Details

Diff Detail

Event Timeline