This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/
  - lib/Target/AArch64/
    - AArch64ISelLowering.cpp
    - AArch64InstrInfo.td
  - test/CodeGen/AArch64/
    - fixed-vector-deinterleave.ll
    - neon-bitwise-instructions.ll

Differential D144550

[AArch64] Remove 64bit->128bit vector insert lowering
Closed · Public

Authored by dmgreen on Feb 22 2023, 3:25 AM.

Details

Reviewers

SjoerdMeijer
samtebbs
bipmis
david-arm
t.p.northover

Commits

rG18af85302200: [AArch64] Remove 64bit->128bit vector insert lowering

Summary

The AArch64 backend, during lowering, converts a 64-bit vector insert into an insert on a 128-bit vector:

vector_insert %dreg, %v, %idx
=>
%qreg = insert_subvector undef, %dreg, 0
%ins = vector_insert %qreg, %v, %idx
EXTRACT_SUBREG %ins, dsub

This creates a bit of a mess in the DAG, and because EXTRACT_SUBREG is a machine node, the result is difficult to simplify. This patch removes that lowering, instead treating 64-bit vector inserts as legal and handling them with extra TableGen patterns.
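
To see why the widening step is redundant, here is a minimal C++ sketch that models a 64-bit vector as four i16 lanes and a 128-bit vector as eight. It compares the removed lowering (insert_subvector into an undefined 128-bit vector, insert, then EXTRACT_SUBREG dsub) with a direct 64-bit insert. DVec, QVec and the helper functions are hypothetical lane-level models, not LLVM APIs, and the undefined upper half is modelled as an arbitrary fill value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane-level models of a 64-bit (D) and a 128-bit (Q) vector of i16.
using DVec = std::array<std::uint16_t, 4>;
using QVec = std::array<std::uint16_t, 8>;

// insert_subvector undef, %dreg, 0: the low half holds %dreg. The upper half is
// undefined in the DAG; here it is filled with an arbitrary value to show that
// it never reaches the result.
QVec widen(const DVec &d, std::uint16_t junk) {
  QVec q;
  q.fill(junk);
  for (std::size_t i = 0; i < d.size(); ++i)
    q[i] = d[i];
  return q;
}

// EXTRACT_SUBREG %q, dsub: keep only the low 64 bits.
DVec narrow(const QVec &q) {
  DVec d;
  for (std::size_t i = 0; i < d.size(); ++i)
    d[i] = q[i];
  return d;
}

// Removed lowering: widen, insert into the 128-bit vector, then narrow.
// Assumes idx < 4 (the lowering already bails out on out-of-range lanes).
DVec insertViaWiden(const DVec &d, std::uint16_t v, std::size_t idx,
                    std::uint16_t junk) {
  QVec q = widen(d, junk);
  q[idx] = v;
  return narrow(q);
}

// New approach: treat the 64-bit insert as legal and write the lane directly.
DVec insertDirect(DVec d, std::uint16_t v, std::size_t idx) {
  d[idx] = v;
  return d;
}
```

For any in-range index both functions return the same vector, which is why the 64-bit insert can be left in the DAG and matched directly by TableGen patterns.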

The end result is a simpler DAG that is easier to write TableGen patterns for. Unfortunately, one of the tests here does get larger: the simplification now allows a sign_extend_inreg feeding a vector insert to be optimized away, because its upper bits are not demanded. In that case the backend now generates both an SMOV and a UMOV, requiring more instructions in total. This is unfortunate, but seems to be an unrelated issue.
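
The demanded-bits argument can be sketched in C++, assuming an insert into a v4i16 lane whose scalar operand has been promoted to i32: only the low 16 bits of the operand reach the vector, so a sign_extend_inreg from i16 cannot change the result. The helper names below are hypothetical, and the sketch does not model how SMOV or UMOV is chosen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of ISD::SIGN_EXTEND_INREG on an i32: replicate bit
// (Bits - 1) into the upper 32 - Bits bits. Assumes 1 <= Bits <= 32.
std::int32_t signExtendInReg(std::int32_t V, unsigned Bits) {
  unsigned Shift = 32 - Bits;
  std::uint32_t Shl = static_cast<std::uint32_t>(V) << Shift;
  // The unsigned-to-signed conversion and the arithmetic right shift are
  // well defined since C++20 and behave this way on mainstream compilers.
  return static_cast<std::int32_t>(Shl) >> Shift;
}

// A v4i16 insert takes an i32 operand (i16 is promoted) but stores only its
// low 16 bits, so bits 16..31 of the operand are not demanded.
using V4I16 = std::array<std::uint16_t, 4>;

V4I16 insertLane(V4I16 Vec, std::int32_t Val, std::size_t Idx) {
  Vec[Idx] = static_cast<std::uint16_t>(Val); // truncate to the lane width
  return Vec;
}
```

Because the sign-extended and plain operands agree in their low 16 bits, the inserted lane is identical either way, which is what lets the combiner drop the sign_extend_inreg.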

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. · Feb 22 2023, 3:25 AM

Herald added a project: Restricted Project. · Feb 22 2023, 3:25 AM

Herald added subscribers: hiraditya, kristof.beyls.

dmgreen requested review of this revision. · Feb 22 2023, 3:25 AM

Herald added a project: Restricted Project. · Feb 22 2023, 3:25 AM

dmgreen added a child revision: D144086: [AArch64] Load into zero vector patterns. · Feb 22 2023, 3:25 AM

dmgreen added a parent revision: D144018: [AArch64] More consistently use buildvector for zero and all-ones constants.

Harbormaster completed remote builds in B215207: Diff 499397. · Feb 22 2023, 4:15 AM

dmgreen mentioned this in D144850: [AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts. · Feb 27 2023, 12:52 AM

dmgreen added a child revision: D144850: [AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts.

SjoerdMeijer added a comment.

The idea makes sense I think, but just to put things into context, do you already have a case or patch where we can see the benefit of this?

dmgreen added a comment.

In D144550#4154607, @SjoerdMeijer wrote:

The idea makes sense I think, but just to put things into context, do you already have a case or patch where we can see the benefit of this?

The insert-load-into-zero from D144086 is helped by this. I wrote them in a slightly odd order due to trying to work through the regressions in this change. Otherwise some of the patterns in that patch would need to look for insert(extractsubreg(zerovec), load, 0), which isn't impossible but it's cleaner without all the extracts.

Speaking of which I've put up D144850 to help with the regressions in combine_srem_sdiv.

Ok, cheers, and this looks very reasonable to me.

This revision is now accepted and ready to land. · Feb 27 2023, 5:14 AM

dmgreen mentioned this in rG06daa515b270: [AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to… · Feb 27 2023, 11:20 AM

This revision was landed with ongoing or failed builds. · Mar 1 2023, 1:40 AM

Closed by commit rG18af85302200: [AArch64] Remove 64bit->128bit vector insert lowering (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG18af85302200: [AArch64] Remove 64bit->128bit vector insert lowering.

Revision Contents

Path                                                      Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp           36 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td               57 lines
llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll     8 lines
llvm/test/CodeGen/AArch64/neon-bitwise-instructions.ll     2 lines

Diff 501419

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 10,321 unchanged lines not shown ...]
 /// getExtFactor - Determine the adjustment factor for the position when
 /// generating an "extract from vector registers" instruction.
 static unsigned getExtFactor(SDValue &V) {
   EVT EltType = V.getValueType().getVectorElementType();
   return EltType.getSizeInBits() / 8;
 }

-/// NarrowVector - Given a value in the V128 register class, produce the
-/// equivalent value in the V64 register class.
-static SDValue NarrowVector(SDValue V128Reg, SelectionDAG &DAG) {
-  EVT VT = V128Reg.getValueType();
-  unsigned WideSize = VT.getVectorNumElements();
-  MVT EltTy = VT.getVectorElementType().getSimpleVT();
-  MVT NarrowTy = MVT::getVectorVT(EltTy, WideSize / 2);
-  SDLoc DL(V128Reg);
-
-  return DAG.getTargetExtractSubreg(AArch64::dsub, DL, NarrowTy, V128Reg);
-}
-
 // Gather data to see if the operation can be modelled as a
 // shuffle in combination with VEXTs.
 SDValue AArch64TargetLowering::ReconstructShuffle(SDValue Op,
                                                   SelectionDAG &DAG) const {
   assert(Op.getOpcode() == ISD::BUILD_VECTOR && "Unknown opcode!");
   LLVM_DEBUG(dbgs() << "AArch64TargetLowering::ReconstructShuffle\n");
   SDLoc dl(Op);
   EVT VT = Op.getValueType();
[... 2,239 unchanged lines not shown ...]
 SDValue AArch64TargetLowering::LowerINSERT_VECTOR_ELT(SDValue Op,
                                                       SelectionDAG &DAG) const {
   assert(Op.getOpcode() == ISD::INSERT_VECTOR_ELT && "Unknown opcode!");

   if (useSVEForFixedLengthVectorVT(Op.getValueType(),
                                    Subtarget->forceStreamingCompatibleSVE()))
     return LowerFixedLengthInsertVectorElt(Op, DAG);

-  // Check for non-constant or out of range lane.
   EVT VT = Op.getOperand(0).getValueType();

   if (VT.getScalarType() == MVT::i1) {
     EVT VectorVT = getPromotedVTForPredicate(VT);
     SDLoc DL(Op);
     SDValue ExtendedVector =
         DAG.getAnyExtOrTrunc(Op.getOperand(0), DL, VectorVT);
     SDValue ExtendedValue =
         DAG.getAnyExtOrTrunc(Op.getOperand(1), DL,
                              VectorVT.getScalarType().getSizeInBits() < 32
                                  ? MVT::i32
                                  : VectorVT.getScalarType());
     ExtendedVector =
         DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VectorVT, ExtendedVector,
                     ExtendedValue, Op.getOperand(2));
     return DAG.getAnyExtOrTrunc(ExtendedVector, DL, VT);
   }

+  // Check for non-constant or out of range lane.
   ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(2));
   if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())
     return SDValue();

-  // Insertion/extraction are legal for V128 types.
-  if (VT == MVT::v16i8 || VT == MVT::v8i16 || VT == MVT::v4i32 ||
-      VT == MVT::v2i64 || VT == MVT::v4f32 || VT == MVT::v2f64 ||
-      VT == MVT::v8f16 || VT == MVT::v8bf16)
-    return Op;
-
-  if (VT != MVT::v8i8 && VT != MVT::v4i16 && VT != MVT::v2i32 &&
-      VT != MVT::v1i64 && VT != MVT::v2f32 && VT != MVT::v4f16 &&
-      VT != MVT::v4bf16)
-    return SDValue();
-
-  // For V64 types, we perform insertion by expanding the value
-  // to a V128 type and perform the insertion on that.
-  SDLoc DL(Op);
-  SDValue WideVec = WidenVector(Op.getOperand(0), DAG);
-  EVT WideTy = WideVec.getValueType();
-
-  SDValue Node = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, WideTy, WideVec,
-                             Op.getOperand(1), Op.getOperand(2));
-  // Re-narrow the resultant vector.
-  return NarrowVector(Node, DAG);
+  return Op;
 }

 SDValue
 AArch64TargetLowering::LowerEXTRACT_VECTOR_ELT(SDValue Op,
                                                SelectionDAG &DAG) const {
   assert(Op.getOpcode() == ISD::EXTRACT_VECTOR_ELT && "Unknown opcode!");
   EVT VT = Op.getOperand(0).getValueType();

[... 11,879 unchanged lines not shown ...]

llvm/lib/Target/AArch64/AArch64InstrInfo.td

[... 5,926 unchanged lines not shown; context: def : Pat<(v4f16 (vector_insert (v4f16 V64:$Rn), ...]
           (EXTRACT_SUBREG
             (INSvi16lane
               (v8f16 (INSERT_SUBREG (v8f16 (IMPLICIT_DEF)), V64:$Rn, dsub)),
               VectorIndexS:$imm,
               (v8f16 (INSERT_SUBREG (v8f16 (IMPLICIT_DEF)), FPR16:$Rm, hsub)),
               (i64 0)),
             dsub)>;

-def : Pat<(vector_insert (v8f16 v8f16:$Rn), (f16 fpimm0),
-            (i64 VectorIndexH:$imm)),
+def : Pat<(vector_insert (v8f16 V128:$Rn), (f16 fpimm0), (i64 VectorIndexH:$imm)),
           (INSvi16gpr V128:$Rn, VectorIndexH:$imm, WZR)>;
-def : Pat<(vector_insert v4f32:$Rn, (f32 fpimm0),
-            (i64 VectorIndexS:$imm)),
+def : Pat<(vector_insert (v4f16 V64:$Rn), (f16 fpimm0), (i64 VectorIndexH:$imm)),
+          (EXTRACT_SUBREG (INSvi16gpr (v8f16 (INSERT_SUBREG (v8f16 (IMPLICIT_DEF)), V64:$Rn, dsub)), VectorIndexH:$imm, WZR), dsub)>;
+def : Pat<(vector_insert (v4f32 V128:$Rn), (f32 fpimm0), (i64 VectorIndexS:$imm)),
           (INSvi32gpr V128:$Rn, VectorIndexS:$imm, WZR)>;
-def : Pat<(vector_insert v2f64:$Rn, (f64 fpimm0),
-            (i64 VectorIndexD:$imm)),
+def : Pat<(vector_insert (v2f32 V64:$Rn), (f32 fpimm0), (i64 VectorIndexS:$imm)),
+          (EXTRACT_SUBREG (INSvi32gpr (v4f32 (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), V64:$Rn, dsub)), VectorIndexS:$imm, WZR), dsub)>;
+def : Pat<(vector_insert v2f64:$Rn, (f64 fpimm0), (i64 VectorIndexD:$imm)),
           (INSvi64gpr V128:$Rn, VectorIndexS:$imm, XZR)>;

 def : Pat<(v8f16 (vector_insert (v8f16 V128:$Rn),
             (f16 FPR16:$Rm), (i64 VectorIndexH:$imm))),
           (INSvi16lane
             V128:$Rn, VectorIndexH:$imm,
             (v8f16 (INSERT_SUBREG (v8f16 (IMPLICIT_DEF)), FPR16:$Rm, hsub)),
             (i64 0))>;
[... 32 unchanged lines not shown; context: def : Pat<(v4f32 (vector_insert (v4f32 V128:$Rn), ...]
             (i64 0))>;
 def : Pat<(v2f64 (vector_insert (v2f64 V128:$Rn),
             (f64 FPR64:$Rm), (i64 VectorIndexD:$imm))),
           (INSvi64lane
             V128:$Rn, VectorIndexD:$imm,
             (v2f64 (INSERT_SUBREG (v2f64 (IMPLICIT_DEF)), FPR64:$Rm, dsub)),
             (i64 0))>;

+def : Pat<(v2i32 (vector_insert (v2i32 V64:$Rn), (i32 GPR32:$Rm), (i64 VectorIndexS:$imm))),
+          (EXTRACT_SUBREG
+            (INSvi32gpr (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), V64:$Rn, dsub)),
+                        VectorIndexS:$imm, GPR32:$Rm),
+            dsub)>;
+def : Pat<(v4i16 (vector_insert (v4i16 V64:$Rn), (i32 GPR32:$Rm), (i64 VectorIndexH:$imm))),
+          (EXTRACT_SUBREG
+            (INSvi16gpr (v8i16 (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), V64:$Rn, dsub)),
+                        VectorIndexH:$imm, GPR32:$Rm),
+            dsub)>;
+def : Pat<(v8i8 (vector_insert (v8i8 V64:$Rn), (i32 GPR32:$Rm), (i64 VectorIndexB:$imm))),
+          (EXTRACT_SUBREG
+            (INSvi8gpr (v16i8 (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)), V64:$Rn, dsub)),
+                       VectorIndexB:$imm, GPR32:$Rm),
+            dsub)>;
+
 // Copy an element at a constant index in one vector into a constant indexed
 // element of another.
 // FIXME refactor to a shared class/dev parameterized on vector type, vector
 // index type and INS extension
 def : Pat<(v16i8 (int_aarch64_neon_vcopy_lane
                    (v16i8 V128:$Vd), VectorIndexB:$idx, (v16i8 V128:$Vs),
                    VectorIndexB:$idx2)),
           (v16i8 (INSvi8lane
[... 47 unchanged lines not shown; context: def : Pat<(VT64 (vector_insert V64:$src, ...]
                     dsub)>;
 }

 defm : Neon_INS_elt_pattern<v8f16, v4f16, f16, INSvi16lane>;
 defm : Neon_INS_elt_pattern<v8bf16, v4bf16, bf16, INSvi16lane>;
 defm : Neon_INS_elt_pattern<v4f32, v2f32, f32, INSvi32lane>;
 defm : Neon_INS_elt_pattern<v2f64, v1f64, f64, INSvi64lane>;

+defm : Neon_INS_elt_pattern<v16i8, v8i8, i32, INSvi8lane>;
+defm : Neon_INS_elt_pattern<v8i16, v4i16, i32, INSvi16lane>;
+defm : Neon_INS_elt_pattern<v4i32, v2i32, i32, INSvi32lane>;
+defm : Neon_INS_elt_pattern<v2i64, v1i64, i64, INSvi64lane>;
+
 // Insert from bitcast
 // vector_insert(bitcast(f32 src), n, lane) -> INSvi32lane(src, lane, INSERT_SUBREG(-, n), 0)
 def : Pat<(v4i32 (vector_insert v4i32:$src, (i32 (bitconvert (f32 FPR32:$Sn))), imm:$Immd)),
           (INSvi32lane V128:$src, imm:$Immd, (INSERT_SUBREG (IMPLICIT_DEF), FPR32:$Sn, ssub), 0)>;
+def : Pat<(v2i32 (vector_insert v2i32:$src, (i32 (bitconvert (f32 FPR32:$Sn))), imm:$Immd)),
+          (EXTRACT_SUBREG
+            (INSvi32lane (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), V64:$src, dsub)),
+                         imm:$Immd, (INSERT_SUBREG (IMPLICIT_DEF), FPR32:$Sn, ssub), 0),
+            dsub)>;
 def : Pat<(v2i64 (vector_insert v2i64:$src, (i64 (bitconvert (f64 FPR64:$Sn))), imm:$Immd)),
           (INSvi64lane V128:$src, imm:$Immd, (INSERT_SUBREG (IMPLICIT_DEF), FPR64:$Sn, dsub), 0)>;

 // bitcast of an extract
 // f32 bitcast(vector_extract(v4i32 src, lane)) -> EXTRACT_SUBREG(INSvi32lane(-, 0, src, lane))
 def : Pat<(f32 (bitconvert (i32 (vector_extract v4i32:$src, imm:$Immd)))),
           (EXTRACT_SUBREG (INSvi32lane (IMPLICIT_DEF), 0, V128:$src, imm:$Immd), ssub)>;
 def : Pat<(f32 (bitconvert (i32 (vector_extract v4i32:$src, 0)))),
[... 1,212 unchanged lines not shown ...]
 // Generate LD1 for extload if memory type does not match the
 // destination type, for example:
 //
 //   (v4i32 (insert_vector_elt (load anyext from i8) idx))
 //
 // In this case, the index must be adjusted to match LD1 type.
 //
 class Ld1Lane128IdxOpPat<SDPatternOperator scalar_load, Operand
                          VecIndex, ValueType VTy, ValueType STy,
                          Instruction LD1, SDNodeXForm IdxOp>
   : Pat<(vector_insert (VTy VecListOne128:$Rd),
                        (STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),
         (LD1 VecListOne128:$Rd, (IdxOp VecIndex:$idx), GPR64sp:$Rn)>;

+class Ld1Lane64IdxOpPat<SDPatternOperator scalar_load, Operand VecIndex,
+                        ValueType VTy, ValueType STy, Instruction LD1,
+                        SDNodeXForm IdxOp>
+  : Pat<(vector_insert (VTy VecListOne64:$Rd),
+                       (STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),
+        (EXTRACT_SUBREG
+            (LD1 (SUBREG_TO_REG (i32 0), VecListOne64:$Rd, dsub),
+                 (IdxOp VecIndex:$idx), GPR64sp:$Rn),
+            dsub)>;
+
 def VectorIndexStoH : SDNodeXForm<imm, [{
   return CurDAG->getTargetConstant(N->getZExtValue() * 2, SDLoc(N), MVT::i64);
 }]>;
 def VectorIndexStoB : SDNodeXForm<imm, [{
   return CurDAG->getTargetConstant(N->getZExtValue() * 4, SDLoc(N), MVT::i64);
 }]>;
 def VectorIndexHtoB : SDNodeXForm<imm, [{
   return CurDAG->getTargetConstant(N->getZExtValue() * 2, SDLoc(N), MVT::i64);
 }]>;

 def : Ld1Lane128IdxOpPat<extloadi16, VectorIndexS, v4i32, i32, LD1i16, VectorIndexStoH>;
 def : Ld1Lane128IdxOpPat<extloadi8, VectorIndexS, v4i32, i32, LD1i8, VectorIndexStoB>;
 def : Ld1Lane128IdxOpPat<extloadi8, VectorIndexH, v8i16, i32, LD1i8, VectorIndexHtoB>;

+def : Ld1Lane64IdxOpPat<extloadi16, VectorIndexS, v2i32, i32, LD1i16, VectorIndexStoH>;
+def : Ld1Lane64IdxOpPat<extloadi8, VectorIndexS, v2i32, i32, LD1i8, VectorIndexStoB>;
+def : Ld1Lane64IdxOpPat<extloadi8, VectorIndexH, v4i16, i32, LD1i8, VectorIndexHtoB>;
+
 // Same as above, but the first element is populated using
 // scalar_to_vector + insert_subvector instead of insert_vector_elt.
 let Predicates = [NotInStreamingSVEMode] in {
   class Ld1Lane128FirstElm<ValueType ResultTy, ValueType VecTy,
                            SDPatternOperator ExtLoad, Instruction LD1>
     : Pat<(ResultTy (scalar_to_vector (i32 (ExtLoad GPR64sp:$Rn)))),
           (ResultTy (EXTRACT_SUBREG
             (LD1 (VecTy (IMPLICIT_DEF)), 0, GPR64sp:$Rn), dsub))>;
[... 1,498 unchanged lines not shown ...]

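The comment on Ld1Lane128IdxOpPat notes that the lane index must be adjusted to match the LD1 element type, and the new Ld1Lane64IdxOpPat reuses the same VectorIndexStoH, VectorIndexStoB and VectorIndexHtoB transforms. The C++ sketch below illustrates why VectorIndexStoH multiplies by 2, assuming lane 0 occupies the least significant bytes of the register (as for AArch64 NEON lanes): 16-bit lane 2*i is the low half of 32-bit lane i, so an any-extending i16 load written there supplies exactly the bits an i32 lane insert needs. DReg and the helper functions are hypothetical models, not LLVM code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical byte-level model of a 64-bit NEON register. Lane 0 occupies the
// least significant bytes, so 32-bit lane i spans bytes [4i, 4i + 3] and 16-bit
// lane j spans bytes [2j, 2j + 1].
using DReg = std::array<std::uint8_t, 8>;

// Models an LD1 into h-lane HLane: write 16 bits, leave the rest unchanged.
void ld1H(DReg &R, std::size_t HLane, std::uint16_t Val) {
  R[2 * HLane] = static_cast<std::uint8_t>(Val & 0xFF);
  R[2 * HLane + 1] = static_cast<std::uint8_t>(Val >> 8);
}

// Read 32-bit lane SLane as a little-endian word.
std::uint32_t readS(const DReg &R, std::size_t SLane) {
  std::uint32_t W = 0;
  for (std::size_t B = 0; B < 4; ++B)
    W |= static_cast<std::uint32_t>(R[4 * SLane + B]) << (8 * B);
  return W;
}

// Mirrors the VectorIndexStoH transform: an s-lane index maps to the h-lane
// holding its low half.
std::size_t indexStoH(std::size_t SLane) { return SLane * 2; }
```

The upper half of the i32 lane is left untouched by the load, which is acceptable because the pattern only matches any-extending loads (extloadi16 and extloadi8).
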
llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s

 define {<2 x half>, <2 x half>} @vector_deinterleave_v2f16_v4f16(<4 x half> %vec) {
 ; CHECK-LABEL: vector_deinterleave_v2f16_v4f16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-NEXT: dup v1.2s, v0.s[1]
-; CHECK-NEXT: mov v2.16b, v0.16b
-; CHECK-NEXT: mov v2.h[1], v1.h[0]
+; CHECK-NEXT: dup v2.2s, v0.s[1]
+; CHECK-NEXT: mov v1.16b, v2.16b
 ; CHECK-NEXT: mov v1.h[0], v0.h[1]
+; CHECK-NEXT: mov v0.h[1], v2.h[0]
 ; CHECK-NEXT: // kill: def $d1 killed $d1 killed $q1
-; CHECK-NEXT: fmov d0, d2
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT: ret
   %retval = call {<2 x half>, <2 x half>} @llvm.experimental.vector.deinterleave2.v4f16(<4 x half> %vec)
   ret {<2 x half>, <2 x half>} %retval
 }

 define {<4 x half>, <4 x half>} @vector_deinterleave_v4f16_v8f16(<8 x half> %vec) {
 ; CHECK-LABEL: vector_deinterleave_v4f16_v8f16:
 ; CHECK: // %bb.0:
[... 115 unchanged lines not shown ...]

llvm/test/CodeGen/AArch64/neon-bitwise-instructions.ll

[... 1,015 unchanged lines not shown; context: ; CHECK-NEXT: ret ...]
   ret <8 x i16> %c
 }

 define <4 x i16> @vselect_equivalent_shuffle_v4i16(<4 x i16> %a, <4 x i16> %b) {
 ; CHECK-LABEL: vselect_equivalent_shuffle_v4i16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
 ; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
-; CHECK-NEXT: mov v0.h[2], v1.h[1]
 ; CHECK-NEXT: mov v0.h[1], v1.h[0]
+; CHECK-NEXT: mov v0.h[2], v1.h[1]
 ; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT: ret
   %c = shufflevector <4 x i16> %a, <4 x i16> %b, <4 x i32> <i32 0, i32 4, i32 5, i32 3>
   ret <4 x i16> %c
 }

 define <4 x i32> @vselect_equivalent_shuffle_v4i32(<4 x i32> %a, <4 x i32> %b) {
 ; CHECK-LABEL: vselect_equivalent_shuffle_v4i32:
[... 773 unchanged lines not shown ...]