This is an archive of the discontinued LLVM Phabricator instance.

[TTI, AArch64] Add transpose shuffle kind
Closed · Public

Authored by mssimpso on Apr 23 2018, 1:05 PM.

Details

Reviewers

rengolin
samparker
evandro
hfinkel
javed.absar

Commits

rGb4096ebe2600: [TTI, AArch64] Add transpose shuffle kind
rL330941: [TTI, AArch64] Add transpose shuffle kind

Summary

This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost.
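To make the mask shape described above concrete, here is a minimal sketch that builds the two transpose masks for a given element count. The helper name `makeTransposeMask` and its parameters are hypothetical and are not part of the patch; the sketch only assumes the usual shufflevector convention that indices 0..n-1 select from the first source and n..2n-1 from the second.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not in the patch): build the shuffle mask that reads
// the even-numbered lanes (Odd == false, TRN1-like) or the odd-numbered lanes
// (Odd == true, TRN2-like) of two NumElts-wide source vectors.
std::vector<int> makeTransposeMask(int NumElts, bool Odd) {
  assert(NumElts >= 2 && NumElts % 2 == 0 && "expected an even element count");
  std::vector<int> Mask;
  Mask.reserve(NumElts);
  for (int I = 0; I < NumElts; I += 2) {
    int Lane = I + (Odd ? 1 : 0);
    Mask.push_back(Lane);           // lane taken from the first source
    Mask.push_back(Lane + NumElts); // matching lane from the second source
  }
  return Mask;
}
```

For four elements this yields <0, 4, 2, 6> and <1, 5, 3, 7>, the same masks used in the 2x4 example in the patch's comments.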

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso created this revision. Apr 23 2018, 1:05 PM

Herald added a reviewer: javed.absar. Apr 23 2018, 1:05 PM

Herald added a subscriber: kristof.beyls.

javed.absar added inline comments. Apr 23 2018, 2:27 PM

include/llvm/Analysis/TargetTransformInfo.h
642 ↗	(On Diff #143623)	Not sure why indentation change is needed
lib/Analysis/TargetTransformInfo.cpp
707 ↗	(On Diff #143623)	Would it be more accurate to say "...such that one vector contains _interleaved_ elements from all the even-numbered rows..."?
737 ↗	(On Diff #143623)	Tests 4 and 5 could perhaps be combined into one loop?

mssimpso added inline comments. Apr 24 2018, 6:42 AM

include/llvm/Analysis/TargetTransformInfo.h
642 ↗	(On Diff #143623)	Ah, clang-format decided to do that. I will undo the formatting for now.
lib/Analysis/TargetTransformInfo.cpp
707 ↗	(On Diff #143623)	Sounds good to me.
737 ↗	(On Diff #143623)	Yes, that would probably be a little more straightforward. Thanks!

Addressed Javed's comments. Thanks!

LGTM. Probably wait a day before committing in case Renato/others have a comment/suggestion.

This revision is now accepted and ready to land. Apr 24 2018, 12:57 PM

In D45982#1077263, @javed.absar wrote:

LGTM. Probably wait a day before committing in case Renato/others have a comment/suggestion.

Thanks for the review, Javed! Will do.

Closed by commit rL330941: [TTI, AArch64] Add transpose shuffle kind (authored by mssimpso). Apr 26 2018, 6:51 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h	1 line
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h	10 lines
llvm/trunk/lib/Analysis/TargetTransformInfo.cpp	84 lines
llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.h	2 lines
llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	27 lines
llvm/trunk/test/Analysis/CostModel/AArch64/shuffle-transpose.ll	40 lines

Diff 144108

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

public:
/// \name Vector Target Information		/// \name Vector Target Information
/// @{		/// @{

/// \brief The various kinds of shuffle patterns for vector queries.		/// \brief The various kinds of shuffle patterns for vector queries.
enum ShuffleKind {		enum ShuffleKind {
SK_Broadcast, ///< Broadcast element 0 to all other elements.		SK_Broadcast, ///< Broadcast element 0 to all other elements.
SK_Reverse, ///< Reverse the order of the vector.		SK_Reverse, ///< Reverse the order of the vector.
SK_Alternate, ///< Choose alternate elements from vector.		SK_Alternate, ///< Choose alternate elements from vector.
		SK_Transpose, ///< Transpose two vectors.
SK_InsertSubvector, ///< InsertSubvector. Index indicates start offset.		SK_InsertSubvector, ///< InsertSubvector. Index indicates start offset.
SK_ExtractSubvector,///< ExtractSubvector Index indicates start offset.		SK_ExtractSubvector,///< ExtractSubvector Index indicates start offset.
SK_PermuteTwoSrc, ///< Merge elements from two source vectors into one		SK_PermuteTwoSrc, ///< Merge elements from two source vectors into one
///< with any shuffle mask.		///< with any shuffle mask.
SK_PermuteSingleSrc ///< Shuffle elements of single source vector with any		SK_PermuteSingleSrc ///< Shuffle elements of single source vector with any
///< shuffle mask.		///< shuffle mask.
};		};


llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h

unsigned getArithmeticInstrCost(
}		}

// We don't know anything about this scalar instruction.		// We don't know anything about this scalar instruction.
return OpCost;		return OpCost;
}		}

unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
if (Kind == TTI::SK_Alternate || Kind == TTI::SK_PermuteTwoSrc ||		switch (Kind) {
Kind == TTI::SK_PermuteSingleSrc) {		case TTI::SK_Alternate:
		case TTI::SK_Transpose:
		case TTI::SK_PermuteSingleSrc:
		case TTI::SK_PermuteTwoSrc:
return getPermuteShuffleOverhead(Tp);		return getPermuteShuffleOverhead(Tp);
}		default:
return 1;		return 1;
}		}
		}

unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
const Instruction *I = nullptr) {		const Instruction *I = nullptr) {
const TargetLoweringBase *TLI = getTLI();		const TargetLoweringBase *TLI = getTLI();
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");
std::pair<unsigned, MVT> SrcLT = TLI->getTypeLegalizationCost(DL, Src);		std::pair<unsigned, MVT> SrcLT = TLI->getTypeLegalizationCost(DL, Src);
std::pair<unsigned, MVT> DstLT = TLI->getTypeLegalizationCost(DL, Dst);		std::pair<unsigned, MVT> DstLT = TLI->getTypeLegalizationCost(DL, Dst);

llvm/trunk/lib/Analysis/TargetTransformInfo.cpp

for (unsigned i = 0; i < MaskSize && isAlternate; ++i) {
if (Mask[i] < 0)		if (Mask[i] < 0)
continue;		continue;
isAlternate = Mask[i] == (int)((i & 1) ? i : MaskSize + i);		isAlternate = Mask[i] == (int)((i & 1) ? i : MaskSize + i);
}		}

return isAlternate;		return isAlternate;
}		}

		static bool isTransposeVectorMask(ArrayRef<int> Mask) {
		// Transpose vector masks transpose a 2xn matrix. They read corresponding
		// even- or odd-numbered vector elements from two n-dimensional source
		// vectors and write each result into consecutive elements of an
		// n-dimensional destination vector. Two shuffles are necessary to complete
		// the transpose, one for the even elements and another for the odd elements.
		// This description closely follows how the TRN1 and TRN2 AArch64
		// instructions operate.
		//
		// For example, a simple 2x2 matrix can be transposed with:
		//
		// ; Original matrix
		// m0 = <a, b>
		// m1 = <c, d>
		//
		// ; Transposed matrix
		// t0 = <a, c> = shufflevector m0, m1, <0, 2>
		// t1 = <b, d> = shufflevector m0, m1, <1, 3>
		//
		// For matrices having greater than n columns, the resulting nx2 transposed
		// matrix is stored in two result vectors such that one vector contains
		// interleaved elements from all the even-numbered rows and the other vector
		// contains interleaved elements from all the odd-numbered rows. For example,
		// a 2x4 matrix can be transposed with:
		//
		// ; Original matrix
		// m0 = <a, b, c, d>
		// m1 = <e, f, g, h>
		//
		// ; Transposed matrix
		// t0 = <a, e, c, g> = shufflevector m0, m1 <0, 4, 2, 6>
		// t1 = <b, f, d, h> = shufflevector m0, m1 <1, 5, 3, 7>
		//
		// The above explanation places limitations on what valid transpose masks can
		// look like. These limitations are defined by the checks below.
		//
		// 1. The number of elements in the mask must be a power of two.
		if (!isPowerOf2_32(Mask.size()))
		return false;

		// 2. The first element of the mask must be either a zero (for the
		// even-numbered vector elements) or a one (for the odd-numbered vector
		// elements).
		if (Mask[0] != 0 && Mask[0] != 1)
		return false;

		// 3. The difference between the first two elements must be equal to the
		// number of elements in the mask.
		if (Mask[1] - Mask[0] != (int)Mask.size())
		return false;

		// 4. The difference between consecutive even-numbered and odd-numbered
		// elements must be equal to two.
		for (int I = 2; I < (int)Mask.size(); ++I)
		if (Mask[I] - Mask[I - 2] != 2)
		return false;

		return true;
		}

static TargetTransformInfo::OperandValueKind getOperandInfo(Value *V) {		static TargetTransformInfo::OperandValueKind getOperandInfo(Value *V) {
TargetTransformInfo::OperandValueKind OpInfo =		TargetTransformInfo::OperandValueKind OpInfo =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;

// Check for a splat of a constant or for a non uniform vector of constants.		// Check for a splat of a constant or for a non uniform vector of constants.
if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V)) {		if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V)) {
OpInfo = TargetTransformInfo::OK_NonUniformConstantValue;		OpInfo = TargetTransformInfo::OK_NonUniformConstantValue;
if (cast<Constant>(V)->getSplatValue() != nullptr)		if (cast<Constant>(V)->getSplatValue() != nullptr)
int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
const ShuffleVectorInst *Shuffle = cast<ShuffleVectorInst>(I);		const ShuffleVectorInst *Shuffle = cast<ShuffleVectorInst>(I);
Type *VecTypOp0 = Shuffle->getOperand(0)->getType();		Type *VecTypOp0 = Shuffle->getOperand(0)->getType();
unsigned NumVecElems = VecTypOp0->getVectorNumElements();		unsigned NumVecElems = VecTypOp0->getVectorNumElements();
SmallVector<int, 16> Mask = Shuffle->getShuffleMask();		SmallVector<int, 16> Mask = Shuffle->getShuffleMask();

if (NumVecElems == Mask.size()) {		if (NumVecElems == Mask.size()) {
if (isReverseVectorMask(Mask))		if (isReverseVectorMask(Mask))
return getShuffleCost(TargetTransformInfo::SK_Reverse, VecTypOp0,		return TTIImpl->getShuffleCost(TargetTransformInfo::SK_Reverse,
0, nullptr);		VecTypOp0, 0, nullptr);
if (isAlternateVectorMask(Mask))		if (isAlternateVectorMask(Mask))
return getShuffleCost(TargetTransformInfo::SK_Alternate,		return TTIImpl->getShuffleCost(TargetTransformInfo::SK_Alternate,
		VecTypOp0, 0, nullptr);

		if (isTransposeVectorMask(Mask))
		return TTIImpl->getShuffleCost(TargetTransformInfo::SK_Transpose,
VecTypOp0, 0, nullptr);		VecTypOp0, 0, nullptr);

if (isZeroEltBroadcastVectorMask(Mask))		if (isZeroEltBroadcastVectorMask(Mask))
return getShuffleCost(TargetTransformInfo::SK_Broadcast,		return TTIImpl->getShuffleCost(TargetTransformInfo::SK_Broadcast,
VecTypOp0, 0, nullptr);		VecTypOp0, 0, nullptr);

if (isSingleSourceVectorMask(Mask))		if (isSingleSourceVectorMask(Mask))
return getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		return TTIImpl->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
VecTypOp0, 0, nullptr);		VecTypOp0, 0, nullptr);

return getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc,		return TTIImpl->getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc,
VecTypOp0, 0, nullptr);		VecTypOp0, 0, nullptr);
}		}

return -1;		return -1;
}		}
case Instruction::Call:		case Instruction::Call:
if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
SmallVector<Value *, 4> Args(II->arg_operands());		SmallVector<Value *, 4> Args(II->arg_operands());

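The four checks in isTransposeVectorMask can be tried outside of LLVM with a small standalone sketch. The function name `isTransposeMaskSketch` is hypothetical, it takes a `std::vector<int>` instead of `ArrayRef<int>`, and the explicit rejection of masks with fewer than two elements is an assumption added here so that the access to `Mask[1]` is always in bounds; otherwise it mirrors the checks as written in the diff.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standalone version of the mask checks shown in the diff.
bool isTransposeMaskSketch(const std::vector<int> &Mask) {
  const std::size_t N = Mask.size();

  // Check 1: the mask size must be a power of two. Sizes below two are also
  // rejected here (an assumption of this sketch, not of the patch).
  if (N < 2 || (N & (N - 1)) != 0)
    return false;

  // Check 2: the mask starts at lane 0 (even lanes) or lane 1 (odd lanes).
  if (Mask[0] != 0 && Mask[0] != 1)
    return false;

  // Check 3: the second index names the same lane in the second source.
  if (Mask[1] - Mask[0] != static_cast<int>(N))
    return false;

  // Check 4: every index is two greater than the index two positions earlier.
  for (std::size_t I = 2; I < N; ++I)
    if (Mask[I] - Mask[I - 2] != 2)
      return false;

  return true;
}
```

Note that for two-element vectors the accepted masks <0, 2> and <1, 3> are also interleave masks, which is consistent with the zip1/zip2 instructions expected by the two-element cases in the test file.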

llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.h

bool shouldExpandReduction(const IntrinsicInst *II) const {
return false;		return false;
}		}

bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const;		TTI::ReductionFlags Flags) const;

int getArithmeticReductionCost(unsigned Opcode, Type *Ty,		int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
bool IsPairwiseForm);		bool IsPairwiseForm);

		int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);
/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AARCH64_AARCH64TARGETTRANSFORMINFO_H		#endif // LLVM_LIB_TARGET_AARCH64_AARCH64TARGETTRANSFORMINFO_H

llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

static const CostTblEntry CostTblNoPairwise[]{
{ISD::ADD, MVT::v4i32, 1},		{ISD::ADD, MVT::v4i32, 1},
};		};

if (const auto *Entry = CostTableLookup(CostTblNoPairwise, ISD, MTy))		if (const auto *Entry = CostTableLookup(CostTblNoPairwise, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm);		return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm);
}		}

		int AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
		Type *SubTp) {

		// Transpose shuffle kinds can be performed with 'trn1/trn2' and 'zip1/zip2'
		// instructions.
		if (Kind == TTI::SK_Transpose) {
		static const CostTblEntry TransposeTbl[] = {
		{ISD::VECTOR_SHUFFLE, MVT::v8i8, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v16i8, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v4i16, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v8i16, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v2i32, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v4i32, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v2i64, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v2f32, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v4f32, 1},
		{ISD::VECTOR_SHUFFLE, MVT::v2f64, 1},
		};
		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
		if (const auto *Entry =
		CostTableLookup(TransposeTbl, ISD::VECTOR_SHUFFLE, LT.second))
		return LT.first * Entry->Cost;
		}

		return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);
		}
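The lookup pattern in AArch64TTIImpl::getShuffleCost can be summarized with a simplified model: find the legalized type in a table, multiply the per-instruction cost by the number of legal parts, and otherwise fall back to the generic cost. Everything in the sketch below is hypothetical (the struct, the string type names standing in for MVT values, and the `FallbackCost` parameter standing in for `BaseT::getShuffleCost`); it is not LLVM's CostTblEntry API.

```cpp
#include <cstring>

// Hypothetical stand-in for a cost table entry keyed by a legal vector type.
struct CostEntrySketch {
  const char *LegalType;
  int Cost;
};

// Same set of types as the TransposeTbl in the diff, each costing 1.
static const CostEntrySketch TransposeTblSketch[] = {
    {"v8i8", 1},  {"v16i8", 1}, {"v4i16", 1}, {"v8i16", 1}, {"v2i32", 1},
    {"v4i32", 1}, {"v2i64", 1}, {"v2f32", 1}, {"v4f32", 1}, {"v2f64", 1},
};

// Return the matching table entry, or nullptr if the type is not listed.
const CostEntrySketch *lookupTransposeCost(const char *LegalType) {
  for (const CostEntrySketch &E : TransposeTblSketch)
    if (std::strcmp(E.LegalType, LegalType) == 0)
      return &E;
  return nullptr;
}

// NumLegalParts plays the role of LT.first: how many legal-sized vectors the
// original type splits into. FallbackCost models the base-class estimate.
int transposeShuffleCostSketch(int NumLegalParts, const char *LegalType,
                               int FallbackCost) {
  if (const CostEntrySketch *E = lookupTransposeCost(LegalType))
    return NumLegalParts * E->Cost;
  return FallbackCost;
}
```

Under this model a type that legalizes to a single listed vector costs 1, which matches the updated COST lines in the test file, while a type that splits into two legal vectors would cost 2.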

llvm/trunk/test/Analysis/CostModel/AArch64/shuffle-transpose.ll

	; RUN: opt < %s -mtriple=aarch64--linux-gnu -cost-model -analyze | FileCheck %s --check-prefix=COST			; RUN: opt < %s -mtriple=aarch64--linux-gnu -cost-model -analyze | FileCheck %s --check-prefix=COST
	; RUN: llc < %s -mtriple=aarch64--linux-gnu | FileCheck %s --check-prefix=CODE			; RUN: llc < %s -mtriple=aarch64--linux-gnu | FileCheck %s --check-prefix=CODE

	; COST-LABEL: trn1.v8i8			; COST-LABEL: trn1.v8i8
	; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
	; CODE-LABEL: trn1.v8i8			; CODE-LABEL: trn1.v8i8
	; CODE: trn1 v0.8b, v0.8b, v1.8b			; CODE: trn1 v0.8b, v0.8b, v1.8b
	define <8 x i8> @trn1.v8i8(<8 x i8> %v0, <8 x i8> %v1) {			define <8 x i8> @trn1.v8i8(<8 x i8> %v0, <8 x i8> %v1) {
	%tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>			%tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
	ret <8 x i8> %tmp0			ret <8 x i8> %tmp0
	}			}

	; COST-LABEL: trn2.v8i8			; COST-LABEL: trn2.v8i8
	; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
	; CODE-LABEL: trn2.v8i8			; CODE-LABEL: trn2.v8i8
	; CODE: trn2 v0.8b, v0.8b, v1.8b			; CODE: trn2 v0.8b, v0.8b, v1.8b
	define <8 x i8> @trn2.v8i8(<8 x i8> %v0, <8 x i8> %v1) {			define <8 x i8> @trn2.v8i8(<8 x i8> %v0, <8 x i8> %v1) {
	%tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>			%tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
	ret <8 x i8> %tmp0			ret <8 x i8> %tmp0
	}			}

	; COST-LABEL: trn1.v16i8			; COST-LABEL: trn1.v16i8
	; COST: Found an estimated cost of 90 for instruction: %tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
	; CODE-LABEL: trn1.v16i8			; CODE-LABEL: trn1.v16i8
	; CODE: trn1 v0.16b, v0.16b, v1.16b			; CODE: trn1 v0.16b, v0.16b, v1.16b
	define <16 x i8> @trn1.v16i8(<16 x i8> %v0, <16 x i8> %v1) {			define <16 x i8> @trn1.v16i8(<16 x i8> %v0, <16 x i8> %v1) {
	%tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>			%tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
	ret <16 x i8> %tmp0			ret <16 x i8> %tmp0
	}			}

	; COST-LABEL: trn2.v16i8			; COST-LABEL: trn2.v16i8
	; COST: Found an estimated cost of 90 for instruction: %tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 1, i32 17, i32 3, i32 19, i32 5, i32 21, i32 7, i32 23, i32 9, i32 25, i32 11, i32 27, i32 13, i32 29, i32 15, i32 31>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 1, i32 17, i32 3, i32 19, i32 5, i32 21, i32 7, i32 23, i32 9, i32 25, i32 11, i32 27, i32 13, i32 29, i32 15, i32 31>
	; CODE-LABEL: trn2.v16i8			; CODE-LABEL: trn2.v16i8
	; CODE: trn2 v0.16b, v0.16b, v1.16b			; CODE: trn2 v0.16b, v0.16b, v1.16b
	define <16 x i8> @trn2.v16i8(<16 x i8> %v0, <16 x i8> %v1) {			define <16 x i8> @trn2.v16i8(<16 x i8> %v0, <16 x i8> %v1) {
	%tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 1, i32 17, i32 3, i32 19, i32 5, i32 21, i32 7, i32 23, i32 9, i32 25, i32 11, i32 27, i32 13, i32 29, i32 15, i32 31>			%tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 1, i32 17, i32 3, i32 19, i32 5, i32 21, i32 7, i32 23, i32 9, i32 25, i32 11, i32 27, i32 13, i32 29, i32 15, i32 31>
	ret <16 x i8> %tmp0			ret <16 x i8> %tmp0
	}			}

	; COST-LABEL: trn1.v4i16			; COST-LABEL: trn1.v4i16
	; COST: Found an estimated cost of 18 for instruction: %tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	; CODE-LABEL: trn1.v4i16			; CODE-LABEL: trn1.v4i16
	; CODE: trn1 v0.4h, v0.4h, v1.4h			; CODE: trn1 v0.4h, v0.4h, v1.4h
	define <4 x i16> @trn1.v4i16(<4 x i16> %v0, <4 x i16> %v1) {			define <4 x i16> @trn1.v4i16(<4 x i16> %v0, <4 x i16> %v1) {
	%tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			%tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	ret <4 x i16> %tmp0			ret <4 x i16> %tmp0
	}			}

	; COST-LABEL: trn2.v4i16			; COST-LABEL: trn2.v4i16
	; COST: Found an estimated cost of 18 for instruction: %tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	; CODE-LABEL: trn2.v4i16			; CODE-LABEL: trn2.v4i16
	; CODE: trn2 v0.4h, v0.4h, v1.4h			; CODE: trn2 v0.4h, v0.4h, v1.4h
	define <4 x i16> @trn2.v4i16(<4 x i16> %v0, <4 x i16> %v1) {			define <4 x i16> @trn2.v4i16(<4 x i16> %v0, <4 x i16> %v1) {
	%tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	ret <4 x i16> %tmp0			ret <4 x i16> %tmp0
	}			}

	; COST-LABEL: trn1.v8i16			; COST-LABEL: trn1.v8i16
	; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
	; CODE-LABEL: trn1.v8i16			; CODE-LABEL: trn1.v8i16
	; CODE: trn1 v0.8h, v0.8h, v1.8h			; CODE: trn1 v0.8h, v0.8h, v1.8h
	define <8 x i16> @trn1.v8i16(<8 x i16> %v0, <8 x i16> %v1) {			define <8 x i16> @trn1.v8i16(<8 x i16> %v0, <8 x i16> %v1) {
	%tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>			%tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
	ret <8 x i16> %tmp0			ret <8 x i16> %tmp0
	}			}

	; COST-LABEL: trn2.v8i16			; COST-LABEL: trn2.v8i16
	; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
	; CODE-LABEL: trn2.v8i16			; CODE-LABEL: trn2.v8i16
	; CODE: trn2 v0.8h, v0.8h, v1.8h			; CODE: trn2 v0.8h, v0.8h, v1.8h
	define <8 x i16> @trn2.v8i16(<8 x i16> %v0, <8 x i16> %v1) {			define <8 x i16> @trn2.v8i16(<8 x i16> %v0, <8 x i16> %v1) {
	%tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>			%tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
	ret <8 x i16> %tmp0			ret <8 x i16> %tmp0
	}			}

	; COST-LABEL: trn1.v2i32			; COST-LABEL: trn1.v2i32
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 0, i32 2>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 0, i32 2>
	; CODE-LABEL: trn1.v2i32			; CODE-LABEL: trn1.v2i32
	; CODE: zip1 v0.2s, v0.2s, v1.2s			; CODE: zip1 v0.2s, v0.2s, v1.2s
	define <2 x i32> @trn1.v2i32(<2 x i32> %v0, <2 x i32> %v1) {			define <2 x i32> @trn1.v2i32(<2 x i32> %v0, <2 x i32> %v1) {
	%tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 0, i32 2>			%tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 0, i32 2>
	ret <2 x i32> %tmp0			ret <2 x i32> %tmp0
	}			}

	; COST-LABEL: trn2.v2i32			; COST-LABEL: trn2.v2i32
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 1, i32 3>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 1, i32 3>
	; CODE-LABEL: trn2.v2i32			; CODE-LABEL: trn2.v2i32
	; CODE: zip2 v0.2s, v0.2s, v1.2s			; CODE: zip2 v0.2s, v0.2s, v1.2s
	define <2 x i32> @trn2.v2i32(<2 x i32> %v0, <2 x i32> %v1) {			define <2 x i32> @trn2.v2i32(<2 x i32> %v0, <2 x i32> %v1) {
	%tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 1, i32 3>			%tmp0 = shufflevector <2 x i32> %v0, <2 x i32> %v1, <2 x i32> <i32 1, i32 3>
	ret <2 x i32> %tmp0			ret <2 x i32> %tmp0
	}			}

	; COST-LABEL: trn1.v4i32			; COST-LABEL: trn1.v4i32
	; COST: Found an estimated cost of 18 for instruction: %tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	; CODE-LABEL: trn1.v4i32			; CODE-LABEL: trn1.v4i32
	; CODE: trn1 v0.4s, v0.4s, v1.4s			; CODE: trn1 v0.4s, v0.4s, v1.4s
	define <4 x i32> @trn1.v4i32(<4 x i32> %v0, <4 x i32> %v1) {			define <4 x i32> @trn1.v4i32(<4 x i32> %v0, <4 x i32> %v1) {
	%tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			%tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	ret <4 x i32> %tmp0			ret <4 x i32> %tmp0
	}			}

	; COST-LABEL: trn2.v4i32			; COST-LABEL: trn2.v4i32
	; COST: Found an estimated cost of 18 for instruction: %tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	; CODE-LABEL: trn2.v4i32			; CODE-LABEL: trn2.v4i32
	; CODE: trn2 v0.4s, v0.4s, v1.4s			; CODE: trn2 v0.4s, v0.4s, v1.4s
	define <4 x i32> @trn2.v4i32(<4 x i32> %v0, <4 x i32> %v1) {			define <4 x i32> @trn2.v4i32(<4 x i32> %v0, <4 x i32> %v1) {
	%tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%tmp0 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	ret <4 x i32> %tmp0			ret <4 x i32> %tmp0
	}			}

	; COST-LABEL: trn1.v2i64			; COST-LABEL: trn1.v2i64
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 0, i32 2>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 0, i32 2>
	; CODE-LABEL: trn1.v2i64			; CODE-LABEL: trn1.v2i64
	; CODE: zip1 v0.2d, v0.2d, v1.2d			; CODE: zip1 v0.2d, v0.2d, v1.2d
	define <2 x i64> @trn1.v2i64(<2 x i64> %v0, <2 x i64> %v1) {			define <2 x i64> @trn1.v2i64(<2 x i64> %v0, <2 x i64> %v1) {
	%tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 0, i32 2>			%tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 0, i32 2>
	ret <2 x i64> %tmp0			ret <2 x i64> %tmp0
	}			}

	; COST-LABEL: trn2.v2i64			; COST-LABEL: trn2.v2i64
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 1, i32 3>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 1, i32 3>
	; CODE-LABEL: trn2.v2i64			; CODE-LABEL: trn2.v2i64
	; CODE: zip2 v0.2d, v0.2d, v1.2d			; CODE: zip2 v0.2d, v0.2d, v1.2d
	define <2 x i64> @trn2.v2i64(<2 x i64> %v0, <2 x i64> %v1) {			define <2 x i64> @trn2.v2i64(<2 x i64> %v0, <2 x i64> %v1) {
	%tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 1, i32 3>			%tmp0 = shufflevector <2 x i64> %v0, <2 x i64> %v1, <2 x i32> <i32 1, i32 3>
	ret <2 x i64> %tmp0			ret <2 x i64> %tmp0
	}			}

	; COST-LABEL: trn1.v2f32			; COST-LABEL: trn1.v2f32
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 0, i32 2>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 0, i32 2>
	; CODE-LABEL: trn1.v2f32			; CODE-LABEL: trn1.v2f32
	; CODE: zip1 v0.2s, v0.2s, v1.2s			; CODE: zip1 v0.2s, v0.2s, v1.2s
	define <2 x float> @trn1.v2f32(<2 x float> %v0, <2 x float> %v1) {			define <2 x float> @trn1.v2f32(<2 x float> %v0, <2 x float> %v1) {
	%tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 0, i32 2>			%tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 0, i32 2>
	ret <2 x float> %tmp0			ret <2 x float> %tmp0
	}			}

	; COST-LABEL: trn2.v2f32			; COST-LABEL: trn2.v2f32
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 1, i32 3>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 1, i32 3>
	; CODE-LABEL: trn2.v2f32			; CODE-LABEL: trn2.v2f32
	; CODE: zip2 v0.2s, v0.2s, v1.2s			; CODE: zip2 v0.2s, v0.2s, v1.2s
	define <2 x float> @trn2.v2f32(<2 x float> %v0, <2 x float> %v1) {			define <2 x float> @trn2.v2f32(<2 x float> %v0, <2 x float> %v1) {
	%tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 1, i32 3>			%tmp0 = shufflevector <2 x float> %v0, <2 x float> %v1, <2 x i32> <i32 1, i32 3>
	ret <2 x float> %tmp0			ret <2 x float> %tmp0
	}			}

	; COST-LABEL: trn1.v4f32			; COST-LABEL: trn1.v4f32
	; COST: Found an estimated cost of 18 for instruction: %tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	; CODE-LABEL: trn1.v4f32			; CODE-LABEL: trn1.v4f32
	; CODE: trn1 v0.4s, v0.4s, v1.4s			; CODE: trn1 v0.4s, v0.4s, v1.4s
	define <4 x float> @trn1.v4f32(<4 x float> %v0, <4 x float> %v1) {			define <4 x float> @trn1.v4f32(<4 x float> %v0, <4 x float> %v1) {
	%tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			%tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	ret <4 x float> %tmp0			ret <4 x float> %tmp0
	}			}

	; COST-LABEL: trn2.v4f32			; COST-LABEL: trn2.v4f32
	; COST: Found an estimated cost of 18 for instruction: %tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	; CODE-LABEL: trn2.v4f32			; CODE-LABEL: trn2.v4f32
	; CODE: trn2 v0.4s, v0.4s, v1.4s			; CODE: trn2 v0.4s, v0.4s, v1.4s
	define <4 x float> @trn2.v4f32(<4 x float> %v0, <4 x float> %v1) {			define <4 x float> @trn2.v4f32(<4 x float> %v0, <4 x float> %v1) {
	%tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%tmp0 = shufflevector <4 x float> %v0, <4 x float> %v1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	ret <4 x float> %tmp0			ret <4 x float> %tmp0
	}			}

	; COST-LABEL: trn1.v2f64			; COST-LABEL: trn1.v2f64
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 0, i32 2>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 0, i32 2>
	; CODE-LABEL: trn1.v2f64			; CODE-LABEL: trn1.v2f64
	; CODE: zip1 v0.2d, v0.2d, v1.2d			; CODE: zip1 v0.2d, v0.2d, v1.2d
	define <2 x double> @trn1.v2f64(<2 x double> %v0, <2 x double> %v1) {			define <2 x double> @trn1.v2f64(<2 x double> %v0, <2 x double> %v1) {
	%tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 0, i32 2>			%tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 0, i32 2>
	ret <2 x double> %tmp0			ret <2 x double> %tmp0
	}			}

	; COST-LABEL: trn2.v2f64			; COST-LABEL: trn2.v2f64
	; COST: Found an estimated cost of 6 for instruction: %tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 1, i32 3>			; COST: Found an estimated cost of 1 for instruction: %tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 1, i32 3>
	; CODE-LABEL: trn2.v2f64			; CODE-LABEL: trn2.v2f64
	; CODE: zip2 v0.2d, v0.2d, v1.2d			; CODE: zip2 v0.2d, v0.2d, v1.2d
	define <2 x double> @trn2.v2f64(<2 x double> %v0, <2 x double> %v1) {			define <2 x double> @trn2.v2f64(<2 x double> %v0, <2 x double> %v1) {
	%tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 1, i32 3>			%tmp0 = shufflevector <2 x double> %v0, <2 x double> %v1, <2 x i32> <i32 1, i32 3>
	ret <2 x double> %tmp0			ret <2 x double> %tmp0
	}			}
