This is an archive of the discontinued LLVM Phabricator instance.

[PCG] Poor shuffle lane tracking (PR35454)
Needs Review, Public

Authored by kbelochapka on Nov 29 2017, 7:58 PM.

Details

Reviewers

craig.topper
spatel
efriedma
RKSimon

Summary

This fix allows the following transformation to take place:
BINARY-OPERATION(SHUFFLE(vector1, mask), SHUFFLE(vector2, mask)) -> SHUFFLE(BINARY-OPERATION(vector1, vector2), mask)
when the vector type of the BINARY-OPERATION operands differs from the vector type of the SHUFFLE operands, e.g. <4 x i32> and <16 x i8>. Both types must have the same total bit width.
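
To illustrate why moving the shuffle past the binary operation preserves the result, here is a minimal C++ sketch. It is not LLVM code: `Vec4`, `Mask4`, `shuffle`, `add`, and `shuffleCommutesWithAdd` are hypothetical helpers that model a `<4 x i32>` value and a single-source shuffle as an index permutation. It only covers the same-type case; the mixed-type case from the summary additionally needs a rescaled mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a <4 x i32> value and a 4-entry shuffle mask.
using Vec4 = std::array<std::uint32_t, 4>;
using Mask4 = std::array<int, 4>;

// Single-source shuffle: result[i] = v[mask[i]].
// Every mask entry is assumed to be in the range [0, 4).
Vec4 shuffle(const Vec4 &v, const Mask4 &mask) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = v[mask[i]];
  return r;
}

// Lane-wise add with wrapping (unsigned) arithmetic.
Vec4 add(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] + b[i];
  return r;
}

// Checks binop(shuffle(a, m), shuffle(b, m)) == shuffle(binop(a, b), m).
bool shuffleCommutesWithAdd(const Vec4 &a, const Vec4 &b, const Mask4 &m) {
  return add(shuffle(a, m), shuffle(b, m)) == shuffle(add(a, b), m);
}
```

For example, `shuffleCommutesWithAdd({1, 2, 3, 4}, {10, 20, 30, 40}, {3, 2, 1, 0})` holds because each output lane reads the same source lane from both operands, so it does not matter whether the lanes are permuted before or after the add.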

Event Timeline

kbelochapka created this revision. Nov 29 2017, 7:58 PM

RKSimon added a reviewer: RKSimon. Nov 30 2017, 1:20 AM

spatel added a reviewer: efriedma. Nov 30 2017, 12:51 PM

spatel added inline comments.

lib/Transforms/InstCombine/InstructionCombining.cpp

1715–1717

This is not correct. You can't assume the surrounding instructions when creating an instcombine fold. Your test cases should be minimal and show what happens with those patterns. For this transform, that would be something like this:

define <8 x i16> @shuffle_add(<4 x i32> %v1, <4 x i32> %v2) {
  %shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %shuffle2 = shufflevector <4 x i32> %v2, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %bc1 = bitcast <4 x i32> %shuffle1 to <8 x i16>
  %bc2 = bitcast <4 x i32> %shuffle2 to <8 x i16>
  %add = add <8 x i16> %bc1, %bc2
  ret <8 x i16> %add
}

With this patch, we go from 5 instructions to 6:

define <8 x i16> @shuffle_add(<4 x i32> %v1, <4 x i32> %v2) {
  %1 = bitcast <4 x i32> %v1 to <8 x i16>
  %2 = bitcast <4 x i32> %v2 to <8 x i16>
  %3 = add <8 x i16> %1, %2
  %4 = bitcast <8 x i16> %3 to <4 x i32>
  %5 = shufflevector <4 x i32> %4, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %6 = bitcast <4 x i32> %5 to <8 x i16>
  ret <8 x i16> %6
}

Maybe this makes sense because we traded a shuffle for a couple of bitcasts?

If we used the narrower element type for the shuffle, then we'd have a reduction in instruction count by eliminating the last 2 bitcasts, but I don't know if that is allowed as a target-independent transform.
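
As a sketch of what "using the narrower element type for the shuffle" would mean for the mask, the C++ helper below rescales a mask to elements that are `scale` times narrower. `narrowShuffleMask` is a hypothetical name for illustration, not an LLVM API, and it deliberately says nothing about whether the transform is profitable on a given target.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): rewrite a shuffle mask for elements
// that are `scale` times narrower. Each wide index m expands to the run
// m*scale, m*scale+1, ..., m*scale+scale-1. A negative (undef) index expands
// to `scale` undef entries, represented here as -1.
std::vector<int> narrowShuffleMask(const std::vector<int> &mask, int scale) {
  assert(scale >= 1 && "scale must be positive");
  std::vector<int> narrow;
  narrow.reserve(mask.size() * scale);
  for (int m : mask)
    for (int j = 0; j < scale; ++j)
      narrow.push_back(m < 0 ? -1 : m * scale + j);
  return narrow;
}
```

For the `<i32 3, i32 2, i32 1, i32 0>` mask above and `scale = 2` (i32 to i16), this yields `<6, 7, 4, 5, 2, 3, 0, 1>`, which is the mask an `<8 x i16>` shuffle placed directly after the add would need.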

spatel added inline comments. Nov 30 2017, 12:54 PM

lib/Transforms/InstCombine/InstructionCombining.cpp
1720–1721	The shuffle mask is always a constant: "The shuffle mask operand is required to be a constant vector with either constant integer or undef values." http://llvm.org/docs/LangRef.html#shufflevector-instruction

efriedma added inline comments. Nov 30 2017, 1:06 PM

lib/Transforms/InstCombine/InstructionCombining.cpp
1715–1717	On some targets, vector bitcasts aren't free (IIRC big-endian ARM is like this). Changing the type of the shuffle is.... maybe a little sketchy. I mean, ideally targets should be able to handle either form, but I'm not sure we actually do that reliably. We don't have good tests for that sort of thing.

spatel added inline comments. Nov 30 2017, 1:12 PM

lib/Transforms/InstCombine/InstructionCombining.cpp
1715–1717	Yeah, I was afraid that was the answer :) So 2 options:
1. Wait to do this in the DAG where we can ask if bitcasts are free.
2. Try to match the larger pattern (shuffle+binop+shuffle) shown in the motivating tests.

Thanks, guys, for the valuable comments. I will reimplement the fix as suggested by Sanjay.

Reimplemented the fix based on the reviewers' recommendations.
The fix now attempts to transform the sequence:
SHUFFLE<T0>(MASK) --> BITCAST<T1> --> BINOP<T1> --> BITCAST<T0> --> SHUFFLE<T0>(MASK)
into:
BITCAST<T1> --> BINOP<T1> --> SHUFFLE<T1>(NEW_MASK)
This is always possible when the BINOP vector element type is smaller than the SHUFFLE vector element type,
and only sometimes possible when it is not.
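
The "only sometimes possible" direction can be sketched in standalone C++ (C++17 for `std::optional`). `widenShuffleMask` is a hypothetical helper, not an LLVM API. A group of `scale` narrow indices can be replaced by one wide index only if the group is entirely undef, or if it is a consecutive run that starts on a wide-element boundary. The boundary check is an assumption of this sketch and is stricter than the helper in the diff, which only checks that the run is consecutive.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API): try to rewrite a shuffle mask for
// elements that are `scale` times wider. Returns std::nullopt when some group
// of `scale` narrow indices does not map onto exactly one wide element.
// Undef entries are represented as negative values.
std::optional<std::vector<int>>
widenShuffleMask(const std::vector<int> &mask, int scale) {
  if (scale < 1 || mask.size() % scale != 0)
    return std::nullopt;
  std::vector<int> wide;
  for (std::size_t i = 0; i < mask.size(); i += scale) {
    const int first = mask[i];
    if (first < 0) {
      // An undef group must be undef in every position.
      for (int j = 1; j < scale; ++j)
        if (mask[i + j] >= 0)
          return std::nullopt;
      wide.push_back(-1);
      continue;
    }
    // The run must start on a wide-element boundary and be consecutive.
    if (first % scale != 0)
      return std::nullopt;
    for (int j = 1; j < scale; ++j)
      if (mask[i + j] != first + j)
        return std::nullopt;
    wide.push_back(first / scale);
  }
  return wide;
}
```

With `scale = 2`, the i16 mask `<4, 5, 6, 7, 0, 1, 2, 3>` widens to the i32 mask `<2, 3, 0, 1>`, while `<2, 3, 1, 0, 4, 5, 7, 6>` cannot be widened because the group `(1, 0)` swaps the two halves of a single i32 lane.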

@kbelochapka I think we can abandon this now that the vector combine pass handles PR35454

@spatel Maybe ensure we have all the test coverage from the tests that Konstantin added here?

In D40633#1958062, @RKSimon wrote:

@spatel Maybe ensure we have all the test coverage from the tests that Konstantin added here?

Tests adapted from this patch and added to "PhaseOrdering":
rG389704cc601

I added test comments about what we still can do to improve things. Also note that the fold in -vector-combine relies on the cost model, so we don't get the simplifications for a base SSE2 compile (I assumed from the asm shown in https://bugs.llvm.org/show_bug.cgi?id=35454 that we care about an AVX or later target).

spatel mentioned this in rG389704cc601b: [PhaseOrdering] add shuffle tests based on D40633; NFC. Apr 3 2020, 10:15 AM

spatel mentioned this in D77881: [VectorUtils] add IR-level analysis for widening of shuffle mask. Apr 10 2020, 8:51 AM

spatel mentioned this in rGc23cbefd9d73: [VectorUtils] add IR-level analysis for widening of shuffle mask. Apr 12 2020, 7:28 AM

RKSimon resigned from this revision. May 15 2020, 1:53 AM

Revision Contents

Path	Size
lib/Transforms/InstCombine/InstCombineInternal.h	4 lines
lib/Transforms/InstCombine/InstCombineVectorOps.cpp	5 lines
lib/Transforms/InstCombine/InstructionCombining.cpp	296 lines
test/Transforms/InstCombine/vec_shuffle.ll	294 lines

Diff 126643

lib/Transforms/InstCombine/InstCombineInternal.h

private:
Instruction *SimplifyMemSet(MemSetInst *MI);

Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);

/// \brief Returns a value X such that Val = X * Scale, or null if none.
///
/// If the multiplication is known not to overflow then NoSignedWrap is set.
Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);

		// Rearrange a shuffle-bitcast-shuffle sequence into:
		// shuffle-shuffle-bitcast or bitcast-shuffle-shuffle
		Instruction *RearangeShuffleBitcastShuffle(ShuffleVectorInst &Shuf);
};

} // end namespace llvm

#undef DEBUG_TYPE

#endif // LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H

lib/Transforms/InstCombine/InstCombineVectorOps.cpp

if (VWidth == LHSWidth) {
if (isLHSID) return replaceInstUsesWith(SVI, LHS);
if (isRHSID) return replaceInstUsesWith(SVI, RHS);
}

if (isa<UndefValue>(RHS) && CanEvaluateShuffled(LHS, Mask)) {
Value *V = EvaluateInDifferentElementOrder(LHS, Mask);
return replaceInstUsesWith(SVI, V);
}
		if (isa<UndefValue>(RHS) && isa<BitCastInst>(LHS)) {
		Instruction *Inst = RearangeShuffleBitcastShuffle(SVI);
		if (Inst)
		return replaceInstUsesWith(SVI, Inst);
		}

// SROA generates shuffle+bitcast when the extracted sub-vector is bitcast to
// a non-vector type. We can instead bitcast the original vector followed by
// an extract of the desired element:
//
// %sroa = shufflevector <16 x i8> %in, <16 x i8> undef,
// <4 x i32> <i32 0, i32 1, i32 2, i32 3>
// %1 = bitcast <4 x i8> %sroa to i32

lib/Transforms/InstCombine/InstructionCombining.cpp

if (Ancestor == Val)
return Val;

// Move up one level in the expression.
assert(Ancestor->hasOneUse() && "Drilled down when more than one use!");
Ancestor = Ancestor->user_back();
} while (true);
}

		// Computes the shuffle mask for a smaller vector element size.
		// InNumElts, InEltBytes - target vector <number of elements x element size>,
		// e.g. <16 x 1>
		// ShufNumElts, ShufEltBytes - current shuffle <number of elements x element
		// size>, e.g. <4 x 4>
		// ShufMask - current shuffle mask
		// NewShufMask - recomputed shuffle mask
		static bool ShuffleMaskForSmallerVectorElement(
		int InNumElts, int ShufNumElts, int InEltBytes, int ShufEltBytes,
		const SmallVector<int, 16> &ShufMask, SmallVector<int, 16> &NewShufMask) {

		if ((int)ShufMask.size() != ShufNumElts)
		return false;
		if ((InNumElts * InEltBytes) != (ShufNumElts * ShufEltBytes))
		return false;
		int ChunkSize = ShufEltBytes / InEltBytes;
		if (!ChunkSize)
		return false;
		NewShufMask.resize(InNumElts);
		for (int i = 0; i < ShufNumElts; ++i) {
		int ShufMaskElt = ShufMask[i];
		for (int j = 0, m = i * ChunkSize; j < ChunkSize; ++j) {
		if ((ShufMaskElt < 0) || (ShufMaskElt >= ShufNumElts))
		NewShufMask[m + j] = -1;
		else
		NewShufMask[m + j] = (ShufMaskElt * ChunkSize) + j;
		}
		}
		return true;
		}

		// Computes the shuffle mask for a bigger vector element size.
		// InNumElts, InEltBytes - target vector <number of elements x element size>,
		// e.g. <4 x 4>
		// ShufNumElts, ShufEltBytes - current shuffle <number of elements x element
		// size>, e.g. <16 x 1>
		// ShufMask - current shuffle mask
		// NewShufMask - recomputed shuffle mask
		// NOTE: does not always succeed.
		static bool ShuffleMaskForBiggerVectorElement(
		int InNumElts, int ShufNumElts, int InEltBytes, int ShufEltBytes,
		const SmallVector<int, 16> &ShufMask, SmallVector<int, 16> &NewShufMask) {

		if ((int)ShufMask.size() != ShufNumElts)
		return false;
		if ((InNumElts * InEltBytes) != (ShufNumElts * ShufEltBytes))
		return false;
		int ChunkSize = InEltBytes / ShufEltBytes;
		if (!ChunkSize)
		return false;

		NewShufMask.resize(InNumElts);
		for (int i = 0, m = 0; i < ShufNumElts; i += ChunkSize, m++) {
		int ShufMaskElt = ShufMask[i];
		for (int j = 0; j < ChunkSize; ++j) {
		if ((ShufMaskElt < 0) ||
		(ShufMaskElt >=
		ShufNumElts)) { // either all elements in a chunk are undefined
		if (ShufMask[i + j] != ShufMaskElt)
		return false;
		} else { // or they are all sequential
		if (ShufMask[i + j] != (ShufMaskElt + j))
		return false;
		}
		}
		NewShufMask[m] = ShufMaskElt / ChunkSize;
		}
		return true;
		}

		static Constant *ShuffleMaskToConstantVector(ArrayRef<int> ShufMask,
		InstCombiner::BuilderTy &Builder) {
		SmallVector<Constant *, 16> MaskValues;
		for (int i = 0, e = ShufMask.size(); i != e; ++i) {
		if (ShufMask[i] == -1)
		MaskValues.push_back(UndefValue::get(Builder.getInt32Ty()));
		else
		MaskValues.push_back(Builder.getInt32(ShufMask[i]));
		}
		return ConstantVector::get(MaskValues);
		}

		template <typename VectorInstType>
		static bool getVectorInstructionTypeInfo(const VectorInstType &VectInst,
		int &NumElts, int &EltBytes) {
		NumElts = EltBytes = 0;
		// dyn_cast returns null on a type mismatch; cast would assert instead.
		VectorType *VectInstTy = dyn_cast<VectorType>(VectInst.getType());
		if (!VectInstTy)
		return false;
		NumElts = VectInstTy->getVectorNumElements();
		EltBytes = VectInstTy->getVectorElementType()->getScalarSizeInBits() / 8;
		return true;
		}

		template <typename VectorInstType>
		static bool getVectorInstructionOperandTypeInfo(const VectorInstType &VectInst,
		int OpNum, int &NumElts,
		int &EltBytes) {
		NumElts = EltBytes = 0;
		Value *VectInstOp = VectInst.getOperand(OpNum);
		if (isa<UndefValue>(VectInstOp))
		return true;
		// dyn_cast returns null on a type mismatch; cast would assert instead.
		VectorType *VectInstOpTy = dyn_cast<VectorType>(VectInstOp->getType());
		if (!VectInstOpTy)
		return false;
		NumElts = VectInstOpTy->getVectorNumElements();
		EltBytes = VectInstOpTy->getVectorElementType()->getScalarSizeInBits() / 8;
		return true;
		}

		static bool ComputeShuffleMaskForSmallerVectorElement(
		const BinaryOperator &Inst, const ShuffleVectorInst &Shuf,
		Constant *&NewMask, InstCombiner::BuilderTy &Builder) {

		SmallVector<int, 16> ShufMask = Shuf.getShuffleMask();
		int InstNumElts, ShufNumElts, InstEltBytes, ShufEltBytes;
		if (!getVectorInstructionTypeInfo(Inst, InstNumElts, InstEltBytes))
		return false;
		if (!getVectorInstructionTypeInfo(Shuf, ShufNumElts, ShufEltBytes))
		return false;
		SmallVector<int, 16> NewShufMask;
		bool ok =
		ShuffleMaskForSmallerVectorElement(InstNumElts, ShufNumElts, InstEltBytes,
		ShufEltBytes, ShufMask, NewShufMask);
		if (!ok)
		return false;
		NewMask = ShuffleMaskToConstantVector(NewShufMask, Builder);
		return true;
		}

		static bool ComputeShuffleMaskForBiggerVectorElement(
		const BinaryOperator &Inst, const ShuffleVectorInst &Shuf,
		Constant *&NewMask, InstCombiner::BuilderTy &Builder) {

		SmallVector<int, 16> ShufMask = Shuf.getShuffleMask();
		int InstNumElts, ShufNumElts, InstEltBytes, ShufEltBytes;
		if (!getVectorInstructionTypeInfo(Inst, InstNumElts, InstEltBytes))
		return false;
		if (!getVectorInstructionTypeInfo(Shuf, ShufNumElts, ShufEltBytes))
		return false;
		SmallVector<int, 16> NewShufMask;
		bool ok =
		ShuffleMaskForBiggerVectorElement(InstNumElts, ShufNumElts, InstEltBytes,
		ShufEltBytes, ShufMask, NewShufMask);
		if (!ok)
		return false;
		NewMask = ShuffleMaskToConstantVector(NewShufMask, Builder);
		return true;
		}

		static bool ComputeShuffleMaskForDifferentVectorType(
		const BinaryOperator &Inst, const ShuffleVectorInst &Shuf,
		Constant *&NewMask, InstCombiner::BuilderTy &Builder) {

		int InstNumElts, ShufNumElts, InstEltBytes, ShufEltBytes;
		if (!getVectorInstructionTypeInfo(Inst, InstNumElts, InstEltBytes))
		return false;
		if (!getVectorInstructionTypeInfo(Shuf, ShufNumElts, ShufEltBytes))
		return false;
		if (InstNumElts > ShufNumElts) {
		return ComputeShuffleMaskForSmallerVectorElement(Inst, Shuf, NewMask,
		Builder);
		} else if (InstNumElts < ShufNumElts) {
		return ComputeShuffleMaskForBiggerVectorElement(Inst, Shuf, NewMask,
		Builder);
		}
		return false;
		}

		static Instruction *BitcastShuffleShuffleSequence(
		ShuffleVectorInst &Shuf1, ShuffleVectorInst &Shuf2,
		const SmallVector<int, 16> &NewShufMask, InstCombiner::BuilderTy &Builder) {
		Constant *NewShuf2MaskC = ShuffleMaskToConstantVector(NewShufMask, Builder);
		Constant *Shuf2MaskC = Shuf2.getMask();
		Value *NewBitCast =
		Builder.CreateBitCast(Shuf1.getOperand(0), Shuf2.getType());
		Value *NewShuf1 = Builder.CreateShuffleVector(
		NewBitCast, UndefValue::get(Shuf2.getType()), NewShuf2MaskC);
		Value *NewShuf2 = Builder.CreateShuffleVector(
		NewShuf1, UndefValue::get(Shuf2.getType()), Shuf2MaskC);
		return cast<Instruction>(NewShuf2);
		}

		static Instruction *ShuffleShuffleBitcastSequence(
		ShuffleVectorInst &Shuf1, ShuffleVectorInst &Shuf2,
		const SmallVector<int, 16> &NewShufMask, InstCombiner::BuilderTy &Builder) {
		Constant *NewShuf2MaskC = ShuffleMaskToConstantVector(NewShufMask, Builder);
		Value *NewShuf2 = Builder.CreateShuffleVector(
		&Shuf1, UndefValue::get(Shuf1.getType()), NewShuf2MaskC);
		Value *NewBitCast = Builder.CreateBitCast(NewShuf2, Shuf2.getType());
		return cast<Instruction>(NewBitCast);
		}

		// Rearrange the shuffle-bitcast-shuffle sequence:
		//   x1 = shuffle(<T1> x0, mask1); x2 = bitcast(<T1> x1, <T2>);
		//   x3 = shuffle(<T2> x2, mask2);
		// into one of two possible sequences:
		// 1. => x1 = shuffle(<T1> x0, mask1); x2 = shuffle(<T1> x1, new_mask2);
		//       x3 = bitcast(<T1> x2, <T2>);
		// 2. => x1 = bitcast(<T1> x0, <T2>); x2 = shuffle(<T2> x1, new_mask1);
		//       x3 = shuffle(<T2> x2, mask2);
		// The backend will usually replace two adjacent shuffles with one shuffle
		// instruction with a combined mask.
		Instruction *
		InstCombiner::RearangeShuffleBitcastShuffle(ShuffleVectorInst &Shuf) {
		Value *ShufLHS = Shuf.getOperand(0);
		Value *ShufRHS = Shuf.getOperand(1);

		if (!isa<BitCastInst>(ShufLHS))
		return nullptr;
		if (!isa<UndefValue>(ShufRHS))
		return nullptr;
		BitCastInst *ShufBc = cast<BitCastInst>(ShufLHS);

		Value *ShufBcOp = ShufBc->getOperand(0);
		if (!isa<ShuffleVectorInst>(ShufBcOp))
		return nullptr;
		ShuffleVectorInst *BcShuf = cast<ShuffleVectorInst>(ShufBcOp);

		int ShufNumElts, BcShufNumElts, ShufEltBytes, BcShufEltBytes;
		if (!getVectorInstructionTypeInfo(Shuf, ShufNumElts, ShufEltBytes))
		return nullptr;
		if (!getVectorInstructionTypeInfo(*BcShuf, BcShufNumElts, BcShufEltBytes))
		return nullptr;
		if ((ShufNumElts * ShufEltBytes) != (BcShufNumElts * BcShufEltBytes))
		return nullptr;
		SmallVector<int, 16> ShufMask = Shuf.getShuffleMask();
		SmallVector<int, 16> BcShufMask = BcShuf->getShuffleMask();
		SmallVector<int, 16> NewShufMask;
		bool ok = false;
		if (BcShufEltBytes > ShufEltBytes) {
		ok = ShuffleMaskForBiggerVectorElement(BcShufNumElts, ShufNumElts,
		BcShufEltBytes, ShufEltBytes,
		ShufMask, NewShufMask);
		if (ok) {
		return ShuffleShuffleBitcastSequence(*BcShuf, Shuf, NewShufMask, Builder);
		} else {
		ok = ShuffleMaskForSmallerVectorElement(ShufNumElts, BcShufNumElts,
		ShufEltBytes, BcShufEltBytes,
		BcShufMask, NewShufMask);
		if (!ok)
		return nullptr;
		return BitcastShuffleShuffleSequence(*BcShuf, Shuf, NewShufMask, Builder);
		}
		} else { // ShufEltBytes > BcShufEltBytes
		ok = ShuffleMaskForBiggerVectorElement(ShufNumElts, BcShufNumElts,
		ShufEltBytes, BcShufEltBytes,
		BcShufMask, NewShufMask);
		if (ok) {
		return BitcastShuffleShuffleSequence(*BcShuf, Shuf, NewShufMask, Builder);
		} else {
		ok = ShuffleMaskForSmallerVectorElement(BcShufNumElts, ShufNumElts,
		BcShufEltBytes, ShufEltBytes,
		ShufMask, NewShufMask);
		if (!ok)
		return nullptr;
		return ShuffleShuffleBitcastSequence(*BcShuf, Shuf, NewShufMask, Builder);
		}
		}
		return nullptr;
		}

/// \brief Creates node of binary operation with the same attributes as the
/// specified one but with other operands.
static Value *CreateBinOpAsGiven(BinaryOperator &Inst, Value *LHS, Value *RHS,
InstCombiner::BuilderTy &B) {
Value *BO = B.CreateBinOp(Inst.getOpcode(), LHS, RHS);
// If LHS and RHS are constant, BO won't be a binary operator.
if (BinaryOperator *NewBO = dyn_cast<BinaryOperator>(BO))
NewBO->copyIRFlags(&Inst);
...
if (LShuf && RShuf && LShuf->getMask() == RShuf->getMask() &&
isa<UndefValue>(LShuf->getOperand(1)) &&
isa<UndefValue>(RShuf->getOperand(1)) &&
LShuf->getOperand(0)->getType() == RShuf->getOperand(0)->getType()) {
Value *NewBO = CreateBinOpAsGiven(Inst, LShuf->getOperand(0),
RShuf->getOperand(0), Builder);
return Builder.CreateShuffleVector(
NewBO, UndefValue::get(NewBO->getType()), LShuf->getMask());
}
		// Both arguments of the binary operation are shuffle instructions, but the
		// binary operation vector element type is different from the shuffle
		// instruction vector element type, e.g. the shuffle operand type is <4 x
		// i32> but the binary operation operand type is <16 x i8>. In this
		// situation, in order to move the shuffle instruction after the binary
		// operation instruction, we need to change the shuffle instruction data type
		// and recompute the shuffle instruction mask. We can always do that if we
		// need to change the shuffle vector element type into a smaller one, but
		// changing from a smaller to a bigger vector element type is not
		// always possible.
		BitCastInst *LBitCast = dyn_cast<BitCastInst>(LHS);
		BitCastInst *RBitCast = dyn_cast<BitCastInst>(RHS);
		if (LBitCast && RBitCast) {
		Value *LBitCastOp = LBitCast->getOperand(0);
		Value *RBitCastOp = RBitCast->getOperand(0);
		ShuffleVectorInst *LBcShuf = dyn_cast<ShuffleVectorInst>(LBitCastOp);
		ShuffleVectorInst *RBcShuf = dyn_cast<ShuffleVectorInst>(RBitCastOp);

		if (LBcShuf && RBcShuf && LBcShuf->getMask() == RBcShuf->getMask() &&
		isa<UndefValue>(LBcShuf->getOperand(1)) &&
		isa<UndefValue>(RBcShuf->getOperand(1)) &&
		LBcShuf->getOperand(0)->getType() ==
		RBcShuf->getOperand(0)->getType()) {
		Constant *NewMask;
		bool ok = ComputeShuffleMaskForDifferentVectorType(Inst, *LBcShuf,
		NewMask, Builder);
		if (ok) {
		Value *LBitCast =
		Builder.CreateBitCast(LBcShuf->getOperand(0), Inst.getType());
		Value *RBitCast =
		Builder.CreateBitCast(RBcShuf->getOperand(0), Inst.getType());

		Value *NewBinOp = CreateBinOpAsGiven(Inst, LBitCast, RBitCast, Builder);

		Value *NewShuf = Builder.CreateShuffleVector(
		NewBinOp, UndefValue::get(Inst.getType()), NewMask);
		return NewShuf;
		}
		}
		}
// If one argument is a shuffle within one vector, the other is a constant,
// try moving the shuffle after the binary operation.
ShuffleVectorInst *Shuffle = nullptr;
Constant *C1 = nullptr;
if (isa<ShuffleVectorInst>(LHS)) Shuffle = cast<ShuffleVectorInst>(LHS);
if (isa<ShuffleVectorInst>(RHS)) Shuffle = cast<ShuffleVectorInst>(RHS);
if (isa<Constant>(LHS)) C1 = cast<Constant>(LHS);
if (isa<Constant>(RHS)) C1 = cast<Constant>(RHS);

test/Transforms/InstCombine/vec_shuffle.ll

define <2 x i32*> @pr23113(<4 x i32*> %A) {
; CHECK-LABEL: @pr23113(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32*> %A, <4 x i32*> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: ret <2 x i32*> [[TMP1]]
;
%1 = shufflevector <4 x i32*> %A, <4 x i32*> undef, <2 x i32> <i32 0, i32 1>
ret <2 x i32*> %1
}

				; Function Attrs: noinline nounwind uwtable
				define <2 x i64> @shuffle_32_add_16_shuffle_32_masks_are_eq(<2 x i64> %v) {
				; CHECK-LABEL: @shuffle_32_add_16_shuffle_32_masks_are_eq(
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i64> %v to <8 x i16>
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> %v to <8 x i16>
				; CHECK-NEXT: [[TMP2:%.*]] = add <8 x i16> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast <8 x i16> [[TMP2:%.*]] to <2 x i64>
				; CHECK-NEXT: ret <2 x i64> [[TMP3:%.*]]
				;
				%bc0 = bitcast <2 x i64> %v to <4 x i32>
				%shuffle = shufflevector <4 x i32> %bc0, <4 x i32> zeroinitializer, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle to <2 x i64>
				%bc2 = bitcast <2 x i64> %bc1 to <8 x i16>
				%add.i = add <8 x i16> %bc2, %bc2
				%bc3 = bitcast <8 x i16> %add.i to <2 x i64>
				%bc4 = bitcast <2 x i64> %bc3 to <4 x i32>
				%shuffle4 = shufflevector <4 x i32> %bc4, <4 x i32> zeroinitializer, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				%bc5 = bitcast <4 x i32> %shuffle4 to <2 x i64>
				ret <2 x i64> %bc5
				}

				; Function Attrs: noinline nounwind uwtable
				define <2 x i64> @shuffle_32_add_8_shuffle_32_masks_are_eq(<2 x i64> %v) {
				; CHECK-LABEL: @shuffle_32_add_8_shuffle_32_masks_are_eq(
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i64> %v to <16 x i8>
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> %v to <16 x i8>
				; CHECK-NEXT: [[TMP2:%.*]] = add <16 x i8> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2:%.*]] to <2 x i64>
				; CHECK-NEXT: ret <2 x i64> [[TMP3:%.*]]
				;
				%bc0 = bitcast <2 x i64> %v to <4 x i32>
				%shuffle = shufflevector <4 x i32> %bc0, <4 x i32> zeroinitializer, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle to <2 x i64>
				%bc2 = bitcast <2 x i64> %bc1 to <16 x i8>
				%add.i = add <16 x i8> %bc2, %bc2
				%bc3 = bitcast <16 x i8> %add.i to <2 x i64>
				%bc4 = bitcast <2 x i64> %bc3 to <4 x i32>
				%shuffle4 = shufflevector <4 x i32> %bc4, <4 x i32> zeroinitializer, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				%bc5 = bitcast <4 x i32> %shuffle4 to <2 x i64>
				ret <2 x i64> %bc5
				}

				; Function Attrs: noinline nounwind uwtable
				define <2 x i64> @shuffle_8_add_32_shuffle_8_masks_are_eq(<2 x i64> %v) {
				; CHECK-LABEL: @shuffle_8_add_32_shuffle_8_masks_are_eq(
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i64> %v to <4 x i32>
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> %v to <4 x i32>
				; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i32> [[TMP2:%.*]] to <2 x i64>
				; CHECK-NEXT: ret <2 x i64> [[TMP3:%.*]]
				;
				%bc0 = bitcast <2 x i64> %v to <16 x i8>
				%shuffle = shufflevector <16 x i8> %bc0, <16 x i8> zeroinitializer, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				%bc1 = bitcast <16 x i8> %shuffle to <2 x i64>
				%bc2 = bitcast <2 x i64> %bc1 to <4 x i32>
				%add.i = add <4 x i32> %bc2, %bc2
				%bc3 = bitcast <4 x i32> %add.i to <2 x i64>
				%bc4 = bitcast <2 x i64> %bc3 to <16 x i8>
				%shuffle4 = shufflevector <16 x i8> %bc4, <16 x i8> zeroinitializer, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				%bc5 = bitcast <16 x i8> %shuffle4 to <2 x i64>
				ret <2 x i64> %bc5
				}

				define <8 x i16> @shuffle_32_add_16_masks_are_eq(<4 x i32> %v1, <4 x i32> %v2) {
				; CHECK-LABEL: @shuffle_32_add_16_masks_are_eq
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> %v1 to <8 x i16>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <4 x i32> %v2 to <8 x i16>
				; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[TMP3:%.*]], <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				; CHECK-NEXT: ret <8 x i16> [[TMP4:%.*]]

				%shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%shuffle2 = shufflevector <4 x i32> %v2, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle1 to <8 x i16>
				%bc2 = bitcast <4 x i32> %shuffle2 to <8 x i16>
				%add = add <8 x i16> %bc1, %bc2
				ret <8 x i16> %add
				}

				define <16 x i8> @shuffle_32_add_8_masks_are_eq(<4 x i32> %v1, <4 x i32> %v2) {
				; CHECK-LABEL: @shuffle_32_add_8_masks_are_eq
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> %v1 to <16 x i8>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <4 x i32> %v2 to <16 x i8>
				; CHECK-NEXT: [[TMP3:%.*]] = add <16 x i8> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i8> [[TMP3:%.*]], <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: ret <16 x i8> [[TMP4:%.*]]

				%shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%shuffle2 = shufflevector <4 x i32> %v2, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle1 to <16 x i8>
				%bc2 = bitcast <4 x i32> %shuffle2 to <16 x i8>
				%add = add <16 x i8> %bc1, %bc2
				ret <16 x i8> %add
				}

				define <16 x i8> @shuffle_16_add_8_masks_are_eq(<8 x i16> %v1, <8 x i16> %v2) {
				; CHECK-LABEL: @shuffle_16_add_8_masks_are_eq
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> %v1 to <16 x i8>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i16> %v2 to <16 x i8>
				; CHECK-NEXT: [[TMP3:%.*]] = add <16 x i8> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i8> [[TMP3:%.*]], <16 x i8> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 14, i32 15, i32 12, i32 13>
				; CHECK-NEXT: ret <16 x i8> [[TMP4:%.*]]

				%shuffle1 = shufflevector <8 x i16> %v1, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 1, i32 0, i32 4, i32 5, i32 7, i32 6>
				%shuffle2 = shufflevector <8 x i16> %v2, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 1, i32 0, i32 4, i32 5, i32 7, i32 6>
				%bc1 = bitcast <8 x i16> %shuffle1 to <16 x i8>
				%bc2 = bitcast <8 x i16> %shuffle2 to <16 x i8>
				%add = add <16 x i8> %bc1, %bc2
				ret <16 x i8> %add
				}

				define <4 x i32> @shuffle_16_add_32_masks_are_eq_and_can_be_converted_up(<8 x i16> %v1, <8 x i16> %v2) {
				; CHECK-LABEL: @shuffle_16_add_32_masks_are_eq_and_can_be_converted_up
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> %v1 to <4 x i32>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i16> %v2 to <4 x i32>
				; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3:%.*]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
				; CHECK-NEXT: ret <4 x i32> [[TMP4:%.*]]

				%shuffle1 = shufflevector <8 x i16> %v1, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				%shuffle2 = shufflevector <8 x i16> %v2, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				%bc1 = bitcast <8 x i16> %shuffle1 to <4 x i32>
				%bc2 = bitcast <8 x i16> %shuffle2 to <4 x i32>
				%add = add <4 x i32> %bc1, %bc2
				ret <4 x i32> %add
				}

				define <4 x i32> @shuffle_8_add_32_masks_are_eq_and_can_be_converted_up(<16 x i8> %v1, <16 x i8> %v2) {
				; CHECK-LABEL: @shuffle_8_add_32_masks_are_eq_and_can_be_converted_up
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <16 x i8> %v1 to <4 x i32>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> %v2 to <4 x i32>
				; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1:%.*]], [[TMP2:%.*]]
				; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
				; CHECK-NEXT: ret <4 x i32> [[TMP4:%.*]]

				%shuffle1 = shufflevector <16 x i8> %v1, <16 x i8> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				%shuffle2 = shufflevector <16 x i8> %v2, <16 x i8> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				%bc1 = bitcast <16 x i8> %shuffle1 to <4 x i32>
				%bc2 = bitcast <16 x i8> %shuffle2 to <4 x i32>
				%add = add <4 x i32> %bc1, %bc2
				ret <4 x i32> %add
				}

				; shuffle<8 x i16>( bitcast<8 x i16>( shuffle<4 x i32>(v)))

				define <8 x i16> @shuffle_32_bitcast_16_shuffle_16_can_be_converted_up(<4 x i32> %v1) {
				; CHECK-LABEL: @shuffle_32_bitcast_16_shuffle_16_can_be_converted_up
				; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0:%.*]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1:%.*]] to <8 x i16>
				; CHECK-NEXT: ret <8 x i16> [[TMP2:%.*]]

				%shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle1 to <8 x i16>
				%shuffle2 = shufflevector <8 x i16> %bc1, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				ret <8 x i16> %shuffle2
				}

				; shuffle<8 x i16>( bitcast<8 x i16>( shuffle<4 x i32>(v)))

				define <8 x i16> @shuffle_32_bitcast_16_shuffle_16_can_not_be_converted_up(<4 x i32> %v1) {
				; CHECK-LABEL: @shuffle_32_bitcast_16_shuffle_16_can_not_be_converted_up
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> %v1 to <8 x i16>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[TMP1:%.*]], <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i16> [[TMP2:%.*]], <8 x i16> undef, <8 x i32> <i32 5, i32 4, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: ret <8 x i16> [[TMP3:%.*]]

				%shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle1 to <8 x i16>
				%shuffle2 = shufflevector <8 x i16> %bc1, <8 x i16> undef, <8 x i32> <i32 5, i32 4, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				ret <8 x i16> %shuffle2
				}

				; shuffle<16 x i8>( bitcast<16 x i8>( shuffle<4 x i32>(v)))

				define <16 x i8> @shuffle_32_bitcast_8_shuffle_8_can_be_converted_up(<4 x i32> %v1) {
				; CHECK-LABEL: @shuffle_32_bitcast_8_shuffle_8_can_be_converted_up
				; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0:%.*]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1:%.*]] to <16 x i8>
				; CHECK-NEXT: ret <16 x i8> [[TMP2:%.*]]

				%shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle1 to <16 x i8>
				%shuffle2 = shufflevector <16 x i8> %bc1, <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				ret <16 x i8> %shuffle2
				}

				; shuffle<16 x i8>( bitcast<16 x i8>( shuffle<4 x i32>(v)))

				define <16 x i8> @shuffle_32_bitcast_8_shuffle_8_can_not_be_converted_up(<4 x i32> %v1) {
				; CHECK-LABEL: @shuffle_32_bitcast_8_shuffle_8_can_not_be_converted_up
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> %v1 to <16 x i8>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i8> [[TMP1:%.*]], <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> [[TMP2:%.*]], <16 x i8> undef, <16 x i32> <i32 5, i32 4, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				; CHECK-NEXT: ret <16 x i8> [[TMP3:%.*]]

				%shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				%bc1 = bitcast <4 x i32> %shuffle1 to <16 x i8>
				%shuffle2 = shufflevector <16 x i8> %bc1, <16 x i8> undef, <16 x i32> <i32 5, i32 4, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				ret <16 x i8> %shuffle2
				}

				; shuffle<4 x i32>( bitcast<4 x i32>( shuffle<16 x i8>(v)))

				define <4 x i32> @shuffle_8_bitcast_32_shuffle_32_can_be_converted_up(<16 x i8> %v1) {
				; CHECK-LABEL: @shuffle_8_bitcast_32_shuffle_32_can_be_converted_up
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <16 x i8> %v1 to <4 x i32>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1:%.*]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2:%.*]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: ret <4 x i32> [[TMP3:%.*]]

				%shuffle1 = shufflevector <16 x i8> %v1, <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				%bc1 = bitcast <16 x i8> %shuffle1 to <4 x i32>
				%shuffle2 = shufflevector <4 x i32> %bc1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				ret <4 x i32> %shuffle2
				}

				; shuffle<4 x i32>( bitcast<4 x i32>( shuffle<8 x i16>(v)))

				define <4 x i32> @shuffle_16_bitcast_32_shuffle_32_can_be_converted_up(<8 x i16> %v1) {
				; CHECK-LABEL: @shuffle_16_bitcast_32_shuffle_32_can_be_converted_up
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> %v1 to <4 x i32>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1:%.*]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2:%.*]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				; CHECK-NEXT: ret <4 x i32> [[TMP3:%.*]]

				%shuffle1 = shufflevector <8 x i16> %v1, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				%bc1 = bitcast <8 x i16> %shuffle1 to <4 x i32>
				%shuffle2 = shufflevector <4 x i32> %bc1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				ret <4 x i32> %shuffle2
				}

				; shuffle<4 x i32>( bitcast<4 x i32>( shuffle<16 x i8>(v)))

				define <4 x i32> @shuffle_8_bitcast_32_shuffle_32_can_not_be_converted_up(<16 x i8> %v1) {
				; CHECK-LABEL: @shuffle_8_bitcast_32_shuffle_32_can_not_be_converted_up
				; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <16 x i8> %v1, <16 x i8> undef, <16 x i32> <i32 9, i32 8, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[TMP0:%.*]], <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1:%.*]] to <4 x i32>
				; CHECK-NEXT: ret <4 x i32> [[TMP2:%.*]]

				%shuffle1 = shufflevector <16 x i8> %v1, <16 x i8> undef, <16 x i32> <i32 9, i32 8, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				%bc1 = bitcast <16 x i8> %shuffle1 to <4 x i32>
				%shuffle2 = shufflevector <4 x i32> %bc1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				ret <4 x i32> %shuffle2
				}

				; shuffle<4 x i32>( bitcast<4 x i32>( shuffle<8 x i16>(v)))

				define <4 x i32> @shuffle_16_bitcast_32_shuffle_32_can_not_be_converted_up(<8 x i16> %v1) {
				; CHECK-LABEL: @shuffle_16_bitcast_32_shuffle_32_can_not_be_converted_up
				; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x i16> %v1, <8 x i16> undef, <8 x i32> <i32 5, i32 4, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[TMP0:%.*]], <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i16> [[TMP1:%.*]] to <4 x i32>
				; CHECK-NEXT: ret <4 x i32> [[TMP2:%.*]]

				%shuffle1 = shufflevector <8 x i16> %v1, <8 x i16> undef, <8 x i32> <i32 5, i32 4, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				%bc1 = bitcast <8 x i16> %shuffle1 to <4 x i32>
				%shuffle2 = shufflevector <4 x i32> %bc1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 1, i32 0>
				ret <4 x i32> %shuffle2
				}

				; shuffle<8 x i16>( bitcast<8 x i16>( shuffle<16 x i8>(v)))

				define <8 x i16> @shuffle_8_bitcast_16_shuffle_16_can__be_converted_up(<16 x i8> %v1) {
				; CHECK-LABEL: @shuffle_8_bitcast_16_shuffle_16_can__be_converted_up
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <16 x i8> %v1 to <8 x i16>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> %1, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i16> %2, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				; CHECK-NEXT: ret <8 x i16> %3

				%shuffle1 = shufflevector <16 x i8> %v1, <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				%bc1 = bitcast <16 x i8> %shuffle1 to <8 x i16>
				%shuffle2 = shufflevector <8 x i16> %bc1, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				ret <8 x i16> %shuffle2
				}

				; shuffle<8 x i16>( bitcast<8 x i16>( shuffle<16 x i8>(v)))

				define <8 x i16> @shuffle_8_bitcast_16_shuffle_16_can_not_be_converted_up(<16 x i8> %v1) {
				; CHECK-LABEL: @shuffle_8_bitcast_16_shuffle_16_can_not_be_converted_up(<16 x i8> %v1) {
				; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <16 x i8> %v1, <16 x i8> undef, <16 x i32> <i32 9, i32 8, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[TMP0:%.*]], <16 x i8> undef, <16 x i32> <i32 10, i32 11, i32 8, i32 9, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1:%.*]] to <8 x i16>
				; CHECK-NEXT: ret <8 x i16> [[TMP2:%.*]]

				%shuffle1 = shufflevector <16 x i8> %v1, <16 x i8> undef, <16 x i32> <i32 9, i32 8, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				%bc1 = bitcast <16 x i8> %shuffle1 to <8 x i16>
				%shuffle2 = shufflevector <8 x i16> %bc1, <8 x i16> undef, <8 x i32> <i32 5, i32 4, i32 6, i32 7, i32 2, i32 3, i32 0, i32 1>
				ret <8 x i16> %shuffle2
				}
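
The "masks_are_eq" and "can_be_converted_up" / "can_not_be_converted_up" tests above all depend on whether a shuffle mask expressed over one element width can be re-expressed over another. The following is a minimal sketch of that mask arithmetic only, not the code from this patch: the helper names `narrowShuffleMask` and `widenShuffleMask` are hypothetical, a mask is modeled as a plain `std::vector<int>` with -1 standing for an undef lane, and rejecting undef lanes when widening is a conservative assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of the patch): rewrite a mask over wide
// elements as the equivalent mask over elements `scale` times narrower.
// Each wide index idx expands to idx*scale, idx*scale+1, ..., idx*scale+scale-1.
std::vector<int> narrowShuffleMask(const std::vector<int> &mask, int scale) {
  assert(scale > 0 && "scale must be positive");
  std::vector<int> result;
  result.reserve(mask.size() * scale);
  for (int idx : mask)
    for (int i = 0; i < scale; ++i)
      result.push_back(idx < 0 ? -1 : idx * scale + i); // -1 models an undef lane
  return result;
}

// Hypothetical helper (not part of the patch): try to rewrite a mask over
// narrow elements as a mask over elements `scale` times wider. This only
// succeeds when every group of `scale` indices is an aligned, consecutive run.
// Assumption: groups containing undef lanes are rejected for simplicity.
std::optional<std::vector<int>> widenShuffleMask(const std::vector<int> &mask,
                                                 int scale) {
  assert(scale > 0 && "scale must be positive");
  if (mask.size() % scale != 0)
    return std::nullopt;
  std::vector<int> result;
  for (std::size_t g = 0; g < mask.size(); g += scale) {
    int first = mask[g];
    if (first < 0 || first % scale != 0)
      return std::nullopt; // undef lane, or group does not start on a wide element
    for (int i = 1; i < scale; ++i)
      if (mask[g + i] != first + i)
        return std::nullopt; // lanes of one wide element are split or reordered
    result.push_back(first / scale);
  }
  return result;
}
```

Under these assumptions, narrowing the `<4 x i32>` mask `<2, 3, 1, 0>` by a factor of 4 gives the `<16 x i8>` mask checked in `shuffle_32_add_8_masks_are_eq`, and widening the `<8 x i16>` mask `<4, 5, 6, 7, 0, 1, 2, 3>` by a factor of 2 gives `<2, 3, 0, 1>` as in `shuffle_16_add_32_masks_are_eq_and_can_be_converted_up`. A mask that starts with `<5, 4, ...>` cannot be widened because it swaps the two halves of one wider element, which is the situation the "can_not_be_converted_up" tests exercise.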

Revision Contents

Diff 126643

lib/Transforms/InstCombine/InstCombineInternal.h

lib/Transforms/InstCombine/InstCombineVectorOps.cpp

lib/Transforms/InstCombine/InstructionCombining.cpp

test/Transforms/InstCombine/vec_shuffle.ll
