This is an archive of the discontinued LLVM Phabricator instance.

[VectorUtils] move x86's scaleShuffleMask to generic VectorUtils
ClosedPublic

Authored by spatel on Mar 20 2020, 9:01 AM.

Details

Reviewers

efriedma
RKSimon
lebedev.ri
nikic
aartbik
t.p.northover

Commits

rG0eeee83d7513: [VectorUtils] move x86's scaleShuffleMask to generic VectorUtils

Summary

We have some long-standing missing shuffle optimizations that could use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)

This function is apparently templated because there's existing code in IR that treats mask values as unsigned and backend code that treats mask values as signed?

The mask values are not endian-dependent IIUC.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. · Mar 20 2020, 9:01 AM

Herald added a reviewer: aartbik. · Mar 20 2020, 9:01 AM

Herald added subscribers: hiraditya, mcrosier.

I'm not convinced this is endian-agnostic, but I agree this is correct for little-endian.

This revision now requires changes to proceed. · Mar 20 2020, 9:50 AM

Whether a given transform is endian-agnostic depends on the specific transform. If you're talking about reordering a bitcast and a shuffle, in particular, I'm pretty sure that's endian-agnostic. Trivial example:

%shufflefirst = shufflevector <1 x i64> %x, <1 x i64> %x, <1 x i32> <i32 0>
%z = bitcast <1 x i64> %shufflefirst to <2 x i32>

vs.

%bitcastfirst = bitcast <1 x i64> %x to <2 x i32>
%result = shufflevector <2 x i32> %bitcastfirst, <2 x i32> %bitcastfirst, <2 x i32> <i32 0, i32 1>

The shuffle is a no-op; an identity shuffle is spelled the same way on big-endian and little-endian targets.
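
The same argument can be checked on a small model. The sketch below is an illustration under stated assumptions, not LLVM code: vectors are plain std::vector<int>, a wide lane is a group of Scale narrow values, and the hypothetical ReverseWithinLane flag stands in for an endianness-dependent order of narrow values inside one wide lane. All names (scaledMask, shuffleNarrow, shuffleWide, narrow, reorderingMatches) are made up for this example, and the input is assumed to be non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;  // narrow lanes (or a shuffle mask)
using Wide = std::vector<Vec>; // wide lanes, each holding Scale narrow values

// Mask scaling as described in the patch: index M expands to
// Scale*M .. Scale*M + Scale-1, and negative sentinels are repeated.
Vec scaledMask(std::size_t Scale, const Vec &Mask) {
  Vec Out;
  for (int M : Mask)
    for (std::size_t S = 0; S != Scale; ++S)
      Out.push_back(M < 0 ? M : static_cast<int>(Scale * M + S));
  return Out;
}

// Single-input shuffle of narrow lanes; -1 marks an undefined result lane.
Vec shuffleNarrow(const Vec &In, const Vec &Mask) {
  Vec Out;
  for (int M : Mask)
    Out.push_back(M < 0 ? -1 : In[M]);
  return Out;
}

// Single-input shuffle of wide lanes; an undefined lane is filled with -1.
Wide shuffleWide(const Wide &In, const Vec &Mask, std::size_t Scale) {
  Wide Out;
  for (int M : Mask) {
    if (M < 0)
      Out.push_back(Vec(Scale, -1));
    else
      Out.push_back(In[M]);
  }
  return Out;
}

// Model of a bitcast to narrower elements: flatten the wide lanes.
// ReverseWithinLane is a hypothetical stand-in for endianness.
Vec narrow(const Wide &In, bool ReverseWithinLane) {
  Vec Out;
  for (const Vec &Lane : In) {
    if (ReverseWithinLane)
      Out.insert(Out.end(), Lane.rbegin(), Lane.rend());
    else
      Out.insert(Out.end(), Lane.begin(), Lane.end());
  }
  return Out;
}

// True if "shuffle wide lanes, then narrow" matches "narrow, then shuffle
// with the scaled mask". Assumes X is non-empty.
bool reorderingMatches(const Wide &X, const Vec &Mask, bool ReverseWithinLane) {
  std::size_t Scale = X.front().size();
  Vec ShuffleFirst = narrow(shuffleWide(X, Mask, Scale), ReverseWithinLane);
  Vec BitcastFirst =
      shuffleNarrow(narrow(X, ReverseWithinLane), scaledMask(Scale, Mask));
  return ShuffleFirst == BitcastFirst;
}
```

In this model the two orders agree for either setting of the flag, because the scaled mask moves whole groups of Scale narrow lanes and never changes the order inside a group.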

This shouldn't be endian specific - DAGCombiner::visitVECTOR_SHUFFLE already does SHUFFLE(BITCAST,UNDEF)->BITCAST(SHUFFLE) with its own almost-identical version of ScaleShuffleMask.

In D76508#1934379, @RKSimon wrote:

This shouldn't be endian specific - DAGCombiner::visitVECTOR_SHUFFLE already does SHUFFLE(BITCAST,UNDEF)->BITCAST(SHUFFLE) with its own almost-identical version of ScaleShuffleMask.

Thanks - I knew that we had this somewhere else, but I missed that. I can remove that duplication as part of this patch if that improves the motivation.

Patch updated:
Replace duplicate shuffle mask scaling from DAGCombiner with calls to the generic util.
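
For context, the DAGCombiner fold that now calls the generic utility scales both shuffle masks to the smaller element type and then composes them. A standalone sketch of that composition follows; it assumes std::vector<int> masks instead of the LLVM types, and the names MaskVec, scaleMask and mergeShuffleMasks are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using MaskVec = std::vector<int>;

// Scale a mask to narrower elements; negative sentinels are repeated.
MaskVec scaleMask(std::size_t Scale, const MaskVec &Mask) {
  MaskVec Out;
  for (int M : Mask)
    for (std::size_t S = 0; S != Scale; ++S)
      Out.push_back(M < 0 ? M : static_cast<int>(Scale * M + S));
  return Out;
}

// Compose an outer shuffle with the inner shuffle that feeds it through a
// bitcast. Both masks are first scaled to the common, smaller element type.
MaskVec mergeShuffleMasks(std::size_t InnerScale, const MaskVec &Inner,
                          std::size_t OuterScale, const MaskVec &Outer) {
  MaskVec InnerScaled = scaleMask(InnerScale, Inner);
  MaskVec OuterScaled = scaleMask(OuterScale, Outer);
  // Assumption of this sketch: both shuffles cover the same total bit width.
  assert(InnerScaled.size() == OuterScaled.size());

  MaskVec Merged;
  for (int M : OuterScaled) {
    // Assumption: outer indices only select lanes of the inner shuffle result.
    assert(M < static_cast<int>(InnerScaled.size()) && "index out of range");
    Merged.push_back(M < 0 ? -1 : InnerScaled[M]);
  }
  return Merged;
}
```

For example, an inner <2 x i64> mask <1, 0> (InnerScale = 2) combined with an outer <4 x i32> mask <3, 2, -1, 0> (OuterScale = 1) gives the merged <4 x i32> mask <1, 0, -1, 2>.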

Thanks, that helped convince me of this not being endian-specific.
LG

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19759–19760	Do we want to preserve fastpath?
19765	I'd personally keep ternary variant, not have two loops.

This revision is now accepted and ready to land. · Mar 22 2020, 10:23 AM

LGTM

llvm/include/llvm/Analysis/VectorUtils.h
346	Retaining a fast copy path for Scale == 1 would make sense

I'll check this in as-is and follow-up with the implementation improvements.
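
The two review suggestions (a fast copy path for Scale == 1, and a single loop with a ternary for sentinel values) could be combined roughly as follows. This is a minimal standalone sketch, not the code from the follow-up commit: it uses std::vector<int> in place of LLVM's ArrayRef and SmallVectorImpl, and the name scaleShuffleMaskSketch is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone sketch; std::vector<int> stands in for LLVM's
// ArrayRef<T> / SmallVectorImpl<T>.
std::vector<int> scaleShuffleMaskSketch(std::size_t Scale,
                                        const std::vector<int> &Mask) {
  assert(Scale > 0 && "Unexpected scaling factor");

  // Fast path: with no scaling, the result is a plain copy.
  if (Scale == 1)
    return Mask;

  std::vector<int> ScaledMask;
  ScaledMask.reserve(Mask.size() * Scale);
  for (int M : Mask)
    for (std::size_t S = 0; S != Scale; ++S)
      // Negative (sentinel) values are repeated unchanged; other indices are
      // scaled and then incremented across the narrowed elements.
      ScaledMask.push_back(M < 0 ? M : static_cast<int>(Scale * M + S));
  return ScaledMask;
}
```

With Scale = 4 and the mask <3, 2, 0, -1> from the header comment, this returns <12, 13, 14, 15, 8, 9, 10, 11, 0, 1, 2, 3, -1, -1, -1, -1>.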

Closed by commit rG0eeee83d7513: [VectorUtils] move x86's scaleShuffleMask to generic VectorUtils (authored by spatel). · Mar 23 2020, 7:04 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · Mar 23 2020, 7:04 AM

spatel mentioned this in rGebf83c36e29a: [Analysis] simplify code for scaleShuffleMask. · Mar 23 2020, 9:16 AM

spatel mentioned this in D77881: [VectorUtils] add IR-level analysis for widening of shuffle mask. · Apr 10 2020, 8:51 AM

spatel mentioned this in rGc23cbefd9d73: [VectorUtils] add IR-level analysis for widening of shuffle mask. · Apr 12 2020, 7:28 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/VectorUtils.h	34 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	18 lines
llvm/lib/Target/X86/X86ISelLowering.h	26 lines
llvm/unittests/Analysis/VectorUtilsTest.cpp	6 lines

Diff 252022

llvm/include/llvm/Analysis/VectorUtils.h

 /// Return true if each element of the vector value \p V is poisoned or equal to
 /// every other non-poisoned element. If an index element is specified, either
 /// every element of the vector is poisoned or the element at that index is not
 /// poisoned and equal to every other non-poisoned element.
 /// This may be more powerful than the related getSplatValue() because it is
 /// not limited by finding a scalar source value to a splatted vector.
 bool isSplatValue(const Value *V, int Index = -1, unsigned Depth = 0);

+/// Scale a shuffle or target shuffle mask, replacing each mask index with the
+/// scaled sequential indices for an equivalent mask of narrowed elements.
+/// Mask elements that are less than 0 (sentinel values) are repeated in the
+/// output mask.
+///
+/// Example with Scale = 4:
+/// <4 x i32> <3, 2, 0, -1> -->
+/// <16 x i8> <12, 13, 14, 15, 8, 9, 10, 11, 0, 1, 2, 3, -1, -1, -1, -1>
+///
+/// This is the reverse process of "canWidenShuffleElements", but can always
+/// succeed.
+template <typename T>
+void scaleShuffleMask(size_t Scale, ArrayRef<T> Mask,
+                      SmallVectorImpl<T> &ScaledMask) {
+  assert(Scale > 0 && "Unexpected scaling factor");
+  size_t NumElts = Mask.size();
 [inline comment] RKSimon: Retaining a fast copy path for Scale == 1 would make sense
+  ScaledMask.assign(NumElts * Scale, -1);
+
+  for (size_t i = 0; i != NumElts; ++i) {
+    int M = Mask[i];
+
+    // Repeat sentinel values in every mask element.
+    if (M < 0) {
+      for (size_t s = 0; s != Scale; ++s)
+        ScaledMask[(Scale * i) + s] = M;
+      continue;
+    }
+
+    // Scale mask element and increment across each mask element.
+    for (size_t s = 0; s != Scale; ++s)
+      ScaledMask[(Scale * i) + s] = (Scale * M) + s;
+  }
+}
+
 /// Compute a map of integer instructions to their minimum legal type
 /// size.
 ///
 /// C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
 /// type (e.g. i32) whenever arithmetic is performed on them.
 ///
 /// For targets with native i8 or i16 operations, usually InstCombine can shrink
 /// the arithmetic type down again. However InstCombine refuses to create

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

   if (N0.getOpcode() == ISD::CONCAT_VECTORS && N1.isUndef() &&
       N0.getNumOperands() == 2 &&
       N0.getOperand(0) == N0.getOperand(1)) {
     int HalfNumElts = (int)NumElts / 2;
     SmallVector<int, 8> NewMask;
     for (unsigned i = 0; i != NumElts; ++i) {
       int Idx = SVN->getMaskElt(i);
       if (Idx >= HalfNumElts) {
         assert(Idx < (int)NumElts && "Shuffle mask chooses undef op");
         Idx -= HalfNumElts;
       }
       NewMask.push_back(Idx);
     }
     if (TLI.isShuffleMaskLegal(NewMask, VT)) {
       SDValue UndefVec = DAG.getUNDEF(N0.getOperand(0).getValueType());
       SDValue NewCat = DAG.getNode(ISD::CONCAT_VECTORS, SDLoc(N), VT,
                                    N0.getOperand(0), UndefVec);
       return DAG.getVectorShuffle(VT, SDLoc(N), NewCat, N1, NewMask);
     }
   }

   // Attempt to combine a shuffle of 2 inputs of 'scalar sources' -
   // BUILD_VECTOR or SCALAR_TO_VECTOR into a single BUILD_VECTOR.
   if (Level < AfterLegalizeDAG && TLI.isTypeLegal(VT))
     if (SDValue Res = combineShuffleOfScalars(SVN, DAG, TLI))
       return Res;

   // If this shuffle only has a single input that is a bitcasted shuffle,
   // attempt to merge the 2 shuffles and suitably bitcast the inputs/output
   // back to their original types.
   if (N0.getOpcode() == ISD::BITCAST && N0.hasOneUse() &&
       N1.isUndef() && Level < AfterLegalizeVectorOps &&
       TLI.isTypeLegal(VT)) {
-    auto ScaleShuffleMask = [](ArrayRef<int> Mask, int Scale) {
-      if (Scale == 1)
-        return SmallVector<int, 8>(Mask.begin(), Mask.end());
 [inline comment, lines 19759–19760] lebedev.ri: Do we want to preserve fastpath?
-
-      SmallVector<int, 8> NewMask;
-      for (int M : Mask)
-        for (int s = 0; s != Scale; ++s)
-          NewMask.push_back(M < 0 ? -1 : Scale * M + s);
 [inline comment, line 19765] lebedev.ri: I'd personally keep ternary variant, not have two loops.
-      return NewMask;
-    };
-
     SDValue BC0 = peekThroughOneUseBitcasts(N0);
     if (BC0.getOpcode() == ISD::VECTOR_SHUFFLE && BC0.hasOneUse()) {
       EVT SVT = VT.getScalarType();
       EVT InnerVT = BC0->getValueType(0);
       EVT InnerSVT = InnerVT.getScalarType();

       // Determine which shuffle works with the smaller scalar type.
       EVT ScaleVT = SVT.bitsLT(InnerSVT) ? VT : InnerVT;
       EVT ScaleSVT = ScaleVT.getScalarType();

       if (TLI.isTypeLegal(ScaleVT) &&
           0 == (InnerSVT.getSizeInBits() % ScaleSVT.getSizeInBits()) &&
           0 == (SVT.getSizeInBits() % ScaleSVT.getSizeInBits())) {
         int InnerScale = InnerSVT.getSizeInBits() / ScaleSVT.getSizeInBits();
         int OuterScale = SVT.getSizeInBits() / ScaleSVT.getSizeInBits();

         // Scale the shuffle masks to the smaller scalar type.
         ShuffleVectorSDNode *InnerSVN = cast<ShuffleVectorSDNode>(BC0);
-        SmallVector<int, 8> InnerMask =
-            ScaleShuffleMask(InnerSVN->getMask(), InnerScale);
-        SmallVector<int, 8> OuterMask =
-            ScaleShuffleMask(SVN->getMask(), OuterScale);
+        SmallVector<int, 8> InnerMask;
+        SmallVector<int, 8> OuterMask;
+        scaleShuffleMask<int>(InnerScale, InnerSVN->getMask(), InnerMask);
+        scaleShuffleMask<int>(OuterScale, SVN->getMask(), OuterMask);

         // Merge the shuffle masks.
         SmallVector<int, 8> NewMask;
         for (int M : OuterMask)
           NewMask.push_back(M < 0 ? -1 : InnerMask[M]);

         // Test for shuffle mask legality over both commutations.
         SDValue SV0 = BC0->getOperand(0);

llvm/lib/Target/X86/X86ISelLowering.h

   void createSplat2ShuffleMask(MVT VT, SmallVectorImpl<T> &Mask, bool Lo) {
     int NumElts = VT.getVectorNumElements();
     for (int i = 0; i < NumElts; ++i) {
       int Pos = i / 2;
       Pos += (Lo ? 0 : NumElts / 2);
       Mask.push_back(Pos);
     }
   }

-  /// Helper function to scale a shuffle or target shuffle mask, replacing each
-  /// mask index with the scaled sequential indices for an equivalent narrowed
-  /// mask. This is the reverse process to canWidenShuffleElements, but can
-  /// always succeed.
-  template <typename T>
-  void scaleShuffleMask(size_t Scale, ArrayRef<T> Mask,
-                        SmallVectorImpl<T> &ScaledMask) {
-    assert(0 < Scale && "Unexpected scaling factor");
-    size_t NumElts = Mask.size();
-    ScaledMask.assign(NumElts * Scale, -1);
-
-    for (size_t i = 0; i != NumElts; ++i) {
-      int M = Mask[i];
-
-      // Repeat sentinel values in every mask element.
-      if (M < 0) {
-        for (size_t s = 0; s != Scale; ++s)
-          ScaledMask[(Scale * i) + s] = M;
-        continue;
-      }
-
-      // Scale mask element and increment across each mask element.
-      for (size_t s = 0; s != Scale; ++s)
-        ScaledMask[(Scale * i) + s] = (Scale * M) + s;
-    }
-  }
 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_X86_X86ISELLOWERING_H

llvm/unittests/Analysis/VectorUtilsTest.cpp

 TEST_F(BasicTest, isSplat) {
   Value *SplatC = IRB.CreateVectorSplat(5, ScalarC);
   EXPECT_TRUE(isSplatValue(SplatC));

   // FIXME: Constant splat analysis does not allow undef elements.
   Constant *SplatWithUndefC = ConstantVector::get({ScalarC, UndefScalar});
   EXPECT_FALSE(isSplatValue(SplatWithUndefC));
 }

+TEST_F(BasicTest, scaleShuffleMask) {
+  SmallVector<int, 16> ScaledMask;
+  scaleShuffleMask<int>(4, {3,2,0,-1}, ScaledMask);
+  EXPECT_EQ(makeArrayRef<int>(ScaledMask), makeArrayRef<int>({12,13,14,15,8,9,10,11,0,1,2,3,-1,-1,-1,-1}));
+}
+
 TEST_F(BasicTest, getSplatIndex) {
   EXPECT_EQ(getSplatIndex({0,0,0}), 0);
   EXPECT_EQ(getSplatIndex({1,0,0}), -1); // no splat
   EXPECT_EQ(getSplatIndex({0,1,1}), -1); // no splat
   EXPECT_EQ(getSplatIndex({42,42,42}), 42); // array size is independent of splat index
   EXPECT_EQ(getSplatIndex({42,42,-1}), 42); // ignore negative
   EXPECT_EQ(getSplatIndex({-1,42,-1}), 42); // ignore negatives
   EXPECT_EQ(getSplatIndex({-4,42,-42}), 42); // ignore all negatives