This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] transform more extract/insert pairs into shuffles (PR2109)
ClosedPublic

Authored by spatel on Nov 30 2015, 3:52 PM.

Details

Reviewers

RKSimon
t.p.northover
majnemer
sanjoy
hfinkel

Commits

rGae945e7927e3: [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
rL256394: [InstCombine] transform more extract/insert pairs into shuffles (PR2109)

Summary

This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0
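
The two-shuffle IR shown earlier in this summary can be sanity-checked without LLVM. The following is only a rough standalone model, not code from the patch: lanes are plain integers, `kUndefLane` is a made-up sentinel for undef, a negative mask index stands in for an undef mask element, and `shuffleVector`, `widenedForm`, and `elementWiseForm` are hypothetical helper names. The element-wise form is an assumption read off the final mask `<0, 1, 4, 5>`.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical sentinel for an undef lane; real IR has no such value.
constexpr int kUndefLane = INT_MIN;

// Model of shufflevector on integer lanes. A mask index below lhs.size()
// selects from lhs, a larger index selects from rhs, and a negative index
// stands in for an undef mask element.
std::vector<int> shuffleVector(const std::vector<int> &lhs,
                               const std::vector<int> &rhs,
                               const std::vector<int> &mask) {
  assert(lhs.size() == rhs.size() && "operands must have the same length");
  const int n = static_cast<int>(lhs.size());
  std::vector<int> result;
  result.reserve(mask.size());
  for (int idx : mask) {
    assert(idx < 2 * n && "mask index out of range");
    if (idx < 0)
      result.push_back(kUndefLane);
    else if (idx < n)
      result.push_back(lhs[idx]);
    else
      result.push_back(rhs[idx - n]);
  }
  return result;
}

// The two shuffles from the summary: widen ld to four lanes, then blend.
std::vector<int> widenedForm(const std::vector<int> &a,
                             const std::vector<int> &ld) {
  const std::vector<int> undef2(2, kUndefLane);
  const std::vector<int> wide = shuffleVector(ld, undef2, {0, 1, -1, -1});
  return shuffleVector(a, wide, {0, 1, 4, 5});
}

// Assumed element-wise equivalent: ld[0] and ld[1] land in lanes 2 and 3.
std::vector<int> elementWiseForm(std::vector<int> a,
                                 const std::vector<int> &ld) {
  a[2] = ld[0];
  a[3] = ld[1];
  return a;
}
```

In this model the undef lanes of the widened vector are never selected by the second mask, which is why padding with undef is harmless for this example.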

Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples.

Note 2: The 2x shufflevector mask limitation is not in the IR Language Reference shufflevector instruction definition, but it is encoded in ShuffleVectorInst::isValidOperands().

Event Timeline

spatel updated this revision to Diff 41443. Nov 30 2015, 3:52 PM

spatel retitled this revision from to [InstCombine] transform more extract/insert pairs into shuffles (PR2109).

spatel updated this object.

spatel added reviewers: t.p.northover, hfinkel, RKSimon.

spatel added a subscriber: llvm-commits.

Herald added a subscriber: aemerson. Nov 30 2015, 3:52 PM

Note 2: The 2x shufflevector mask limitation is not in the IR Language Reference shufflevector instruction definition, but it is encoded in ShuffleVectorInst::isValidOperands().

Disregard that comment. I mistook a bug in an earlier draft of this patch as that limitation. I'll update the patch to remove that check.

Patch updated:

Removed check for 2x shuffle.
Updated 'too_wide' test case because it's not too wide!

Ping.

Ping * 2.

LGTM

This revision is now accepted and ready to land. Dec 19 2015, 9:12 AM

Closed by commit rL256394: [InstCombine] transform more extract/insert pairs into shuffles (PR2109) (authored by spatel). Dec 24 2015, 1:21 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                     Size
lib/Transforms/InstCombine/InstCombineVectorOps.cpp      53 lines
test/Transforms/InstCombine/insert-extract-shuffle.ll    26 lines

Diff 41452

lib/Transforms/InstCombine/InstCombineVectorOps.cpp

[... 346 lines not shown; enclosing context: if (isa<UndefValue>(ScalarOp)) { // inserting undef into vector. ...]
         }
       }
     }
   }

   return false;
 }

+/// If we have insertion into a vector that is wider than the vector that we
+/// are extracting from, try to widen the source vector to allow a single
+/// shufflevector to replace one or more insert/extract pairs.
+static void replaceExtractElements(InsertElementInst *InsElt,
+                                   ExtractElementInst *ExtElt,
+                                   InstCombiner &IC) {
+  VectorType *InsVecType = InsElt->getType();
+  VectorType *ExtVecType = ExtElt->getVectorOperandType();
+  unsigned NumInsElts = InsVecType->getVectorNumElements();
+  unsigned NumExtElts = ExtVecType->getVectorNumElements();
+
+  // The inserted-to vector must be wider than the extracted-from vector.
+  if (InsVecType->getElementType() != ExtVecType->getElementType() ||
+      NumExtElts >= NumInsElts)
+    return;
+
+  // Create a shuffle mask to widen the extended-from vector using undefined
+  // values. The mask selects all of the values of the original vector followed
+  // by as many undefined values as needed to create a vector of the same length
+  // as the inserted-to vector.
+  SmallVector<Constant *, 16> ExtendMask;
+  IntegerType *IntType = Type::getInt32Ty(InsElt->getContext());
+  for (unsigned i = 0; i < NumExtElts; ++i)
+    ExtendMask.push_back(ConstantInt::get(IntType, i));
+  for (unsigned i = NumExtElts; i < NumInsElts; ++i)
+    ExtendMask.push_back(UndefValue::get(IntType));
+
+  Value *ExtVecOp = ExtElt->getVectorOperand();
+  auto *WideVec = new ShuffleVectorInst(ExtVecOp, UndefValue::get(ExtVecType),
+                                        ConstantVector::get(ExtendMask));
+
+  // Replace all extracts from the original narrow vector with extracts from
+  // the new wide vector.
+  WideVec->insertBefore(ExtElt);
+  for (User *U : ExtVecOp->users()) {
+    if (ExtractElementInst *OldExt = dyn_cast<ExtractElementInst>(U)) {
+      auto *NewExt = ExtractElementInst::Create(WideVec, OldExt->getOperand(1));
+      NewExt->insertAfter(WideVec);
+      IC.ReplaceInstUsesWith(*OldExt, NewExt);
+    }
+  }
+}
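
The mask construction in replaceExtractElements can be tried in isolation. This is a minimal sketch that mirrors only the two loops and the width guard, using plain integers instead of LLVM constants; `buildExtendMask` and `kUndefIndex` are hypothetical names, and -1 is an arbitrary stand-in for an undef mask element. It does not model the element-type check or the rewriting of extract users.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an undef mask element.
constexpr int kUndefIndex = -1;

// Returns the widening mask for a source vector of numExtElts lanes and a
// destination of numInsElts lanes: the identity indices of the narrow vector
// followed by undef padding. Returns an empty mask when the source is not
// strictly narrower than the destination, matching the early return above.
std::vector<int> buildExtendMask(unsigned numExtElts, unsigned numInsElts) {
  std::vector<int> mask;
  if (numExtElts >= numInsElts)
    return mask;

  mask.reserve(numInsElts);
  for (unsigned i = 0; i < numExtElts; ++i)
    mask.push_back(static_cast<int>(i));
  for (unsigned i = numExtElts; i < numInsElts; ++i)
    mask.push_back(kUndefIndex);
  return mask;
}
```

For example, widening a 2-lane source for a 4-lane destination would give the index list 0, 1, undef, undef in this model.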

 /// We are building a shuffle to create V, which is a sequence of insertelement,
 /// extractelement pairs. If PermittedRHS is set, then we must either use it or
 /// not rely on the second vector source. Return a std::pair containing the
 /// left and right vectors of the proposed shuffle (or 0), and set the Mask
 /// parameter as required.
 ///
 /// Note: we intentionally don't try to fold earlier shuffles since they have
 /// often been chosen carefully to be efficiently implementable on the target.
 typedef std::pair<Value *, Value *> ShuffleOps;

 static ShuffleOps collectShuffleElements(Value *V,
                                          SmallVectorImpl<Constant *> &Mask,
-                                         Value *PermittedRHS) {
+                                         Value *PermittedRHS,
+                                         InstCombiner &IC) {
   assert(V->getType()->isVectorTy() && "Invalid shuffle!");
   unsigned NumElts = cast<VectorType>(V->getType())->getNumElements();

   if (isa<UndefValue>(V)) {
     Mask.assign(NumElts, UndefValue::get(Type::getInt32Ty(V->getContext())));
     return std::make_pair(
         PermittedRHS ? UndefValue::get(PermittedRHS->getType()) : V, nullptr);
   }
[... 14 lines not shown; enclosing context: if (ExtractElementInst *EI = dyn_cast<ExtractElementInst>(ScalarOp)) { ...]
         unsigned ExtractedIdx =
           cast<ConstantInt>(EI->getOperand(1))->getZExtValue();
         unsigned InsertedIdx = cast<ConstantInt>(IdxOp)->getZExtValue();

         // Either the extracted from or inserted into vector must be RHSVec,
         // otherwise we'd end up with a shuffle of three inputs.
         if (EI->getOperand(0) == PermittedRHS || PermittedRHS == nullptr) {
           Value *RHS = EI->getOperand(0);
-          ShuffleOps LR = collectShuffleElements(VecOp, Mask, RHS);
+          ShuffleOps LR = collectShuffleElements(VecOp, Mask, RHS, IC);
           assert(LR.second == nullptr || LR.second == RHS);

           if (LR.first->getType() != RHS->getType()) {
+            // Although we are giving up for now, see if we can create extracts
+            // that match the inserts for another round of combining.
+            replaceExtractElements(IEI, EI, IC);
+
             // We tried our best, but we can't find anything compatible with RHS
             // further up the chain. Return a trivial shuffle.
             for (unsigned i = 0; i < NumElts; ++i)
               Mask[i] = ConstantInt::get(Type::getInt32Ty(V->getContext()), i);
             return std::make_pair(V, nullptr);
           }

           unsigned NumLHSElts = RHS->getType()->getVectorNumElements();
[... 96 lines not shown; enclosing context: if (isa<ConstantInt>(EI->getOperand(1)) && isa<ConstantInt>(IdxOp)) { ...]
       // back into the same place, just use the input vector.
       if (EI->getOperand(0) == VecOp && ExtractedIdx == InsertedIdx)
         return ReplaceInstUsesWith(IE, VecOp);

       // If this insertelement isn't used by some other insertelement, turn it
       // (and any insertelements it points to), into one big shuffle.
       if (!IE.hasOneUse() || !isa<InsertElementInst>(IE.user_back())) {
         SmallVector<Constant*, 16> Mask;
-        ShuffleOps LR = collectShuffleElements(&IE, Mask, nullptr);
+        ShuffleOps LR = collectShuffleElements(&IE, Mask, nullptr, *this);

         // The proposed shuffle may be trivial, in which case we shouldn't
         // perform the combine.
         if (LR.first != &IE && LR.second != &IE) {
           // We now have a shuffle of LHS, RHS, Mask.
           if (LR.second == nullptr)
             LR.second = UndefValue::get(LR.first->getType());
           return new ShuffleVectorInst(LR.first, LR.second,
[... 662 lines not shown ...]

test/Transforms/InstCombine/insert-extract-shuffle.ll

[... 20 lines not shown; enclosing context: ; CHECK: shufflevector <8 x i16> %in2, <8 x i16> %in, <4 x i32> <i32 11, i32 9, i32 0, i32 10> ...]
   %vec.2 = insertelement <4 x i16> %vec.1, i16 %elt2, i32 2
   %vec.3 = insertelement <4 x i16> %vec.2, i16 %elt3, i32 3

   ret <4 x i16> %vec.3
 }

 define <2 x i64> @test_vcopyq_lane_p64(<2 x i64> %a, <1 x i64> %b) {
 ; CHECK-LABEL: @test_vcopyq_lane_p64
-; CHECK-NEXT: extractelement
-; CHECK-NEXT: insertelement
+; CHECK-NEXT: %[[WIDEVEC:.*]] = shufflevector <1 x i64> %b, <1 x i64> undef, <2 x i32> <i32 0, i32 undef>
+; CHECK-NEXT: shufflevector <2 x i64> %a, <2 x i64> %[[WIDEVEC]], <2 x i32> <i32 0, i32 2>
 ; CHECK-NEXT: ret <2 x i64> %res
   %elt = extractelement <1 x i64> %b, i32 0
   %res = insertelement <2 x i64> %a, i64 %elt, i32 1
   ret <2 x i64> %res
 }

 ; PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109

 define <4 x float> @widen_extract2(<4 x float> %ins, <2 x float> %ext) {
 ; CHECK-LABEL: @widen_extract2(
-; CHECK-NEXT: extractelement
-; CHECK-NEXT: extractelement
-; CHECK-NEXT: insertelement
-; CHECK-NEXT: insertelement
+; CHECK-NEXT: %[[WIDEVEC:.*]] = shufflevector <2 x float> %ext, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
+; CHECK-NEXT: shufflevector <4 x float> %ins, <4 x float> %[[WIDEVEC]], <4 x i32> <i32 0, i32 4, i32 2, i32 5>
 ; CHECK-NEXT: ret <4 x float> %i2
   %e1 = extractelement <2 x float> %ext, i32 0
   %e2 = extractelement <2 x float> %ext, i32 1
   %i1 = insertelement <4 x float> %ins, float %e1, i32 1
   %i2 = insertelement <4 x float> %i1, float %e2, i32 3
   ret <4 x float> %i2
 }

 define <4 x float> @widen_extract3(<4 x float> %ins, <3 x float> %ext) {
 ; CHECK-LABEL: @widen_extract3(
-; CHECK-NEXT: extractelement
-; CHECK-NEXT: extractelement
-; CHECK-NEXT: extractelement
-; CHECK-NEXT: insertelement
-; CHECK-NEXT: insertelement
-; CHECK-NEXT: insertelement
+; CHECK-NEXT: %[[WIDEVEC:.*]] = shufflevector <3 x float> %ext, <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
+; CHECK-NEXT: shufflevector <4 x float> %ins, <4 x float> %[[WIDEVEC]], <4 x i32> <i32 6, i32 5, i32 4, i32 3>
 ; CHECK-NEXT: ret <4 x float> %i3
   %e1 = extractelement <3 x float> %ext, i32 0
   %e2 = extractelement <3 x float> %ext, i32 1
   %e3 = extractelement <3 x float> %ext, i32 2
   %i1 = insertelement <4 x float> %ins, float %e1, i32 2
   %i2 = insertelement <4 x float> %i1, float %e2, i32 1
   %i3 = insertelement <4 x float> %i2, float %e3, i32 0
   ret <4 x float> %i3
 }

-define <8 x float> @too_wide(<8 x float> %ins, <2 x float> %ext) {
-; CHECK-LABEL: @too_wide(
-; CHECK-NEXT: extractelement
-; CHECK-NEXT: insertelement
+define <8 x float> @widen_extract4(<8 x float> %ins, <2 x float> %ext) {
+; CHECK-LABEL: @widen_extract4(
+; CHECK-NEXT: %[[WIDEVEC:.*]] = shufflevector <2 x float> %ext, <2 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
+; CHECK-NEXT: shufflevector <8 x float> %ins, <8 x float> %[[WIDEVEC]], <8 x i32> <i32 0, i32 1, i32 8, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT: ret <8 x float> %i1
   %e1 = extractelement <2 x float> %ext, i32 0
   %i1 = insertelement <8 x float> %ins, float %e1, i32 2
   ret <8 x float> %i1
 }
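
The final shuffle masks expected by these tests follow directly from the insert/extract indexes. As a rough check, assuming the narrow vector has already been widened to the destination width, the mask can be derived by starting from the identity over the destination and redirecting each inserted lane to the second operand. `deriveShuffleMask` is a hypothetical helper written for this illustration, not part of InstCombine.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Each pair is (extract index in the source vector, insert index in the
// destination vector), listed in program order. Lanes that are never
// overwritten keep selecting from the first shuffle operand; an inserted lane
// selects from the second operand, whose indices start at numInsElts.
std::vector<int> deriveShuffleMask(
    unsigned numInsElts,
    const std::vector<std::pair<unsigned, unsigned>> &pairs) {
  std::vector<int> mask(numInsElts);
  for (unsigned i = 0; i < numInsElts; ++i)
    mask[i] = static_cast<int>(i);

  for (const auto &p : pairs) {
    assert(p.second < numInsElts && "insert index out of range");
    // A later insert into the same lane overrides an earlier one.
    mask[p.second] = static_cast<int>(numInsElts + p.first);
  }
  return mask;
}
```

With the index pairs from widen_extract2, (0 into 1) and (1 into 3), this gives 0, 4, 2, 5, which agrees with the second CHECK-NEXT line of that test.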