This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
Closed, Public

Authored by spatel on Aug 25 2016, 12:03 PM.

Details

Reviewers

majnemer
hfinkel
efriedma

Commits

rG521f19f2498e: [InsttCombine] fold insertelement of constant into shuffle with constant…
rL280504: [InsttCombine] fold insertelement of constant into shuffle with constant…

Summary

We can see chains of insertelement instructions before SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Event Timeline

spatel updated this revision to Diff 69278. Aug 25 2016, 12:03 PM

spatel retitled this revision from to [InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126).

spatel updated this object.

spatel added reviewers: majnemer, efriedma, hfinkel.

spatel added subscribers: RKSimon, ABataev.

Herald added a subscriber: mcrosier. Aug 25 2016, 12:03 PM

This seems vaguely risky: if the backend isn't smart enough to pattern-match an arbitrary shuffle into a cheap shuffle + insert, you could end up making the code slower. (For example, I'm not sure the x86 backend can decompose an <8 x i16> shuffle into a punpcklwd+pinsrw in general.)

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
Line 602: I'm not following this logic... don't you need to prove that the chosen element of the original constant vector isn't used? Consider a shuffle mask like "<8 x i16> <0, 1, 2, 3, 8, 9, 10, 11>" followed by an insertion at index 0.
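
To make the counterexample above concrete, here is a small standalone C++ sketch. It is a toy model over std::vector, not LLVM code: evalShuffle, insertElement, naiveFold and demoCounterexample are names made up for illustration, and naiveFold is only an assumed reading of an unguarded fold (the earlier diff is not shown on this page).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of shufflevector(A, B, Mask): mask values below A.size() read from A,
// larger values read from B. Undef mask elements are not modeled here.
Vec evalShuffle(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  Vec Result;
  Result.reserve(Mask.size());
  for (int M : Mask) {
    std::size_t Idx = static_cast<std::size_t>(M);
    Result.push_back(Idx < A.size() ? A[Idx] : B[Idx - A.size()]);
  }
  return Result;
}

// Model of insertelement(V, Scalar, Index).
Vec insertElement(Vec V, int Scalar, std::size_t Index) {
  V[Index] = Scalar;
  return V;
}

// Hypothetical unguarded fold: overwrite constant lane Index and point the
// mask at it, without checking whether another lane reads that constant.
Vec naiveFold(const Vec &X, Vec C, std::vector<int> Mask, int Scalar,
              std::size_t Index) {
  C[Index] = Scalar;
  Mask[Index] = static_cast<int>(Index + X.size());
  return evalShuffle(X, C, Mask);
}

// The mask from the review comment: lane 4 reads constant element 0.
void demoCounterexample() {
  Vec X = {10, 11, 12, 13, 14, 15, 16, 17};
  Vec C = {20, 21, 22, 23, 24, 25, 26, 27};
  std::vector<int> Mask = {0, 1, 2, 3, 8, 9, 10, 11};

  Vec Reference = insertElement(evalShuffle(X, C, Mask), 99, 0);
  Vec Folded = naiveFold(X, C, Mask, 99, 0);

  assert(Reference[4] == 20); // original constant element 0
  assert(Folded[4] == 99);    // clobbered by the inserted scalar
}
```

In this model, lane 4 of the shuffle still reads constant element 0 after the fold, so overwriting that element changes a lane the insertelement never touched.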

In D23886#526485, @efriedma wrote:

This seems vaguely risky: if the backend isn't smart enough to pattern-match an arbitrary shuffle into a cheap shuffle + insert, you could end up making the code slower. (For example, I'm not sure the x86 backend can decompose an <8 x i16> shuffle into a punpcklwd+pinsrw in general.)

Let me write up some more tests and pass them on to a few different targets. For the x86 cases I looked at, this appeared to always be a win, but I didn't try any i16 tests.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
Line 602: Yes, this is just wrong. I was only thinking about the same-lane / blend pattern in the motivating example.

This leads to a question that was raised in D22114. Which is canonical: a shufflevector with a select-equivalent mask or a select with a constant condition operand?

Given:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 5, i8 6, i8 7>, <4 x i32> <i32 0, i32 7, i32 6, i32 5> ; lane-changing
  ret <4 x i8> %y
}

Should we transform to:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5>, <4 x i32> <i32 0, i32 5, i32 6, i32 7> ; lane-preserving
  ret <4 x i8> %y
}

or:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5>
  ret <4 x i8> %y
}

IR should be canonical, so we should add transforms to make one of the latter the preferred form?
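
As a sanity check that the three @hoo forms above compute the same value, here is a standalone C++ sketch. It models vectors as std::vector<int>; evalShuffle, vectorSelect, threeFormsAgree and the UndefLane placeholder are hypothetical names for this illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Placeholder for the undef constant lane; none of the forms below reads it.
const int UndefLane = -1;

// Model of shufflevector(A, B, Mask) without undef mask elements.
Vec evalShuffle(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  Vec Result;
  Result.reserve(Mask.size());
  for (int M : Mask) {
    std::size_t Idx = static_cast<std::size_t>(M);
    Result.push_back(Idx < A.size() ? A[Idx] : B[Idx - A.size()]);
  }
  return Result;
}

// Model of a vector select: lane i takes T[i] when Cond[i] is true, else F[i].
Vec vectorSelect(const std::vector<bool> &Cond, const Vec &T, const Vec &F) {
  Vec Result;
  Result.reserve(Cond.size());
  for (std::size_t i = 0; i != Cond.size(); ++i)
    Result.push_back(Cond[i] ? T[i] : F[i]);
  return Result;
}

// Evaluate the lane-changing shuffle, the lane-preserving shuffle, and the
// select with a constant condition, then compare the results.
bool threeFormsAgree(const Vec &X) {
  Vec LaneChanging = evalShuffle(X, {UndefLane, 5, 6, 7}, {0, 7, 6, 5});
  Vec LanePreserving = evalShuffle(X, {UndefLane, 7, 6, 5}, {0, 5, 6, 7});
  Vec Select =
      vectorSelect({true, false, false, false}, X, {UndefLane, 7, 6, 5});
  return LaneChanging == LanePreserving && LanePreserving == Select;
}
```

For example, assert(threeFormsAgree({1, 2, 3, 4})) holds in this model, so the open question is only which spelling should be canonical, not whether the forms differ in meaning.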

Probably a good idea to bring up the shuffle vs select discussion on llvmdev, to get more visibility; it will have a substantial impact on backends.

Patch updated:
We seem to have consensus about canonicalizing a vector select with a constant condition operand to a shuffle, but I'll give it a bit more time before making a patch for that.

Even without that step in place, I think we can push this patch forward by limiting the transform to select-equivalent shuffles.

We still get the motivating case.
The previous logic was incorrect for the general shuffle case, but it should work for this limited form of shuffle.
We add a bit of shuffle <--> select plumbing via isShuffleEquivalentToSelect() which may be useful for subsequent patches too.
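
For readers who want to try the select-equivalence test outside of LLVM, here is a minimal standalone sketch. It works on a plain mask array instead of a ShuffleVectorInst; isSelectEquivalentMask is a hypothetical name, and -1 stands for an undef mask element, the value Shuf.getMaskValue(i) reports for undef.

```cpp
#include <cassert>
#include <vector>

// Returns true when a shuffle with this mask behaves like a vector select:
// the mask has as many elements as each input vector, and every lane i is
// either undef (-1), element i of the first operand, or element i of the
// second operand (encoded as i + VecSize).
bool isSelectEquivalentMask(const std::vector<int> &Mask, int VecSize) {
  // A vector select does not change the size of the operands.
  if (static_cast<int>(Mask.size()) != VecSize)
    return false;

  for (int i = 0; i != VecSize; ++i) {
    int Elt = Mask[i];
    if (Elt != -1 && Elt != i && Elt != i + VecSize)
      return false;
  }
  return true;
}
```

In this sketch the mask <0, 5, 6, 3> from the motivating test passes, while the lane-changing mask <0, 7, 6, 5> and the mask <0, 1, 2, 3, 8, 9, 10, 11> from the earlier inline comment are both rejected.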

This is looking much better.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
Line 590: I think you can write the body of this loop as "int EltVal = Shuf.getMaskValue(i); if (EltVal != -1 && EltVal != i && EltVal != i + VecSize) return false;".

spatel added inline comments. Sep 2 2016, 6:49 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
Line 590: Nice - not sure how I missed that method up to now!

Patch updated:
Use Shuf.getMaskValue(i) to simplify the code.

spatel added a child revision: D24182: [InstCombine] Fix for PR29124: reduce insertelements to shufflevector. Sep 2 2016, 9:11 AM

LGTM.

This revision is now accepted and ready to land. Sep 2 2016, 9:12 AM

spatel mentioned this in D24182: [InstCombine] Fix for PR29124: reduce insertelements to shufflevector. Sep 2 2016, 9:23 AM

Closed by commit rL280504: [InsttCombine] fold insertelement of constant into shuffle with constant… (authored by spatel). Sep 2 2016, 10:13 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                  Size
lib/Transforms/InstCombine/InstCombineVectorOps.cpp   76 lines
test/Transforms/InstCombine/insert-const-shuf.ll      11 lines

Diff 70162

lib/Transforms/InstCombine/InstCombineVectorOps.cpp

[560 lines not shown; hunk context: while (V->hasOneUse() && Depth < 10) {]
     Depth++;
   }

   if (IsRedundant)
     return replaceInstUsesWith(I, I.getOperand(0));
   return nullptr;
 }

+static bool isShuffleEquivalentToSelect(ShuffleVectorInst &Shuf) {
+  int MaskSize = Shuf.getMask()->getType()->getVectorNumElements();
+  int VecSize = Shuf.getOperand(0)->getType()->getVectorNumElements();
+
+  // A vector select does not change the size of the operands.
+  if (MaskSize != VecSize)
+    return false;
+
+  // Each mask element must be undefined or choose a vector element from one of
+  // the source operands without crossing vector lanes.
+  for (int i = 0; i != MaskSize; ++i) {
+    int Elt = Shuf.getMaskValue(i);
+    if (Elt != -1 && Elt != i && Elt != i + VecSize)
+      return false;
+  }
+
+  return true;
+}
+
+/// insertelt (shufflevector X, CVec, Mask), C, CIndex -->
+/// shufflevector X, CVec', Mask'
+static Instruction *foldConstantInsEltIntoShuffle(InsertElementInst &InsElt) {
+  // Bail out if the shuffle has more than one use. In that case, we'd be
+  // replacing the insertelt with a shuffle, and that's not a clear win.
+  auto *Shuf = dyn_cast<ShuffleVectorInst>(InsElt.getOperand(0));
+  if (!Shuf || !Shuf->hasOneUse())
+    return nullptr;
+
+  // The shuffle must have a constant vector operand. The insertelt must have a
+  // constant scalar being inserted at a constant position in the vector.
+  Constant *ShufConstVec, *InsEltScalar;
+  uint64_t InsEltIndex;
+  if (!match(Shuf->getOperand(1), m_Constant(ShufConstVec)) ||
+      !match(InsElt.getOperand(1), m_Constant(InsEltScalar)) ||
+      !match(InsElt.getOperand(2), m_ConstantInt(InsEltIndex)))
+    return nullptr;
+
+  // Adding an element to an arbitrary shuffle could be expensive, but a shuffle
+  // that selects elements from vectors without changing lanes is assumed cheap.
+  // If we're just adding a constant into that shuffle, it will still be cheap.
+  if (!isShuffleEquivalentToSelect(*Shuf))
+    return nullptr;
+
+  // From the above 'select' check, we know that the mask has the same number of
+  // elements as the vector input operands. We also know that each constant
+  // input element is used in its lane and can not be used more than once by the
+  // shuffle. Therefore, replace the constant in the shuffle's constant vector
+  // with the insertelt constant. Replace the constant in the shuffle's mask
+  // vector with the insertelt index plus the length of the vector (because the
+  // constant vector operand of a shuffle is always the 2nd operand).
+  Constant *Mask = Shuf->getMask();
+  unsigned NumElts = Mask->getType()->getVectorNumElements();
+  SmallVector<Constant*, 16> NewShufElts(NumElts);
+  SmallVector<Constant*, 16> NewMaskElts(NumElts);
+  for (unsigned i = 0; i != NumElts; ++i) {
+    if (i == InsEltIndex) {
+      NewShufElts[i] = InsEltScalar;
+      Type *Int32Ty = Type::getInt32Ty(Shuf->getContext());
+      NewMaskElts[i] = ConstantInt::get(Int32Ty, InsEltIndex + NumElts);
+    } else {
+      // Copy over the existing values.
+      NewShufElts[i] = ShufConstVec->getAggregateElement(i);
+      NewMaskElts[i] = Mask->getAggregateElement(i);
+    }
+  }
+
+  // Create new operands for a shuffle that includes the constant of the
+  // original insertelt. The old shuffle will be dead now.
+  Constant *NewShufVec = ConstantVector::get(NewShufElts);
+  Constant *NewMask = ConstantVector::get(NewMaskElts);
+  return new ShuffleVectorInst(Shuf->getOperand(0), NewShufVec, NewMask);
+}
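
The operand rewrite performed by foldConstantInsEltIntoShuffle can be tried in isolation with the following standalone sketch. It is a simplified model, not the committed code: SelectShuffle, foldInsertIntoShuffle and the UndefConst placeholder are hypothetical names, constants are plain ints, and the one-use check is left out. The explicit size and index range check is an assumption added for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for an undef constant lane (the real code holds Constant*).
const int UndefConst = -1000;

struct SelectShuffle {
  std::vector<int> ConstVec; // second (constant) shuffle operand
  std::vector<int> Mask;     // shuffle mask; -1 marks an undef element
};

// Try to absorb "insertelement Shuf, Scalar, Index" into the shuffle S.
// Returns false and leaves S untouched when the guard does not hold.
bool foldInsertIntoShuffle(SelectShuffle &S, int Scalar, std::size_t Index) {
  const int NumElts = static_cast<int>(S.Mask.size());

  // Assumption of this sketch: reject mismatched sizes and out-of-range
  // insertion indices up front.
  if (static_cast<int>(S.ConstVec.size()) != NumElts || Index >= S.Mask.size())
    return false;

  // Same lane test as isShuffleEquivalentToSelect: every lane is undef, or
  // reads lane i of the first operand, or lane i of the constant operand.
  for (int i = 0; i != NumElts; ++i) {
    int Elt = S.Mask[i];
    if (Elt != -1 && Elt != i && Elt != i + NumElts)
      return false;
  }

  // Constant lane Index can only be read by result lane Index, so it is safe
  // to overwrite it and make the mask select it.
  S.ConstVec[Index] = Scalar;
  S.Mask[Index] = static_cast<int>(Index) + NumElts;
  return true;
}
```

Starting from ConstVec <undef, 1, 2, undef> and Mask <0, 5, 6, 3>, folding the insert of 42 at index 3 gives ConstVec <undef, 1, 2, 42> and Mask <0, 5, 6, 7>, which lines up with the updated CHECK line for @PR29126 in the test file. Applying the function twice models how a chain of inserts collapses one step at a time.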

 Instruction *InstCombiner::visitInsertElementInst(InsertElementInst &IE) {
   Value *VecOp = IE.getOperand(0);
   Value *ScalarOp = IE.getOperand(1);
   Value *IdxOp = IE.getOperand(2);

   // Inserting an undef or into an undefined place, remove this.
   if (isa<UndefValue>(ScalarOp) || isa<UndefValue>(IdxOp))
     replaceInstUsesWith(IE, VecOp);
[43 lines not shown; still inside InstCombiner::visitInsertElementInst]
   APInt UndefElts(VWidth, 0);
   APInt AllOnesEltMask(APInt::getAllOnesValue(VWidth));
   if (Value *V = SimplifyDemandedVectorElts(&IE, AllOnesEltMask, UndefElts)) {
     if (V != &IE)
       return replaceInstUsesWith(IE, V);
     return &IE;
   }

+  if (Instruction *Shuf = foldConstantInsEltIntoShuffle(IE))
+    return Shuf;
+
   return nullptr;
 }

 /// Return true if we can evaluate the specified expression tree if the vector
 /// elements were shuffled in a different order.
 static bool CanEvaluateShuffled(Value *V, ArrayRef<int> Mask,
                                 unsigned Depth = 5) {
   // We can always reorder the elements of a constant.
[639 lines not shown]

test/Transforms/InstCombine/insert-const-shuf.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -instcombine %s | FileCheck %s

-; TODO: Eliminate the insertelement.
+; Eliminate the insertelement.

 define <4 x float> @PR29126(<4 x float> %x) {
 ; CHECK-LABEL: @PR29126(
-; CHECK-NEXT: [[SHUF:%.*]] = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float undef>, <4 x i32> <i32 0, i32 5, i32 6, i32 undef>
-; CHECK-NEXT: [[INS:%.*]] = insertelement <4 x float> [[SHUF]], float 4.200000e+01, i32 3
+; CHECK-NEXT: [[INS:%.*]] = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float 4.200000e+01>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
 ; CHECK-NEXT: ret <4 x float> [[INS]]
 ;
   %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
   %ins = insertelement <4 x float> %shuf, float 42.0, i32 3
   ret <4 x float> %ins
 }

-; TODO: A chain of inserts should collapse.
+; A chain of inserts should collapse.

 define <4 x float> @twoInserts(<4 x float> %x) {
 ; CHECK-LABEL: @twoInserts(
-; CHECK-NEXT: [[SHUF:%.*]] = shufflevector <4 x float> %x, <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
-; CHECK-NEXT: [[INS1:%.*]] = insertelement <4 x float> [[SHUF]], float 4.200000e+01, i32 2
-; CHECK-NEXT: [[INS2:%.*]] = insertelement <4 x float> [[INS1]], float 1.100000e+01, i32 3
+; CHECK-NEXT: [[INS2:%.*]] = shufflevector <4 x float> %x, <4 x float> <float undef, float 0.000000e+00, float 4.200000e+01, float 1.100000e+01>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
 ; CHECK-NEXT: ret <4 x float> [[INS2]]
 ;
   %shuf = shufflevector <4 x float> %x, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
   %ins1 = insertelement <4 x float> %shuf, float 42.0, i32 2
   %ins2 = insertelement <4 x float> %ins1, float 11.0, i32 3
   ret <4 x float> %ins2
 }

[81 lines not shown]