This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
ClosedPublic

Authored by spatel on Aug 25 2016, 12:03 PM.

Details

Reviewers

majnemer
hfinkel
efriedma

Commits

rG521f19f2498e: [InsttCombine] fold insertelement of constant into shuffle with constant…
rL280504: [InsttCombine] fold insertelement of constant into shuffle with constant…

Summary

We can see chains of insertelement instructions before SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Diff Detail

Event Timeline

spatel updated this revision to Diff 69278. Aug 25 2016, 12:03 PM

spatel retitled this revision to [InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126).

spatel updated this object.

spatel added reviewers: majnemer, efriedma, hfinkel.

spatel added subscribers: RKSimon, ABataev.

Herald added a subscriber: mcrosier. Aug 25 2016, 12:03 PM

This seems vaguely risky: if the backend isn't smart enough to pattern-match an arbitrary shuffle into a cheap shuffle + insert, you could end up making the code slower. (For example, I'm not sure the x86 backend can decompose an <8 x i16> shuffle into a punpcklwd+pinsrw in general.)

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
602	I'm not following this logic... don't you need to prove that the chosen element of the original constant vector isn't used? Consider a shuffle mask like "<8 x i16> <0, 1, 2, 3, 8, 9, 10, 11>" followed by an insertion at index 0.

In D23886#526485, @efriedma wrote:

This seems vaguely risky: if the backend isn't smart enough to pattern-match an arbitrary shuffle into a cheap shuffle + insert, you could end up making the code slower. (For example, I'm not sure the x86 backend can decompose an <8 x i16> shuffle into a punpcklwd+pinsrw in general.)

Let me write up some more tests and pass them on to a few different targets. For the x86 cases I looked at, this appeared to always be a win, but I didn't try any i16 tests.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
602	Yes, this is just wrong. I was only thinking about the same-lane / blend pattern in the motivating example.
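The counterexample from the line 602 comment can be checked by hand with a small model. The following is a minimal C++ sketch, not LLVM code: `Vec`, `shuffleVector`, `insertElt`, and `uncheckedFold` are hypothetical names, plain ints stand in for the i16 lanes, and undef is not modeled. `uncheckedFold` mirrors what the loop in Diff 69278 does (overwrite the constant lane and redirect one mask lane) without looking at the rest of the mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IR vector value: plain ints, no undef lanes.
using Vec = std::vector<int>;

// Models shufflevector: mask values in [0, N) read from A, [N, 2N) read from B.
// Assumes equal-length operands and no undef mask entries.
Vec shuffleVector(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  assert(A.size() == B.size());
  const int N = static_cast<int>(A.size());
  Vec Result;
  for (int M : Mask) {
    assert(M >= 0 && M < 2 * N);
    Result.push_back(M < N ? A[M] : B[M - N]);
  }
  return Result;
}

// Models insertelement with a constant index.
Vec insertElt(Vec V, int Scalar, std::size_t Index) {
  assert(Index < V.size());
  V[Index] = Scalar;
  return V;
}

// Mirrors the Diff 69278 loop: CVec[Index] becomes Scalar and Mask[Index]
// becomes Index + N. No other mask lane is inspected.
Vec uncheckedFold(const Vec &X, Vec CVec, std::vector<int> Mask, int Scalar,
                  std::size_t Index) {
  assert(Mask.size() == CVec.size() && Index < CVec.size());
  CVec[Index] = Scalar;
  Mask[Index] = static_cast<int>(Index + CVec.size());
  return shuffleVector(X, CVec, Mask);
}

// True when the unchecked fold changes the result for the mask
// <0, 1, 2, 3, 8, 9, 10, 11> followed by an insertion at index 0.
bool counterexampleDiverges() {
  const Vec X = {10, 11, 12, 13, 14, 15, 16, 17};
  const Vec CVec = {100, 101, 102, 103, 104, 105, 106, 107};
  const std::vector<int> Mask = {0, 1, 2, 3, 8, 9, 10, 11};
  const Vec Expected = insertElt(shuffleVector(X, CVec, Mask), 42, 0);
  const Vec Actual = uncheckedFold(X, CVec, Mask, 42, 0);
  return Expected != Actual;
}
```

In this model, result lane 4 reads constant lane 0 (mask value 8), so overwriting constant lane 0 with the inserted scalar also changes lane 4. That is the use the original logic failed to rule out.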

This leads to a question that was raised in D22114. Which is canonical: a shufflevector with a select-equivalent mask or a select with a constant condition operand?

Given:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 5, i8 6, i8 7>, <4 x i32><i32 0, i32 7, i32 6, i32 5> ; lane-changing
  ret <4 x i8> %y
}

Should we transform to:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5>, <4 x i32><i32 0, i32 5, i32 6, i32 7> ; lane-preserving
  ret <4 x i8> %y
}

or:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5>
  ret <4 x i8> %y
}

IR should be canonical, so should we add transforms to make one of the latter two the preferred form?
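As a sanity check that the three @hoo variants compute the same value, here is a minimal C++17 sketch that evaluates each form on plain data. It is an illustration only: `Lane`, `shuffleVector`, `vectorSelect`, and the three wrapper functions are hypothetical names, `std::nullopt` stands in for undef, and ints stand in for i8.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical lane type: std::nullopt stands in for undef.
using Lane = std::optional<int>;

// Models shufflevector for equal-length operands and a fully defined mask.
std::vector<Lane> shuffleVector(const std::vector<Lane> &A,
                                const std::vector<Lane> &B,
                                const std::vector<int> &Mask) {
  assert(A.size() == B.size());
  const int N = static_cast<int>(A.size());
  std::vector<Lane> Result;
  for (int M : Mask) {
    assert(M >= 0 && M < 2 * N);
    Result.push_back(M < N ? A[M] : B[M - N]);
  }
  return Result;
}

// Models a vector select with a constant condition operand.
std::vector<Lane> vectorSelect(const std::vector<bool> &Cond,
                               const std::vector<Lane> &T,
                               const std::vector<Lane> &F) {
  assert(Cond.size() == T.size() && T.size() == F.size());
  std::vector<Lane> Result;
  for (std::size_t I = 0; I < Cond.size(); ++I)
    Result.push_back(Cond[I] ? T[I] : F[I]);
  return Result;
}

// The original lane-changing shuffle.
std::vector<Lane> laneChanging(const std::vector<Lane> &X) {
  return shuffleVector(X, {std::nullopt, 5, 6, 7}, {0, 7, 6, 5});
}

// The lane-preserving shuffle with a permuted constant operand.
std::vector<Lane> lanePreserving(const std::vector<Lane> &X) {
  return shuffleVector(X, {std::nullopt, 7, 6, 5}, {0, 5, 6, 7});
}

// The select with a constant condition operand.
std::vector<Lane> asSelect(const std::vector<Lane> &X) {
  return vectorSelect({true, false, false, false}, X,
                      {std::nullopt, 7, 6, 5});
}
```

For any input X, all three return lane 0 of X followed by 7, 6, 5. The undef lane of the constant operand is never read in any of the forms, which is why the constant can be permuted freely.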

Probably a good idea to bring up the shuffle vs select discussion on llvmdev, to get more visibility; it will have a substantial impact on backends.

Patch updated:
We seem to have consensus about canonicalizing a vector select with a constant condition operand to a shuffle, but I'll give it a bit more time before making a patch for that.

Even without that step in place, I think we can push this patch forward by limiting the transform to select-equivalent shuffles.

We still get the motivating case.
The previous logic was incorrect for the general shuffle case, but it should work for this limited form of shuffle.
We add a bit of shuffle <--> select plumbing via isShuffleEquivalentToSelect() which may be useful for subsequent patches too.
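To illustrate why the restricted form is safe, here is a minimal C++17 sketch of the fold on plain data rather than on LLVM values. `FoldedShuffle` and `foldInsertIntoSelectShuffle` are hypothetical names, `std::nullopt` stands in for undef, ints stand in for the float constants, and the mask uses -1 for an undef lane. It is a model of the idea under those assumptions, not the committed implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical lane type: std::nullopt stands in for undef.
using Lane = std::optional<int>;

// New constant operand and mask for the replacement shuffle.
struct FoldedShuffle {
  std::vector<Lane> ConstVec;
  std::vector<int> Mask;
};

// Folds "insert Scalar at Index" into a shuffle whose second operand is
// ConstVec. Returns std::nullopt (no fold) unless the mask is
// select-equivalent: every lane I is undef (-1), I, or I + N.
std::optional<FoldedShuffle>
foldInsertIntoSelectShuffle(std::vector<Lane> ConstVec, std::vector<int> Mask,
                            int Scalar, std::size_t Index) {
  const std::size_t N = ConstVec.size();
  if (Mask.size() != N || Index >= N)
    return std::nullopt;
  for (std::size_t I = 0; I < N; ++I) {
    const int M = Mask[I];
    if (M != -1 && M != static_cast<int>(I) && M != static_cast<int>(I + N))
      return std::nullopt;
  }
  // With a select-equivalent mask, constant lane Index can only be read by
  // result lane Index, so overwriting it cannot disturb any other lane.
  ConstVec[Index] = Scalar;
  Mask[Index] = static_cast<int>(Index + N);
  return FoldedShuffle{std::move(ConstVec), std::move(Mask)};
}
```

Applied to the PR29126 test input (constant operand <undef, 1, 2, undef>, mask <0, 5, 6, 3>, inserting 42 at index 3), this model yields constant operand <undef, 1, 2, 42> and mask <0, 5, 6, 7>. It declines to fold the mask <0, 1, 2, 3, 8, 9, 10, 11> from the earlier review comment.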

This is looking much better.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
590	I think you can write the body of this loop as "int EltVal = Shuf.getMaskValue(i); if (EltVal != -1 && EltVal != i && EltVal != i + Vecsize) return false;".
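The suggested loop body can be tried outside LLVM. Below is a minimal C++ sketch that takes the mask as a plain `std::vector<int>` instead of calling `Shuf.getMaskValue(i)`. The function name `isSelectEquivalentMask` and the up-front size check are assumptions made for this illustration; -1 marks an undef mask lane, as in the comment above.

```cpp
#include <cassert>
#include <vector>

// Returns true if every mask lane either is undef (-1) or picks lane I from
// the first operand (value I) or from the second operand (value I + VecSize).
// Such a shuffle never moves data across lanes, so it behaves like a select.
bool isSelectEquivalentMask(const std::vector<int> &Mask, int VecSize) {
  assert(VecSize >= 0);
  // Assumption for this sketch: a select cannot change the vector length.
  if (static_cast<int>(Mask.size()) != VecSize)
    return false;
  for (int I = 0; I < VecSize; ++I) {
    const int EltVal = Mask[I];
    if (EltVal != -1 && EltVal != I && EltVal != I + VecSize)
      return false;
  }
  return true;
}
```

With this check, the PR29126 mask <0, 5, 6, 3> qualifies, while the lane-changing mask <0, 7, 6, 5> and the earlier counterexample <0, 1, 2, 3, 8, 9, 10, 11> do not.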

spatel added inline comments. Sep 2 2016, 6:49 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
590	Nice - not sure how I missed that method up to now!

Patch updated:
Use Shuf.getMaskValue(i) to simplify the code.

spatel added a child revision: D24182: [InstCombine] Fix for PR29124: reduce insertelements to shufflevector. Sep 2 2016, 9:11 AM

LGTM.

This revision is now accepted and ready to land. Sep 2 2016, 9:12 AM

spatel mentioned this in D24182: [InstCombine] Fix for PR29124: reduce insertelements to shufflevector. Sep 2 2016, 9:23 AM

Closed by commit rL280504: [InsttCombine] fold insertelement of constant into shuffle with constant… (authored by spatel). Sep 2 2016, 10:13 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
lib/Transforms/InstCombine/InstCombineVectorOps.cpp    53 lines
test/Transforms/InstCombine/insert-const-shuf.ll       68 lines

Diff 69278

lib/Transforms/InstCombine/InstCombineVectorOps.cpp

[... 560 lines not shown, ending inside: while (V->hasOneUse() && Depth < 10) { ...]
    Depth++;
  }

  if (IsRedundant)
    return replaceInstUsesWith(I, I.getOperand(0));
  return nullptr;
}

		/// insertelt (shufflevector X, CVec, Mask), C, CIndex -->
		/// shufflevector X, CVec', Mask'
		static Instruction *foldConstantInsEltIntoShuffle(InsertElementInst &InsElt) {
		  // Bail out if the shuffle has more than one use. In that case, we'd be
		  // replacing the insertelt with a shuffle, and that's not a clear win.
		  auto *Shuf = dyn_cast<ShuffleVectorInst>(InsElt.getOperand(0));
		  if (!Shuf || !Shuf->hasOneUse())
		    return nullptr;

		  // The shuffle must have a constant vector operand. The insertelt must have a
		  // constant scalar being inserted at a constant position in the vector.
		  Constant *ShufConstVec, *InsEltScalar;
		  uint64_t InsEltIndex;
		  if (!match(Shuf->getOperand(1), m_Constant(ShufConstVec)) ||
		      !match(InsElt.getOperand(1), m_Constant(InsEltScalar)) ||
		      !match(InsElt.getOperand(2), m_ConstantInt(InsEltIndex)))
		    return nullptr;

		  // TODO: This restriction could be loosened to handle a shuffle with a mask
		  // that has a shorter length than its vector operands.
		  Constant *Mask = Shuf->getMask();
		  unsigned NumElts = Mask->getType()->getVectorNumElements();
		[Inline comment, efriedma] I think you can write the body of this loop as "int EltVal = Shuf.getMaskValue(i); if (EltVal != -1 && EltVal != i && EltVal != i + Vecsize) return false;".
		[Inline comment, spatel (author)] Nice - not sure how I missed that method up to now!
		  if (ShufConstVec->getType()->getVectorNumElements() != NumElts)
		    return nullptr;

		  // Replace the constant in the shuffle's constant vector with the insertelt
		  // constant. Replace the constant in the shuffle's mask vector with the
		  // insertelt index plus the length of the vector (because the constant vector
		  // operand of the shuffle is always the 2nd operand).
		  SmallVector<Constant*, 16> NewShufElts(NumElts);
		  SmallVector<Constant*, 16> NewMaskElts(NumElts);
		  for (unsigned i = 0; i != NumElts; ++i) {
		    if (i == InsEltIndex) {
		      NewShufElts[i] = InsEltScalar;
		[Inline comment, efriedma] I'm not following this logic... don't you need to prove that the chosen element of the original constant vector isn't used? Consider a shuffle mask like "<8 x i16> <0, 1, 2, 3, 8, 9, 10, 11>" followed by an insertion at index 0.
		[Inline comment, spatel (author)] Yes, this is just wrong. I was only thinking about the same-lane / blend pattern in the motivating example.
		      Type *Int32Ty = Type::getInt32Ty(Shuf->getContext());
		      NewMaskElts[i] = ConstantInt::get(Int32Ty, InsEltIndex + NumElts);
		    } else {
		      // Copy over the existing values.
		      NewShufElts[i] = ShufConstVec->getAggregateElement(i);
		      NewMaskElts[i] = Mask->getAggregateElement(i);
		    }
		  }

		  // Create new operands for a shuffle that includes the constant of the
		  // original insertelt. The old shuffle will be dead now.
		  Constant *NewShufVec = ConstantVector::get(NewShufElts);
		  Constant *NewMask = ConstantVector::get(NewMaskElts);
		  return new ShuffleVectorInst(Shuf->getOperand(0), NewShufVec, NewMask);
		}

Instruction *InstCombiner::visitInsertElementInst(InsertElementInst &IE) {
  Value *VecOp = IE.getOperand(0);
  Value *ScalarOp = IE.getOperand(1);
  Value *IdxOp = IE.getOperand(2);

  // Inserting an undef or into an undefined place, remove this.
  if (isa<UndefValue>(ScalarOp) || isa<UndefValue>(IdxOp))
    replaceInstUsesWith(IE, VecOp);
[... 43 lines not shown ...]
  APInt UndefElts(VWidth, 0);
  APInt AllOnesEltMask(APInt::getAllOnesValue(VWidth));
  if (Value *V = SimplifyDemandedVectorElts(&IE, AllOnesEltMask, UndefElts)) {
    if (V != &IE)
      return replaceInstUsesWith(IE, V);
    return &IE;
  }

		  if (Instruction *Shuf = foldConstantInsEltIntoShuffle(IE))
		    return Shuf;

  return nullptr;
}

/// Return true if we can evaluate the specified expression tree if the vector
/// elements were shuffled in a different order.
static bool CanEvaluateShuffled(Value *V, ArrayRef<int> Mask,
                                unsigned Depth = 5) {
  // We can always reorder the elements of a constant.
[... 639 lines not shown ...]

test/Transforms/InstCombine/insert-const-shuf.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -instcombine %s | FileCheck %s

				; Eliminate the insertelement.

				define <4 x float> @PR29126(<4 x float> %x) {
				; CHECK-LABEL: @PR29126(
				; CHECK-NEXT: [[INS:%.*]] = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float 4.200000e+01>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
				; CHECK-NEXT: ret <4 x float> [[INS]]
				;
				%shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
				%ins = insertelement <4 x float> %shuf, float 42.0, i32 3
				ret <4 x float> %ins
				}

				define <4 x float> @twoInserts(<4 x float> %x) {
				; CHECK-LABEL: @twoInserts(
				; CHECK-NEXT: [[INS2:%.*]] = shufflevector <4 x float> %x, <4 x float> <float undef, float 0.000000e+00, float 4.200000e+01, float 1.100000e+01>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
				; CHECK-NEXT: ret <4 x float> [[INS2]]
				;
				%shuf = shufflevector <4 x float> %x, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
				%ins1 = insertelement <4 x float> %shuf, float 42.0, i32 2
				%ins2 = insertelement <4 x float> %ins1, float 11.0, i32 3
				ret <4 x float> %ins2
				}

				; Don't transform insert to shuffle if the original shuffle is not removed.
				; TODO: Ease the use restriction if the insert scalar would simplify the shuffle to a full vector constant?

				define <3 x float> @twoShufUses(<3 x float> %x) {
				; CHECK-LABEL: @twoShufUses(
				; CHECK-NEXT: [[SHUF:%.*]] = shufflevector <3 x float> %x, <3 x float> <float undef, float 1.000000e+00, float 2.000000e+00>, <3 x i32> <i32 0, i32 4, i32 5>
				; CHECK-NEXT: [[INS:%.*]] = insertelement <3 x float> [[SHUF]], float 4.200000e+01, i2 1
				; CHECK-NEXT: [[ADD:%.*]] = fadd <3 x float> [[SHUF]], [[INS]]
				; CHECK-NEXT: ret <3 x float> [[ADD]]
				;
				%shuf = shufflevector <3 x float> %x, <3 x float> <float undef, float 1.0, float 2.0>, <3 x i32> <i32 0, i32 4, i32 5>
				%ins = insertelement <3 x float> %shuf, float 42.0, i2 1
				%add = fadd <3 x float> %shuf, %ins
				ret <3 x float> %add
				}

				; The inserted scalar constant index is out-of-bounds for the shuffle vector constant.

				define <5 x i8> @longerMask(<3 x i8> %x) {
				; CHECK-LABEL: @longerMask(
				; CHECK-NEXT: [[SHUF:%.*]] = shufflevector <3 x i8> %x, <3 x i8> <i8 undef, i8 1, i8 undef>, <5 x i32> <i32 2, i32 1, i32 4, i32 undef, i32 undef>
				; CHECK-NEXT: [[INS:%.*]] = insertelement <5 x i8> [[SHUF]], i8 42, i17 4
				; CHECK-NEXT: ret <5 x i8> [[INS]]
				;
				%shuf = shufflevector <3 x i8> %x, <3 x i8> <i8 undef, i8 1, i8 2>, <5 x i32> <i32 2, i32 1, i32 4, i32 3, i32 0>
				%ins = insertelement <5 x i8> %shuf, i8 42, i17 4
				ret <5 x i8> %ins
				}

				; TODO: The inserted constant could get folded into the shuffle vector constant.

				define <3 x i8> @shorterMask(<5 x i8> %x) {
				; CHECK-LABEL: @shorterMask(
				; CHECK-NEXT: [[SHUF:%.*]] = shufflevector <5 x i8> %x, <5 x i8> undef, <3 x i32> <i32 undef, i32 1, i32 4>
				; CHECK-NEXT: [[INS:%.*]] = insertelement <3 x i8> [[SHUF]], i8 42, i21 0
				; CHECK-NEXT: ret <3 x i8> [[INS]]
				;
				%shuf = shufflevector <5 x i8> %x, <5 x i8> <i8 undef, i8 1, i8 2, i8 3, i8 4>, <3 x i32> <i32 2, i32 1, i32 4>
				%ins = insertelement <3 x i8> %shuf, i8 42, i21 0
				ret <3 x i8> %ins
				}