This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
docs/
2
CodeGenerator.rst
-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
InstCombineVectorOps.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
vec_shuffle.ll

Differential D62024

[InstCombine] fold shuffles of insert_subvectors
ClosedPublic

Authored by spatel on May 16 2019, 12:04 PM.

Download Raw Diff

Details

Reviewers

lebedev.ri
RKSimon
efriedma

Commits

rZORG34a0b086d347: [InstCombine] fold shuffles of insert_subvectors
rG34a0b086d347: [InstCombine] fold shuffles of insert_subvectors
rL361338: [InstCombine] fold shuffles of insert_subvectors
rG6a554188aaef: [InstCombine] fold shuffles of insert_subvectors

Summary

This should be a valid exception to the general rule of not creating new shuffle masks in IR...because we already do it. :)
Also AFAICT, DAG combining/legalization can undo this by widening the shuffle back out if needed.

Explanation for how we already do this: SLP or vector source can create chains of insert/extract as shown in 1 of the examples from PR16739:
https://godbolt.org/z/NlK7rA
https://bugs.llvm.org/show_bug.cgi?id=16739

And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles. But instcombine has a strange way of dealing with this and which can fail with multiple uses - grep for:

// TODO: Looking at the user(s) to determine if this insert is a
// fold-to-shuffle opportunity does not match the usual instcombine
// constraints.

So this patch is proposing to reduce that restriction by matching a sub-pattern of shuffles. We would have folded that anyway if they had appeared in an alternate form (as a sequence of insert/extract).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision.May 16 2019, 12:04 PM

Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2019, 12:04 PM

Herald added subscribers: hiraditya, mcrosier. · View Herald Transcript

Can we make a list somewhere of the shuffles which all targets with vector types are expected to support?

In D62024#1505318, @efriedma wrote:

Can we make a list somewhere of the shuffles which all targets with vector types are expected to support?

Sure - LangRef or less formal? eg, doxygen code comments for InstCombiner::visitShuffleVectorInst()?

LangRef isn't really appropriate; it's not an issue of correctness. (All targets support all shuffles; it's just that some of them generate terrible code.)

Maybe doxygen on InstCombiner::visitShuffleVectorInst is okay. Not sure where else we would put it; maybe CodeGenerator.rst, but that doesn't seem great.

Patch updated:
Added descriptive text to CodeGenerator.rst to document our (previously unstated?) rules about creating shuffles in IR.
The docs folder seems easier to point to than a doxygen comment, but I'm open to suggestions. We could duplicate text or link to the docs file within the source code to make it more likely to be found. Also, the text is likely not completely accurate, so any refinement ideas are welcome.

efriedma added inline comments.May 16 2019, 4:20 PM

llvm/docs/CodeGenerator.rst
932	This description is useful from the perspective from the optimizer, but it would probably also be a good idea to mention what this means in practice from the perspective of the backend. For shuffles where operand and result types are all the same, and legal, that's pretty straightforward, but some of these are more complicated. Can a splat change the vector length, and if so, what DAG nodes does that produce? What DAG nodes are we actually talking about for an "insert subvector"? Also, which of those categories does the proposed transform fall under?

Revised documentation text: tried to make this backend-centric by specifying the mapping from IR to SDAG opcodes.

Also, which of those categories does the proposed transform fall under?

This transform relies on expanding the shuffle of narrowed operands back to:
shuffle_vector (insert_subvector undef, X, 0), (insert_subvector undef, Y, 0), Mask'

Here, I'm assuming insertion into 'undef' rather than an arbitrary base vector, so it's an easier operation to support than what I wrote in this rev of the doc text. I'm not sure yet if inserting at index 0 into an arbitrary vector is actually handled well enough by all targets, but that seems like a common enough vector op? It's a fuzzy spec because it can change as we add DAGCombiner rules.

This transform relies on expanding the shuffle of narrowed operands back to

I think that qualifies as its own category; you're basically saying that certain shuffles should be equivalent to certain other shuffles, which is sort of orthogonal to whether any particular shuffle is legal.

In D62024#1506977, @efriedma wrote:

This transform relies on expanding the shuffle of narrowed operands back to

I think that qualifies as its own category; you're basically saying that certain shuffles should be equivalent to certain other shuffles, which is sort of orthogonal to whether any particular shuffle is legal.

I'm struggling to find words to describe this without opening the door completely to arbitrary shuffle combining. Any suggestions are welcome. :)

The motivating transform that led me here is the last test diff (ends up as just a single insert_subvector), so I could restrict the transform further, but it seemed like the more general fold would be beneficial and reversible if necessary.

In terms of describing it, what is the backend actually going to see?

If both the source and destination types are legal, the rule is a sort of a "truncated shuffle rule"; essentially, given vectors with the same element type, the shuffle mask <x,y,z,w> should have roughly the same cost as the shuffle mask <x,y,z,w,undef,undef,undef,undef>. (This isn't a great description because it sort of implies that the mask constants would be the same, but you get the idea.) This is probably reasonable to expect on most targets that support multiple vector widths, but it does require some implementation work. IIRC I did some optimization work related to this for AArch64.

If the destination type is legal, and the source type would be "widened" (in vector legalization terms) to the destination type, then effectively the two shuffles are the same, and the backend doesn't really need to be aware, I think.

If the destination type is legal, and the source type would be "promoted" (in vector legalization terms) to the destination type, then I'm not sure what rule would allow the transform: the shuffles aren't really related in any obvious way.

I won't try to enumerate the various cases where the destination type is illegal; I'm guessing it generally breaks down to one of the above cases somehow.

RKSimon added inline comments.May 18 2019, 5:52 AM

llvm/docs/CodeGenerator.rst
922	"Each element of the vector is chosen from either of the corresponding elements of the 2 input vectors."

In D62024#1507180, @efriedma wrote:

In terms of describing it, what is the backend actually going to see?

Good question - I stepped through some examples, and I might've been overthinking this. We assume that the original shuffle was safe/legal because nothing in the IR optimizer should create anything difficult. So the initial DAG for these patterns appears to be the same independently of this IR transform:

Before:
define <4 x float> @widen_shuffles(<2 x float> %x, <2 x float> %y) {
  %s1 = shufflevector <2 x float> %x, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> ; concat with undef elements
  %s2 = shufflevector <2 x float> %y, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> ; concat with undef elements
  %s3 = shufflevector <4 x float> %s2, <4 x float> %s1, <4 x i32> <i32 1, i32 5, i32 4, i32 0> ; arbitrary shuffle
  ret <4 x float> %s3
}

$ llc -o - -debug
...
Initial selection DAG: %bb.0 'widen_shuffles:'
        t9: v4f32 = concat_vectors t6, undef:v2f32
        t8: v4f32 = concat_vectors t3, undef:v2f32
      t10: v4f32 = vector_shuffle<1,5,4,0> t9, t8

After:
define <4 x float> @widen_shuffles(<2 x float> %x, <2 x float> %y) {
  %s3 = shufflevector <2 x float> %y, <2 x float> %x, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
  ret <4 x float> %s3
}

$ llc -o - -debug
...
Initial selection DAG: %bb.0 'widen_shuffles:'
        t8: v4f32 = concat_vectors t6, undef:v2f32
        t9: v4f32 = concat_vectors t3, undef:v2f32
      t10: v4f32 = vector_shuffle<1,5,4,0> t8, t9

So IIUC, this is always the 1st option that you listed: legal destination type with vector widening, so the backend shouldn't ever see a difference.

Patch updated:

Improved doc text for 'shuffle select' as suggested.
Adjusted code comment to say the transform is expected to be expanded back to original form in the backend.

So IIUC, this is always the 1st option that you listed: legal destination type with vector widening, so the backend shouldn't ever see a difference.

Did you try any testcases with integer types? floats can't be "promoted", but integers can.

In D62024#1508980, @efriedma wrote:

So IIUC, this is always the 1st option that you listed: legal destination type with vector widening, so the backend shouldn't ever see a difference.

Did you try any testcases with integer types? floats can't be "promoted", but integers can.

Yes - looks like the same concat_vectors pattern to me:

  t9: v4i32 = concat_vectors t6, undef:v2i32
  t8: v4i32 = concat_vectors t5, undef:v2i32
t10: v4i32 = vector_shuffle<1,5,4,0> t9, t8

  t9: v8i16 = concat_vectors t7, undef:v4i16
  t10: v8i16 = concat_vectors t6, undef:v4i16
t11: v8i16 = vector_shuffle<1,9,10,0,1,1,11,3> t9, t10

I also tried with an odd input type (<3 x i32>) to a legal type (<4x i32>), and that has extra ops on initial DAG, but it's still identical between the 2 proposed IR variants:

      t7: v3i32 = extract_subvector t4, Constant:i64<0>
    t9: v6i32 = concat_vectors t7, undef:v3i32
      t6: v3i32 = extract_subvector t2, Constant:i64<0>
    t10: v6i32 = concat_vectors t6, undef:v3i32
  t11: v6i32 = vector_shuffle<1,7,6,0,u,u> t9, t10
t12: v4i32 = extract_subvector t11, Constant:i64<0>

I need to update my local repo to confirm, but this case exposes an existing crash for AArch64 codegen.

I tried llc with the before/after of the <7 x i8> testcase from the patch on x86-64, and I'm seeing different shuffles. I'm not sure why, though; the type v9i8 is involved for some non-obvious reason. Full testcase:

define <7 x i8> @insert_subvector_shuffles(<3 x i8> %x, <3 x i8> %y) {
  %s1 = shufflevector <3 x i8> %x, <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %s2 = shufflevector <3 x i8> %y, <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
  %s3 = shufflevector <7 x i8> %s1, <7 x i8> %s2, <7 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9>
  ret <7 x i8> %s3
}
define <7 x i8> @insert_subvector_shuffles2(<3 x i8> %x, <3 x i8> %y) {
  %s3 = shufflevector <3 x i8> %x, <3 x i8> %y, <7 x i32> <i32 0, i32 4, i32 1, i32 undef, i32 4, i32 1, i32 5>
  ret <7 x i8> %s3
}

In D62024#1509044, @efriedma wrote:

define <7 x i8> @insert_subvector_shuffles(<3 x i8> %x, <3 x i8> %y) {
  %s1 = shufflevector <3 x i8> %x, <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %s2 = shufflevector <3 x i8> %y, <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
  %s3 = shufflevector <7 x i8> %s1, <7 x i8> %s2, <7 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9>
  ret <7 x i8> %s3
}
define <7 x i8> @insert_subvector_shuffles2(<3 x i8> %x, <3 x i8> %y) {
  %s3 = shufflevector <3 x i8> %x, <3 x i8> %y, <7 x i32> <i32 0, i32 4, i32 1, i32 undef, i32 4, i32 1, i32 5>
  ret <7 x i8> %s3
}

That's bizarre. We use "alignTo(7, 3)":

/// Returns the next integer (mod 2**64) that is greater than or equal to
/// \p Value and is a multiple of \p Align. \p Align must be non-zero.

But for the purposes of this transform, we don't have to worry about weird types, right? We get to assume that the destination type (the original wide shuffle type) was intentionally created in source or was validated by TTI in the vectorizers.

We use "alignTo(7, 3)":

Hmm. It looks like the idea is that we should perform the shuffle in some type that's a multiple of the source vectors, so we can construct the source operands using CONCAT_VECTORS. Not sure why we aren't just generating an INSERT_SUBVECTOR into undef, though.

But for the purposes of this transform, we don't have to worry about weird types, right?

Depends on what you mean by "weird"; AMDGPU has legal non-power-of-two vector types. But yes, for the purpose of preserving shuffles, we're generally trying to preserve shuffles that were legal in the first place, which means either the shuffle is "trivial", like the ones you've listed, or the destination type is legal (or maybe some multiple of a legal type, I guess).

I had one more thought, which I'm not sure we really covered before: what happens if the destination type of the final shuffle is narrower than the intermediate types? I guess the reasoning ends up mostly the same, though.

In D62024#1509309, @efriedma wrote:

We use "alignTo(7, 3)":

Hmm. It looks like the idea is that we should perform the shuffle in some type that's a multiple of the source vectors, so we can construct the source operands using CONCAT_VECTORS. Not sure why we aren't just generating an INSERT_SUBVECTOR into undef, though.

INSERT_SUBVECTOR makes sense to me, but I just tried that experiment and...fallout: x86 has regressions/crashing, AMDGPU is infinite looping.

Okay, so my current understanding is the following: the invariant we want for this patch is that given an IR shufflevector with a given destination type and shuffle mask, we produce an ISD::VECTOR_SHUFFLE with the same destination type and shuffle mask. If that's true, we can change the source type without causing issues. If the source and destination vectors both have both power-of-two length, SelectionDAG does in fact preserve this invariant. But in general, it doesn't.

I don't want to create extra work in the way of an incremental improvement, but I don't think we can merge this patch given the current state of SelectionDAG. Maybe if would be okay if you restrict it to power-of-two vectors for now?

In D62024#1510707, @efriedma wrote:

Okay, so my current understanding is the following: the invariant we want for this patch is that given an IR shufflevector with a given destination type and shuffle mask, we produce an ISD::VECTOR_SHUFFLE with the same destination type and shuffle mask. If that's true, we can change the source type without causing issues. If the source and destination vectors both have both power-of-two length, SelectionDAG does in fact preserve this invariant. But in general, it doesn't.

I don't want to create extra work in the way of an incremental improvement, but I don't think we can merge this patch given the current state of SelectionDAG. Maybe if would be okay if you restrict it to power-of-two vectors for now?

Sure - that would still catch the motivating cases that I saw. I only included the prime number vector sizes to verify that the shuffle mask adjustment was working as intended. I'll add more tests and limit this.

spatel mentioned this in rG3590bae8d6fa: [InstCombine] add more tests for shuffle folding; NFC.May 21 2019, 2:45 PM

spatel mentioned this in rL361317: [InstCombine] add more tests for shuffle folding; NFC.

Patch updated:
Added power-of-2 element checks for all types, so we can be more confident that there are no codegen regressions.

LGTM

This revision is now accepted and ready to land.May 21 2019, 4:01 PM

Closed by commit rG6a554188aaef: [InstCombine] fold shuffles of insert_subvectors (authored by spatel). · Explain WhyMay 21 2019, 5:31 PM

This revision was automatically updated to reflect the committed changes.

sidorovd mentioned this in rGe5fb77500460: [InstCombine] add more tests for shuffle folding; NFC.May 30 2019, 10:55 AM

Revision Contents

Path

Size

llvm/

docs/

CodeGenerator.rst

25 lines

lib/

Transforms/

InstCombine/

InstCombineVectorOps.cpp

53 lines

test/

Transforms/

InstCombine/

vec_shuffle.ll

15 lines

Diff 200611

llvm/docs/CodeGenerator.rst

	Show First 20 Lines • Show All 906 Lines • ▼ Show 20 Lines
	emulate the operation ("expansion"), by promoting one type to a larger type that			emulate the operation ("expansion"), by promoting one type to a larger type that
	supports the operation ("promotion"), or by using a target-specific hook to			supports the operation ("promotion"), or by using a target-specific hook to
	implement the legalization ("custom").			implement the legalization ("custom").

	A target implementation tells the legalizer which operations are not supported			A target implementation tells the legalizer which operations are not supported
	(and which of the above three actions to take) by calling the			(and which of the above three actions to take) by calling the
	``setOperationAction`` method in its ``TargetLowering`` constructor.			``setOperationAction`` method in its ``TargetLowering`` constructor.

				If a target has legal vector types, it is expected to produce efficient machine
				code for common forms of the shufflevector IR instruction using those types.
				This may require custom legalization for SelectionDAG vector operations that
				are created from the shufflevector IR. The shufflevector forms that should be
				handled include:

				* Vector select --- Each element of the vector is chosen from either of the
				corresponding elements of the 2 input vectors. This operation may also be
				RKSimonUnsubmitted Not Done Reply Inline Actions "Each element of the vector is chosen from either of the corresponding elements of the 2 input vectors." RKSimon: "Each element of the vector is chosen from either of the corresponding elements of the 2 input…
				known as a "blend" or "bitwise select" in target assembly. This type of shuffle
				maps directly to the ``shuffle_vector`` SelectionDAG node.

				* Insert subvector --- A vector is placed into a longer vector type starting
				at index 0. This type of shuffle maps directly to the ``insert_subvector``
				SelectionDAG node with the ``index`` operand set to 0.

				* Extract subvector --- A vector is pulled from a longer vector type starting
				at index 0. This type of shuffle maps directly to the ``extract_subvector``
				SelectionDAG node with the ``index`` operand set to 0.
				efriedmaUnsubmitted Not Done Reply Inline Actions This description is useful from the perspective from the optimizer, but it would probably also be a good idea to mention what this means in practice from the perspective of the backend. For shuffles where operand and result types are all the same, and legal, that's pretty straightforward, but some of these are more complicated. Can a splat change the vector length, and if so, what DAG nodes does that produce? What DAG nodes are we actually talking about for an "insert subvector"? Also, which of those categories does the proposed transform fall under? efriedma: This description is useful from the perspective from the optimizer, but it would probably also…

				* Splat --- All elements of the vector have identical scalar elements. This
				operation may also be known as a "broadcast" or "duplicate" in target assembly.
				The shufflevector IR instruction may change the vector length, so this operation
				may map to multiple SelectionDAG nodes including ``shuffle_vector``,
				``concat_vectors``, ``insert_subvector``, and ``extract_subvector``.

	Prior to the existence of the Legalize passes, we required that every target			Prior to the existence of the Legalize passes, we required that every target
	`selector`_ supported and handled every operator and type even if they are not			`selector`_ supported and handled every operator and type even if they are not
	natively supported. The introduction of the Legalize phases allows all of the			natively supported. The introduction of the Legalize phases allows all of the
	canonicalization patterns to be shared across targets, and makes it very easy to			canonicalization patterns to be shared across targets, and makes it very easy to
	optimize the canonicalized code because it is still in the form of a DAG.			optimize the canonicalized code because it is still in the form of a DAG.

	.. _optimizations:			.. _optimizations:
	.. _Optimize SelectionDAG:			.. _Optimize SelectionDAG:
	▲ Show 20 Lines • Show All 1,750 Lines • Show Last 20 Lines

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

Show First 20 Lines • Show All 1,616 Lines • ▼ Show 20 Lines	static Instruction *foldShuffleWithInsert(ShuffleVectorInst &Shuf) {
std::swap(V0, V1);		std::swap(V0, V1);
ShuffleVectorInst::commuteShuffleMask(Mask, NumElts);		ShuffleVectorInst::commuteShuffleMask(Mask, NumElts);
if (isShufflingScalarIntoOp1(Scalar, IndexC))		if (isShufflingScalarIntoOp1(Scalar, IndexC))
return InsertElementInst::Create(V1, Scalar, IndexC);		return InsertElementInst::Create(V1, Scalar, IndexC);

return nullptr;		return nullptr;
}		}

		static Instruction *foldIdentityPaddedShuffles(ShuffleVectorInst &Shuf) {
		// Match the operands as identity with padding (also known as concatenation
		// with undef) shuffles of the same source type. The backend is expected to
		// recreate these concatenations from a shuffle of narrow operands.
		auto *Shuffle0 = dyn_cast<ShuffleVectorInst>(Shuf.getOperand(0));
		auto *Shuffle1 = dyn_cast<ShuffleVectorInst>(Shuf.getOperand(1));
		if (!Shuffle0 \|\| !Shuffle0->isIdentityWithPadding() \|\|
		!Shuffle1 \|\| !Shuffle1->isIdentityWithPadding())
		return nullptr;

		// We limit this transform to power-of-2 types because we expect that the
		// backend can convert the simplified IR patterns to identical nodes as the
		// original IR.
		// TODO: If we can verify that behavior for arbitrary types, the power-of-2
		// checks can be removed.
		Value *X = Shuffle0->getOperand(0);
		Value *Y = Shuffle1->getOperand(0);
		if (X->getType() != Y->getType() \|\|
		!isPowerOf2_32(Shuf.getType()->getVectorNumElements()) \|\|
		!isPowerOf2_32(Shuffle0->getType()->getVectorNumElements()) \|\|
		!isPowerOf2_32(X->getType()->getVectorNumElements()) \|\|
		isa<UndefValue>(X) \|\| isa<UndefValue>(Y))
		return nullptr;
		assert(isa<UndefValue>(Shuffle0->getOperand(1)) &&
		isa<UndefValue>(Shuffle1->getOperand(1)) &&
		"Unexpected operand for identity shuffle");

		// This is a shuffle of 2 widening shuffles. We can shuffle the narrow source
		// operands directly by adjusting the shuffle mask to account for the narrower
		// types:
		// shuf (widen X), (widen Y), Mask --> shuf X, Y, Mask'
		int NarrowElts = X->getType()->getVectorNumElements();
		int WideElts = Shuffle0->getType()->getVectorNumElements();
		assert(WideElts > NarrowElts && "Unexpected types for identity with padding");

		Type *I32Ty = IntegerType::getInt32Ty(Shuf.getContext());
		SmallVector<int, 16> Mask = Shuf.getShuffleMask();
		SmallVector<Constant *, 16> NewMask(Mask.size(), UndefValue::get(I32Ty));
		for (int i = 0, e = Mask.size(); i != e; ++i) {
		if (Mask[i] == -1)
		continue;
		if (Mask[i] < WideElts)
		NewMask[i] = ConstantInt::get(I32Ty, Mask[i]);
		else
		NewMask[i] = ConstantInt::get(I32Ty, Mask[i] - (WideElts - NarrowElts));
		}
		return new ShuffleVectorInst(X, Y, ConstantVector::get(NewMask));
		}

Instruction *InstCombiner::visitShuffleVectorInst(ShuffleVectorInst &SVI) {		Instruction *InstCombiner::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
Value *LHS = SVI.getOperand(0);		Value *LHS = SVI.getOperand(0);
Value *RHS = SVI.getOperand(1);		Value *RHS = SVI.getOperand(1);
if (auto *V = SimplifyShuffleVectorInst(		if (auto *V = SimplifyShuffleVectorInst(
LHS, RHS, SVI.getMask(), SVI.getType(), SQ.getWithInstruction(&SVI)))		LHS, RHS, SVI.getMask(), SVI.getType(), SQ.getWithInstruction(&SVI)))
return replaceInstUsesWith(SVI, V);		return replaceInstUsesWith(SVI, V);

// Canonicalize shuffle(x ,x,mask) -> shuffle(x, undef,mask')		// Canonicalize shuffle(x ,x,mask) -> shuffle(x, undef,mask')
Show All 38 Lines	if (Value *V = SimplifyDemandedVectorElts(&SVI, AllOnesEltMask, UndefElts)) {
if (V != &SVI)		if (V != &SVI)
return replaceInstUsesWith(SVI, V);		return replaceInstUsesWith(SVI, V);
return &SVI;		return &SVI;
}		}

if (Instruction *I = foldIdentityExtractShuffle(SVI))		if (Instruction *I = foldIdentityExtractShuffle(SVI))
return I;		return I;

// This transform has the potential to lose undef knowledge, so it is		// These transforms have the potential to lose undef knowledge, so they are
// intentionally placed after SimplifyDemandedVectorElts().		// intentionally placed after SimplifyDemandedVectorElts().
if (Instruction *I = foldShuffleWithInsert(SVI))		if (Instruction *I = foldShuffleWithInsert(SVI))
return I;		return I;
		if (Instruction *I = foldIdentityPaddedShuffles(SVI))
		return I;

if (VWidth == LHSWidth) {		if (VWidth == LHSWidth) {
// Analyze the shuffle, are the LHS or RHS and identity shuffles?		// Analyze the shuffle, are the LHS or RHS and identity shuffles?
bool isLHSID, isRHSID;		bool isLHSID, isRHSID;
recognizeIdentityMask(Mask, isLHSID, isRHSID);		recognizeIdentityMask(Mask, isLHSID, isRHSID);

// Eliminate identity shuffles.		// Eliminate identity shuffles.
if (isLHSID) return replaceInstUsesWith(SVI, LHS);		if (isLHSID) return replaceInstUsesWith(SVI, LHS);
▲ Show 20 Lines • Show All 289 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/vec_shuffle.ll

	Show First 20 Lines • Show All 1,134 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: ret <2 x i1> [[R]]			; CHECK-NEXT: ret <2 x i1> [[R]]
	;			;
	%widen = shufflevector <1 x i1> zeroinitializer, <1 x i1> %x, <2 x i32> <i32 0, i32 1>			%widen = shufflevector <1 x i1> zeroinitializer, <1 x i1> %x, <2 x i32> <i32 0, i32 1>
	%narrow = shufflevector <4 x i1> %y, <4 x i1> undef, <2 x i32> <i32 0, i32 1>			%narrow = shufflevector <4 x i1> %y, <4 x i1> undef, <2 x i32> <i32 0, i32 1>
	%r = and <2 x i1> %widen, %narrow			%r = and <2 x i1> %widen, %narrow
	ret <2 x i1> %r			ret <2 x i1> %r
	}			}

				; Negative test - do not transform non-power-of-2 unless we know the backend handles these sequences identically.

	define <7 x i8> @insert_subvector_shuffles(<3 x i8> %x, <3 x i8> %y) {			define <7 x i8> @insert_subvector_shuffles(<3 x i8> %x, <3 x i8> %y) {
	; CHECK-LABEL: @insert_subvector_shuffles(			; CHECK-LABEL: @insert_subvector_shuffles(
	; CHECK-NEXT: [[S1:%.]] = shufflevector <3 x i8> [[X:%.]], <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[S1:%.]] = shufflevector <3 x i8> [[X:%.]], <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[S2:%.]] = shufflevector <3 x i8> [[Y:%.]], <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[S2:%.]] = shufflevector <3 x i8> [[Y:%.]], <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[S3:%.*]] = shufflevector <7 x i8> [[S1]], <7 x i8> [[S2]], <7 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9>			; CHECK-NEXT: [[S3:%.*]] = shufflevector <7 x i8> [[S1]], <7 x i8> [[S2]], <7 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9>
	; CHECK-NEXT: ret <7 x i8> [[S3]]			; CHECK-NEXT: ret <7 x i8> [[S3]]
	;			;
	%s1 = shufflevector <3 x i8> %x, <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%s1 = shufflevector <3 x i8> %x, <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%s2 = shufflevector <3 x i8> %y, <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>			%s2 = shufflevector <3 x i8> %y, <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
	%s3 = shufflevector <7 x i8> %s1, <7 x i8> %s2, <7 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9>			%s3 = shufflevector <7 x i8> %s1, <7 x i8> %s2, <7 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9>
	ret <7 x i8> %s3			ret <7 x i8> %s3
	}			}

	define <8 x i8> @insert_subvector_shuffles_pow2elts(<2 x i8> %x, <2 x i8> %y) {			define <8 x i8> @insert_subvector_shuffles_pow2elts(<2 x i8> %x, <2 x i8> %y) {
	; CHECK-LABEL: @insert_subvector_shuffles_pow2elts(			; CHECK-LABEL: @insert_subvector_shuffles_pow2elts(
	; CHECK-NEXT: [[S1:%.]] = shufflevector <2 x i8> [[X:%.]], <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[S3:%.]] = shufflevector <2 x i8> [[X:%.]], <2 x i8> [[Y:%.*]], <8 x i32> <i32 0, i32 2, i32 1, i32 undef, i32 2, i32 1, i32 3, i32 0>
	; CHECK-NEXT: [[S2:%.]] = shufflevector <2 x i8> [[Y:%.]], <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[S3:%.*]] = shufflevector <8 x i8> [[S1]], <8 x i8> [[S2]], <8 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9, i32 0>
	; CHECK-NEXT: ret <8 x i8> [[S3]]			; CHECK-NEXT: ret <8 x i8> [[S3]]
	;			;
	%s1 = shufflevector <2 x i8> %x, <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%s1 = shufflevector <2 x i8> %x, <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%s2 = shufflevector <2 x i8> %y, <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%s2 = shufflevector <2 x i8> %y, <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%s3 = shufflevector <8 x i8> %s1, <8 x i8> %s2, <8 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9, i32 0>			%s3 = shufflevector <8 x i8> %s1, <8 x i8> %s2, <8 x i32> <i32 0, i32 8, i32 1, i32 undef, i32 8, i32 1, i32 9, i32 0>
	ret <8 x i8> %s3			ret <8 x i8> %s3
	}			}

	; The last shuffle may change the vector type.			; The last shuffle may change the vector type.
				; Negative test - do not transform non-power-of-2 unless we know the backend handles these sequences identically.

	define <2 x i8> @insert_subvector_shuffles_narrowing(<3 x i8> %x, <3 x i8> %y) {			define <2 x i8> @insert_subvector_shuffles_narrowing(<3 x i8> %x, <3 x i8> %y) {
	; CHECK-LABEL: @insert_subvector_shuffles_narrowing(			; CHECK-LABEL: @insert_subvector_shuffles_narrowing(
	; CHECK-NEXT: [[S1:%.]] = shufflevector <3 x i8> [[X:%.]], <3 x i8> undef, <7 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[S1:%.]] = shufflevector <3 x i8> [[X:%.]], <3 x i8> undef, <7 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[S2:%.]] = shufflevector <3 x i8> [[Y:%.]], <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[S2:%.]] = shufflevector <3 x i8> [[Y:%.]], <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[S3:%.*]] = shufflevector <7 x i8> [[S1]], <7 x i8> [[S2]], <2 x i32> <i32 0, i32 8>			; CHECK-NEXT: [[S3:%.*]] = shufflevector <7 x i8> [[S1]], <7 x i8> [[S2]], <2 x i32> <i32 0, i32 8>
	; CHECK-NEXT: ret <2 x i8> [[S3]]			; CHECK-NEXT: ret <2 x i8> [[S3]]
	;			;
	%s1 = shufflevector <3 x i8> %x, <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%s1 = shufflevector <3 x i8> %x, <3 x i8> undef, <7 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%s2 = shufflevector <3 x i8> %y, <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>			%s2 = shufflevector <3 x i8> %y, <3 x i8> undef, <7 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
	%s3 = shufflevector <7 x i8> %s1, <7 x i8> %s2, <2 x i32> <i32 0, i32 8>			%s3 = shufflevector <7 x i8> %s1, <7 x i8> %s2, <2 x i32> <i32 0, i32 8>
	ret <2 x i8> %s3			ret <2 x i8> %s3
	}			}

	define <2 x i8> @insert_subvector_shuffles_narrowing_pow2elts(<4 x i8> %x, <4 x i8> %y) {			define <2 x i8> @insert_subvector_shuffles_narrowing_pow2elts(<4 x i8> %x, <4 x i8> %y) {
	; CHECK-LABEL: @insert_subvector_shuffles_narrowing_pow2elts(			; CHECK-LABEL: @insert_subvector_shuffles_narrowing_pow2elts(
	; CHECK-NEXT: [[S1:%.]] = shufflevector <4 x i8> [[X:%.]], <4 x i8> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[S3:%.]] = shufflevector <4 x i8> [[X:%.]], <4 x i8> [[Y:%.*]], <2 x i32> <i32 0, i32 4>
	; CHECK-NEXT: [[S2:%.]] = shufflevector <4 x i8> [[Y:%.]], <4 x i8> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[S3:%.*]] = shufflevector <8 x i8> [[S1]], <8 x i8> [[S2]], <2 x i32> <i32 0, i32 8>
	; CHECK-NEXT: ret <2 x i8> [[S3]]			; CHECK-NEXT: ret <2 x i8> [[S3]]
	;			;
	%s1 = shufflevector <4 x i8> %x, <4 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%s1 = shufflevector <4 x i8> %x, <4 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%s2 = shufflevector <4 x i8> %y, <4 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%s2 = shufflevector <4 x i8> %y, <4 x i8> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	%s3 = shufflevector <8 x i8> %s1, <8 x i8> %s2, <2 x i32> <i32 0, i32 8>			%s3 = shufflevector <8 x i8> %s1, <8 x i8> %s2, <2 x i32> <i32 0, i32 8>
	ret <2 x i8> %s3			ret <2 x i8> %s3
	}			}

	; Similar to above, but this reduces to a widen with undefs of 'x'.			; Similar to above, but this reduces to a widen with undefs of 'x'.

	define <4 x double> @insert_subvector_shuffles_identity(<2 x double> %x) {			define <4 x double> @insert_subvector_shuffles_identity(<2 x double> %x) {
	; CHECK-LABEL: @insert_subvector_shuffles_identity(			; CHECK-LABEL: @insert_subvector_shuffles_identity(
	; CHECK-NEXT: [[S1:%.]] = shufflevector <2 x double> [[X:%.]], <2 x double> undef, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[S3:%.]] = shufflevector <2 x double> [[X:%.]], <2 x double> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[S2:%.*]] = shufflevector <2 x double> [[X]], <2 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[S3:%.*]] = shufflevector <4 x double> [[S2]], <4 x double> [[S1]], <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
	; CHECK-NEXT: ret <4 x double> [[S3]]			; CHECK-NEXT: ret <4 x double> [[S3]]
	;			;
	%s1 = shufflevector <2 x double> %x, <2 x double> undef, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef>			%s1 = shufflevector <2 x double> %x, <2 x double> undef, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef>
	%s2 = shufflevector <2 x double> %x, <2 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>			%s2 = shufflevector <2 x double> %x, <2 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
	%s3 = shufflevector <4 x double> %s2, <4 x double> %s1, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>			%s3 = shufflevector <4 x double> %s2, <4 x double> %s1, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
	ret <4 x double> %s3			ret <4 x double> %s3
	}			}

	Show All 29 Lines