Differential D104156
[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask')
Authored by lebedev.ri on Jun 11 2021, 3:31 PM.
Details
In most test changes this allows us to drop some broadcasts/shuffles. I think I got the logic right (at least, I have already caught the obvious bugs I had).
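As a hedged illustration of the mask logic (not the actual DAGCombiner implementation; the function name and representation below are invented for this sketch), the fold can be modeled in Python: extracting a subvector of a shuffle is rewritable as a shuffle of two extracted subvectors whenever the non-undef entries of the extracted mask window touch at most two subvector-aligned chunks of the original operands.

```python
def fold_extract_of_shuffle(mask, num_elts, sub_elts, extract_idx):
    """Toy model of:
      extract_subvector(vector_shuffle(A, B, mask), extract_idx)
        -> vector_shuffle(extract_subvector(?, i), extract_subvector(?, j), mask')
    mask entries: -1 is undef, 0..num_elts-1 index A, num_elts..2*num_elts-1 index B.
    Returns (ops, new_mask), where ops lists the (source, chunk) pairs that
    become the new EXTRACT_SUBVECTOR operands, or None if the fold does not apply."""
    window = mask[extract_idx : extract_idx + sub_elts]
    ops = []        # at most two subvector-aligned chunks may be referenced
    new_mask = []
    for m in window:
        if m < 0:
            new_mask.append(-1)
            continue
        src, elt = divmod(m, num_elts)
        chunk = (src, elt // sub_elts)
        if chunk not in ops:
            if len(ops) == 2:
                return None  # would need a third extracted subvector
            ops.append(chunk)
        new_mask.append(ops.index(chunk) * sub_elts + elt % sub_elts)
    return ops, new_mask
```

For example, extracting the low v4 of an 8-element shuffle whose low half reads entirely from operand B yields a single extract of B's low chunk with an identity mask, while a mask whose window references three aligned chunks correctly rejects the fold.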
Comment Actions
Aha, looks like this is salvageable; adding even more restrictions seems to have gotten rid of obvious regressions.
Comment Actions
Rebasing now that D104250 landed.

Comment Actions
Many remaining cases are rotates, with a pattern like:

  Combining: t0: ch = EntryToken
  Optimized vector-legalized selection DAG: %bb.0 'splatvar_funnnel_v8i32:'
  SelectionDAG has 26 nodes:
    t0: ch = EntryToken
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %0
    t25: v2i64 = zero_extend_vector_inreg t45
    t26: v4i32 = bitcast t25
    t27: v8i32 = X86ISD::VSHL t2, t26
    t40: v4i32 = BUILD_VECTOR Constant:i32<32>, Constant:i32<32>, Constant:i32<32>, Constant:i32<32>
    t38: v4i32 = sub t40, t45
    t30: v2i64 = zero_extend_vector_inreg t38
    t31: v4i32 = bitcast t30
    t32: v8i32 = X86ISD::VSRL t2, t31
    t21: v8i32 = or t27, t32
    t10: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t21
    t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
    t6: v8i32 = vector_shuffle<0,0,0,0,0,0,0,0> t4, undef:v8i32
    t43: v4i32 = extract_subvector t6, Constant:i64<0>
    t47: v4i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
    t45: v4i32 = and t43, t47
    t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:v8i32 $ymm0, t10:1

Let's suppose we start at t27: v8i32 = X86ISD::VSHL t2, t26, and demand the 0'th element of shift amount t26.

Comment Actions
Looks like SimplifyMultipleUseDemandedBits() could theoretically deal with it.

Comment Actions
Hm, and perhaps the biggest issue is: how would we skip over an element-count-changing extract_subvector?

Comment Actions
What if the ZERO_EXTEND_VECTOR_INREG case is extended to just bitcast the source if we only need the 0th element and the upper source elements aliasing that 0th element are known to be zero?

Comment Actions
@RKSimon ping. Any further thoughts/hints on how this might be achievable via demanded bits/etc.?

Comment Actions
Sorry for not getting back to this - I've been meaning to take a look at the x86 test changes - the fact that they are mostly from the same funnel-shift/rotation lowering code suggests we've just missed something for splat values in there.
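A toy model of that ZERO_EXTEND_VECTOR_INREG suggestion (the function names are invented for this sketch; this is not LLVM code): for v4i32 -> v2i64, element 0 of zero_extend_vector_inreg and of a plain bitcast agree exactly when source element 1 (the lane aliasing the upper 32 bits of result element 0) is known to be zero, so if only element 0 is demanded and that lane is provably zero, the extension can be replaced by a bitcast.

```python
def zext_inreg_v4i32_to_v2i64(src):
    # zero_extend_vector_inreg: zero-extend the low two i32 lanes to i64
    return [src[0], src[1]]

def bitcast_v4i32_to_v2i64(src):
    # bitcast: pairs of i32 lanes reinterpreted as i64 (little-endian)
    return [src[0] | (src[1] << 32), src[2] | (src[3] << 32)]
```

With src[1] == 0, element 0 of both results is identical even though the upper elements differ, which is exactly the "only the 0th element is demanded" case.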
Comment Actions
I still think getting to the bottom of why we don't already fold the splatted rotation/funnel shift amount would be more useful.

Comment Actions
Rebased. rGb95f66ad786b8f2814d4ef4373e8ac3902e6f62a is cheating :)

Comment Actions
@RKSimon do you have any concrete arguments against this? I'm seeing this pattern (well, roughly) when looking at https://bugs.llvm.org/show_bug.cgi?id=52337:

  Optimized type-legalized selection DAG: %bb.0 'mask_i32_stride8_vf4:'
  SelectionDAG has 56 nodes:
    t0: ch = EntryToken
    t6: i64,ch = CopyFromReg t0, Register:i64 %2
    t294: v16i8 = vector_shuffle<0,u,0,u,0,u,0,u,0,u,0,u,0,u,0,u> t282, undef:v16i8
    t255: v8i16 = bitcast t294
    t236: v8i32 = any_extend t255
    t261: v8i32 = shl t236, t259
    t194: v8i32,ch = masked_load<(load (s256) from %ir.ptr, align 4)> t0, t6, undef:i64, t261, undef:v8i32
    t29: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t194
    t195: i64 = add t6, Constant:i64<32>
    t292: v16i8 = vector_shuffle<4,u,4,u,4,u,4,u,4,u,4,u,4,u,4,u> t282, undef:v16i8
    t257: v8i16 = bitcast t292
    t216: v8i32 = any_extend t257
    t260: v8i32 = shl t216, t259
    t196: v8i32,ch = masked_load<(load (s256) from %ir.ptr + 32, align 4)> t0, t195, undef:i64, t260, undef:v8i32
    t31: ch,glue = CopyToReg t29, Register:v8i32 $ymm1, t196, t29:1
    t62: i64 = add t6, Constant:i64<64>
    t268: v8i16 = any_extend_vector_inreg t264
    t173: v8i32 = any_extend t268
    t281: v8i32 = shl t173, t259
    t129: v8i32,ch = masked_load<(load (s256) from %ir.ptr + 64, align 4)> t0, t62, undef:i64, t281, undef:v8i32
    t33: ch,glue = CopyToReg t31, Register:v8i32 $ymm2, t129, t31:1
    t280: i64 = add t6, Constant:i64<96>
    t275: v16i8 = vector_shuffle<8,u,9,u,10,u,11,u,12,u,13,u,14,u,15,u> t264, undef:v16i8
    t272: v8i16 = bitcast t275
    t152: v8i32 = any_extend t272
    t278: v8i32 = shl t152, t259
    t132: v8i32,ch = masked_load<(load (s256) from %ir.ptr + 96, align 4)> t0, t280, undef:i64, t278, undef:v8i32
    t35: ch,glue = CopyToReg t33, Register:v8i32 $ymm3, t132, t33:1
    t259: v8i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
    t287: v32i8 = concat_vectors t282, undef:v16i8
    t289: v32i8 = vector_shuffle<u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,8,8,8,8,8,8,8,8,12,12,12,12,12,12,12,12> t287, undef:v32i8
    t264: v16i8 = extract_subvector t289, Constant:i64<16>
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t9: v4i32,ch = load<(load (s128) from %ir.in.vec, align 32)> t0, t2, undef:i64
    t11: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
    t42: v4i32 = setcc t9, t11, setlt:ch
    t282: v16i8 = bitcast t42
    t36: ch = X86ISD::RET_FLAG t35, TargetConstant:i32<0>, Register:v8i32 $ymm0, Register:v8i32 $ymm1, Register:v8i32 $ymm2, Register:v8i32 $ymm3, t35:1

i.e.

    t287: v32i8 = concat_vectors t282, undef:v16i8
    t289: v32i8 = vector_shuffle<u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,8,8,8,8,8,8,8,8,12,12,12,12,12,12,12,12> t287, undef:v32i8
    t264: v16i8 = extract_subvector t289, Constant:i64<16>

Comment Actions
Actually upload the diff with the change, not just the test changes.

Comment Actions
Do we have any test coverage with subvectors that are not half the width of the source? 512-bit -> 128-bit AVX512 shuffles, for instance.
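Roughly, every demanded lane of that wide shuffle mask indexes into t282 itself (entries 8 and 12 are below 16, i.e. inside the concat's low half), so the concat + shuffle + extract chain reduces to a single narrow shuffle of t282. A hedged Python model (lane values stand in for vector elements; the function name is invented for this sketch):

```python
def extract_high_of_shuffled_concat(x, mask):
    """Toy model of:
      wide = concat_vectors(x, undef)                 # 2*len(x) lanes
      shuf = vector_shuffle(wide, undef, mask)
      res  = extract_subvector(shuf, len(x))          # high half
    When every demanded (high-half) mask entry indexes into x, the chain
    is just a len(x)-lane shuffle of x; undef lanes are kept as None."""
    n = len(x)
    high = mask[n:]
    assert all(m < 0 or m < n for m in high), "demanded lanes must come from x"
    return [None if m < 0 else x[m] for m in high]
```

With the mask from the dump above (u lanes in the low half, then eight 8s and eight 12s), the result is simply lanes 8 and 12 of t282 broadcast into each half of a 16-lane vector, with no wide concat or extract needed.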
Comment Actions
@RKSimon thank you for taking a look! I've checked, and it doesn't appear so.
Comment Actions
ping
Comment Actions
Relaxed the profitability check - I'm not sure this is better, and that is why it was there.
Comment Actions
Hmm. Thank you for the review.

Comment Actions
I'm mainly interested in the rotation splat fixes, as this removes a headache for some ongoing work on better vector rotation/funnel shift codegen.
"The old shuffle needs to go away."