This is an archive of the discontinued LLVM Phabricator instance.

Differential D107068

[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns.
ClosedPublic

Authored by RKSimon on Jul 29 2021, 6:31 AM.

Details

Reviewers

spatel
craig.topper
pengfei
dmgreen
t.p.northover
fhahn
lebedev.ri

Commits

rG2cbf9fd402af: [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns

Summary

IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.
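The matching described here can be illustrated outside of LLVM. The following is a minimal standalone sketch, not the patch's code: it models a shuffle mask as a `std::vector<int>` (with -1 for undef) and the name `matchInsertSubvector` and its out-parameters are made up for illustration. It builds each candidate "identity plus one inserted subvector" mask and compares it against the actual mask, treating undef elements as wildcards.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper. Mask follows the shuffle convention: -1 is undef,
// [0, NumElts) selects from LHS, [NumElts, 2 * NumElts) selects from the
// concatenated RHS. On success, SubVec is the concat operand being inserted
// and SubIdx is the element offset in LHS where it lands.
bool matchInsertSubvector(const std::vector<int> &Mask, int NumSubElts,
                          int &SubVec, int &SubIdx) {
  int NumElts = static_cast<int>(Mask.size());
  assert(NumSubElts > 0 && NumElts % NumSubElts == 0 && "Subvector mismatch");
  int NumSubVecs = NumElts / NumSubElts;

  // A unary shuffle (only undef + LHS elements) is not an insertion.
  if (std::all_of(Mask.begin(), Mask.end(),
                  [NumElts](int M) { return M < NumElts; }))
    return false;

  std::vector<int> Expected(NumElts);
  for (int SV = 0; SV != NumSubVecs; ++SV) {
    for (int Idx = 0; Idx != NumElts; Idx += NumSubElts) {
      // Start from the identity mask, then overwrite one aligned span with
      // the indices of RHS subvector SV.
      std::iota(Expected.begin(), Expected.end(), 0);
      std::iota(Expected.begin() + Idx, Expected.begin() + Idx + NumSubElts,
                NumElts + SV * NumSubElts);

      // Undef (negative) mask elements match anything.
      bool Matches = true;
      for (int I = 0; I != NumElts; ++I) {
        if (Mask[I] >= 0 && Mask[I] != Expected[I]) {
          Matches = false;
          break;
        }
      }
      if (Matches) {
        SubVec = SV;
        SubIdx = Idx;
        return true;
      }
    }
  }
  return false;
}
```

For a v8i32 mask `0,1,2,3,10,11,6,7` with two-element subvectors, this reports subvector 1 inserted at element 4.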

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RKSimon created this revision. Jul 29 2021, 6:31 AM

Herald added subscribers: ecnelises, hiraditya, kristof.beyls. Jul 29 2021, 6:31 AM

RKSimon requested review of this revision. Jul 29 2021, 6:31 AM

Herald added a project: Restricted Project. Jul 29 2021, 6:31 AM

Harbormaster completed remote builds in B116962: Diff 362754. Jul 29 2021, 7:27 AM

craig.topper added inline comments. Aug 1 2021, 10:48 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21343	What if the last index in the span is -1? We won't reach this.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
7907 ↗	(On Diff #362754)	Why does AArch64 have a mix of index types? Shouldn't everything be using getVectorIdxTy?

Vector element mov is generally the same cost as zip1 on aarch64; zip1 is preferable because we can specify the destination register.

https://github.com/llvm/llvm-project/blob/56e7b6c3924d7ba8db70c38235a77ed8208795eb/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L10552 is supposed to ensure that only INSERT_SUBVECTOR with index 0 is legal, I think? Maybe it broke somehow.

In D107068#2918791, @efriedma wrote:

Vector element mov is generally the same cost as zip1 on aarch64; zip1 is preferable because we can specify the destination register.

https://github.com/llvm/llvm-project/blob/56e7b6c3924d7ba8db70c38235a77ed8208795eb/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L10552 is supposed to ensure that only INSERT_SUBVECTOR with index 0 is legal, I think? Maybe it broke somehow.

That only seems to be set up for scalable vectors - would it be better if I added an aarch64 combine that folds these to a CONCAT_VECTORS pattern?

RKSimon added inline comments. Aug 1 2021, 12:44 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21343	Nice catch - I'll fix this tomorrow

Match subvectors with trailing undef shuffle mask elements

Any thoughts on what we should do for the missing aarch64 insert_subvector isel?

RKSimon marked an inline comment as done. Aug 2 2021, 8:02 AM

Harbormaster completed remote builds in B117453: Diff 363479. Aug 2 2021, 8:37 AM

craig.topper added inline comments. Aug 2 2021, 9:14 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21342	Why can't this be after the loop and all of the breaks replaced with return SDValue()?

RKSimon added inline comments. Aug 2 2021, 10:32 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21342	Because we have an outer loop at the top trying all possible positions for SubIdx

craig.topper added inline comments. Aug 2 2021, 11:03 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21342	Oh, right forgot we were in a nested loop.

In D107068#2918827, @RKSimon wrote:

That only seems to be set up for scalable vectors - would it be better if I added an aarch64 combine that folds these to a CONCAT_VECTORS pattern?

Oh, right.

I guess saying NEON INSERT_SUBVECTOR is legal is fine. We definitely do want to optimize to CONCAT_VECTORS where applicable, though.

In D107068#2920603, @efriedma wrote:

Oh, right.

I guess saying NEON INSERT_SUBVECTOR is legal is fine. We definitely do want to optimize to CONCAT_VECTORS where applicable, though.

OK, I'll drop the isel patterns and add an aarch64 combine to CONCAT_VECTORS instead.

Replaced aarch64 isel patterns with an insert_subvector -> concat_vectors combine
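To make the insert_subvector -> concat_vectors idea concrete, here is a small standalone model using plain `std::vector<int>` values rather than SelectionDAG nodes. It assumes the case where the inserted subvector is exactly half the width of the base vector and lands on either half; that restriction, and all of the helper names below, are assumptions for illustration and are not taken from the revision.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Reference semantics: overwrite Base[Idx .. Idx + Sub.size()) with Sub.
Vec insertSubvector(const Vec &Base, const Vec &Sub, std::size_t Idx) {
  assert(Idx + Sub.size() <= Base.size() && "insertion out of range");
  Vec Result = Base;
  std::copy(Sub.begin(), Sub.end(),
            Result.begin() + static_cast<std::ptrdiff_t>(Idx));
  return Result;
}

Vec extractSubvector(const Vec &V, std::size_t Idx, std::size_t Len) {
  assert(Idx + Len <= V.size() && "extraction out of range");
  return Vec(V.begin() + static_cast<std::ptrdiff_t>(Idx),
             V.begin() + static_cast<std::ptrdiff_t>(Idx + Len));
}

Vec concatVectors(const Vec &Lo, const Vec &Hi) {
  Vec Result = Lo;
  Result.insert(Result.end(), Hi.begin(), Hi.end());
  return Result;
}

// Hypothetical rewrite: a half-width insertion expressed as a concat of the
// new half with the untouched half extracted from the base vector.
Vec insertHalfAsConcat(const Vec &Base, const Vec &Sub, std::size_t Idx) {
  std::size_t Half = Base.size() / 2;
  assert(Base.size() % 2 == 0 && Sub.size() == Half &&
         (Idx == 0 || Idx == Half) && "only half-width insertions modelled");
  if (Idx == 0)
    return concatVectors(Sub, extractSubvector(Base, Half, Half));
  return concatVectors(extractSubvector(Base, 0, Half), Sub);
}
```

Both forms produce the same elements, which is why a target without insert_subvector isel patterns could select the concat form instead.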

Harbormaster completed remote builds in B117612: Diff 363706. Aug 3 2021, 6:11 AM

What about insertion of multiple subvectors?

This looks complicated. Can't we do this simpler?
If we want to keep the current general logic,
can't you just generate a shuffle mask that you'd need
when inserting n'th subvector of RHS into identity LHS,
and compare that the mask matches the actual mask? (modulo undefs)

Otherwise, can't you just go through the mask, record the elements that are identity,
then go through the elements that aren't identity, see that they are sequential,
that there are only enough of them for a single subvector insertion,
and decode which subvector is being inserted?
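The second suggestion (walk the mask once instead of trying every candidate) could look roughly like the sketch below. This is only an illustration of the idea as worded in the comment, written against plain `std::vector<int>` masks; the function name and out-parameters are hypothetical and this is not code from any diff in this revision.

```cpp
#include <cassert>
#include <vector>

// Hypothetical single-pass decoder. Mask uses -1 for undef, [0, NumElts) for
// LHS and [NumElts, 2 * NumElts) for the concatenated RHS.
bool decodeSingleInsertion(const std::vector<int> &Mask, int NumSubElts,
                           int &SubVec, int &SubIdx) {
  int NumElts = static_cast<int>(Mask.size());
  assert(NumSubElts > 0 && NumElts % NumSubElts == 0 && "Subvector mismatch");

  // Pass 1: everything that is not undef/identity must come from RHS and
  // must fall inside one aligned span.
  int Span = -1;
  for (int I = 0; I != NumElts; ++I) {
    int M = Mask[I];
    if (M < 0 || M == I)
      continue; // undef or identity element
    if (M < NumElts)
      return false; // LHS element in the wrong place
    int ThisSpan = I / NumSubElts;
    if (Span >= 0 && ThisSpan != Span)
      return false; // more than one insertion
    Span = ThisSpan;
  }
  if (Span < 0)
    return false; // nothing taken from RHS

  // Pass 2: the span must hold consecutive elements of a single RHS
  // subvector (undefs allowed), with no LHS elements mixed in.
  int Base = Span * NumSubElts;
  int Found = -1;
  for (int J = 0; J != NumSubElts; ++J) {
    int M = Mask[Base + J];
    if (M < 0)
      continue;
    if (M < NumElts)
      return false; // identity element inside the inserted span
    int Rel = M - NumElts;
    if (Rel % NumSubElts != J)
      return false; // not sequential within the subvector
    if (Found >= 0 && Rel / NumSubElts != Found)
      return false; // elements from two different subvectors
    Found = Rel / NumSubElts;
  }
  SubVec = Found;
  SubIdx = Base;
  return true;
}
```

Compared with generating candidate masks, this visits each mask element a bounded number of times, at the cost of more special cases to get right.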

In D107068#2922214, @lebedev.ri wrote:

What about insertion of multiple subvectors?

Do you have an example? visitINSERT_SUBVECTOR already does some canonicalizations and can fold to CONCAT_VECTORS

This looks complicated. Can't we do this simpler?
If we want to keep the current general logic, can't you just generate a shuffle mask that you'd need when inserting n'th subvector of RHS into identity LHS, and compare that the mask matches the actual mask? (modulo undefs)

I'll look at this - but an earlier prototype tried something similar and it ended up being rather messy.

use candidate shuffle mask creation + comparison - this is a lot easier now that I've moved the getNode() inside the nested for-loops.

Harbormaster completed remote builds in B117654: Diff 363755. Aug 3 2021, 9:15 AM

In D107068#2922316, @RKSimon wrote:

In D107068#2922214, @lebedev.ri wrote:

What about insertion of multiple subvectors?

Do you have an example? visitINSERT_SUBVECTOR already does some canonicalizations and can fold to CONCAT_VECTORS

I mean something along the lines of

// e.g. v2i32 into v8i32:
// shuffle(lhs,concat(rhs0,rhs1,rhs2,rhs3),0,1,2,3,10,11,12,13).
// --> insert_subvector(insert_subvector(lhs,rhs1,4),rhs2,6).

and

// e.g. v2i32 into v8i32:
// shuffle(lhs,concat(rhs0,rhs1,rhs2,rhs3),8,9,2,3,4,5,12,13).
// --> insert_subvector(insert_subvector(lhs,rhs0,0),rhs2,6).

I haven't checked what happens currently, i only saw that the patch only handles a single insertion, iff the shuffle goes away.
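For reference, the two multi-insertion examples above can be decoded with a per-span walk like the one sketched here. This is a standalone illustration on `std::vector<int>` masks, not something this patch implements (it only handles a single insertion); the `Insertion` struct and function name are invented for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one insert_subvector in the chain.
struct Insertion {
  int SubVec; // which concat operand of RHS
  int SubIdx; // element offset in LHS
};

// Each aligned span must be either identity/undef (LHS kept) or a whole RHS
// subvector (undefs allowed). Returns false if any span is mixed or if no
// span comes from RHS.
bool decodeInsertionChain(const std::vector<int> &Mask, int NumSubElts,
                          std::vector<Insertion> &Chain) {
  int NumElts = static_cast<int>(Mask.size());
  assert(NumSubElts > 0 && NumElts % NumSubElts == 0 && "Subvector mismatch");
  Chain.clear();
  for (int Base = 0; Base != NumElts; Base += NumSubElts) {
    bool SawLHS = false;
    int SubVec = -1;
    for (int J = 0; J != NumSubElts; ++J) {
      int M = Mask[Base + J];
      if (M < 0)
        continue; // undef matches either case
      if (M < NumElts) {
        if (M != Base + J)
          return false; // LHS element that is not identity
        SawLHS = true;
      } else {
        int Rel = M - NumElts;
        if (Rel % NumSubElts != J)
          return false; // not sequential within the subvector
        if (SubVec >= 0 && Rel / NumSubElts != SubVec)
          return false; // two different subvectors in one span
        SubVec = Rel / NumSubElts;
      }
    }
    if (SawLHS && SubVec >= 0)
      return false; // span mixes LHS and RHS elements
    if (SubVec >= 0)
      Chain.push_back({SubVec, Base});
  }
  return !Chain.empty();
}
```

With two-element subvectors, mask `0,1,2,3,10,11,12,13` decodes to rhs1 at 4 followed by rhs2 at 6, and mask `8,9,2,3,4,5,12,13` decodes to rhs0 at 0 followed by rhs2 at 6, matching the comments above.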

This looks complicated. Can't we do this simpler?
If we want to keep the current general logic, can't you just generate a shuffle mask that you'd need when inserting n'th subvector of RHS into identity LHS, and compare that the mask matches the actual mask? (modulo undefs)

I'll look at this - but an earlier prototype tried something similar and it ended up being rather messy.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21308–21310	+ `Undef's are okay.` I believe we can't have `-2` (zero) mask elements yet here?

I like this version much more! :)
I do wonder if the other could be less computationally-intensive, but this is good too.

RKSimon added inline comments. Aug 3 2021, 9:46 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21308–21310	Yeah - SelectionDAG::getVectorShuffle asserts: "M < (NElts * 2) && M >= -1;" I'll add a comment making it clear undefs are ok
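The range quoted in that assertion, and the unary-shuffle early-out it feeds into, can be written down as two small predicates. This is a standalone sketch over `std::vector<int>`; the function names are made up, and only the `M < (NElts * 2) && M >= -1` condition itself comes from the comment above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Every element is -1 (undef) or an index into LHS/RHS, per the quoted
// condition. A -2 "zero" sentinel would fail this check.
bool isValidShuffleMask(const std::vector<int> &Mask) {
  int NElts = static_cast<int>(Mask.size());
  return std::all_of(Mask.begin(), Mask.end(), [NElts](int M) {
    return M < (NElts * 2) && M >= -1;
  });
}

// Unary shuffle: every element is undef or comes from LHS. Because undef is
// -1, a single "M < NElts" test covers both cases.
bool isUnaryShuffleMask(const std::vector<int> &Mask) {
  assert(isValidShuffleMask(Mask) && "unexpected mask element");
  int NElts = static_cast<int>(Mask.size());
  return std::all_of(Mask.begin(), Mask.end(),
                     [NElts](int M) { return M < NElts; });
}
```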

Update unary shuffle match comment to mention undef elts

(i'm waiting on a reply regarding multiple subvector insertions before providing any further feedback)

RKSimon mentioned this in rG14b71efd979c: [X86][AVX] Add some multiple/nested subvector insertion shuffle tests. Aug 3 2021, 10:34 AM

In D107068#2922817, @lebedev.ri wrote:

(i'm waiting on a reply regarding multiple subvector insertions before providing any further feedback)

Looking at this now - rG14b71efd979ce3dacf6b3d9913df8e4f063224c5

Harbormaster completed remote builds in B117676: Diff 363785. Aug 3 2021, 10:37 AM

rebase with nested concat_vectors tests

RKSimon added inline comments. Aug 3 2021, 12:56 PM

llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
580	we still miss this test case - we need to add some form of concat(concat(x,y),concat(z,w)) -> concat(x,y,z,w) fold.
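A toy version of that concat(concat(x,y),concat(z,w)) -> concat(x,y,z,w) fold is sketched below, purely to show the shape of the rewrite. The `Operand` struct is a hypothetical stand-in for DAG operands, and requiring every inner concat to have the same operand count is an assumption used here as a proxy for "same type"; none of this is taken from the actual follow-up patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical operand of an outer concat: either a leaf value
// (IsConcat == false, one name in Parts) or an inner concat of leaves.
struct Operand {
  bool IsConcat;
  std::vector<std::string> Parts;
};

// Flatten only when every operand is itself a concat with the same number of
// parts. On failure Flat is left empty.
bool flattenNestedConcat(const std::vector<Operand> &Ops,
                         std::vector<std::string> &Flat) {
  Flat.clear();
  if (Ops.empty())
    return false;
  for (const Operand &Op : Ops) {
    if (!Op.IsConcat || Op.Parts.size() != Ops.front().Parts.size()) {
      Flat.clear();
      return false;
    }
    Flat.insert(Flat.end(), Op.Parts.begin(), Op.Parts.end());
  }
  return true;
}
```

Once flattened, each of x, y, z and w is a direct operand of a single concat, which is the form the shuffle combine looks through when picking a subvector to insert.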

Harbormaster completed remote builds in B117720: Diff 363843. Aug 3 2021, 1:22 PM

Do you want to deal with that first, or afterwards?

In D107068#2923931, @lebedev.ri wrote:

Do you want to deal with that first, or afterwards?

The concat(concat,concat) fold is orthogonal to this patch, it just helps expose more opportunities for it, so I can work on it in parallel.

I think this patch is mainly waiting on feedback from aarch64 gurus - @efriedma @dmgreen @t.p.northover any comments?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13648	'subector' typo - I'll fix

fix typo

Harbormaster completed remote builds in B117859: Diff 364031. Aug 4 2021, 4:06 AM

Ignoring AArch64 change (@efriedma @dmgreen @t.p.northover ?), this looks fine to me.
I might experiment with a partial/iterative approach, but later.
Thanks.

This revision is now accepted and ready to land. Aug 4 2021, 3:50 PM

I was trying out some SVE fixed length lowering, which appears to be fine as we don't mark INSERT_SUBREG as legal/custom at the moment. Seems OK from what I can tell.

This revision was landed with ongoing or failed builds. Aug 5 2021, 7:41 AM

Closed by commit rG2cbf9fd402af: [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns (authored by RKSimon).

This revision was automatically updated to reflect the committed changes.

RKSimon added a commit: rG2cbf9fd402af: [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns.

lebedev.ri mentioned this in D107572: [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise chain of INSERT_SUBVECTOR patterns. Aug 5 2021, 8:00 AM

RKSimon mentioned this in D107597: [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b). Aug 5 2021, 1:12 PM

I have tracked down an assertion failure in a Halide pipeline that targets armeabi-v7a to this change. I don't think it is Halide per se, but rather codegen that happens to be exposed by this pipeline. I'm working on extracting it. My sense is that this target needs work similar to the others, but to get started, here is the assertion:

LLVM ERROR: Cannot select: 0x47547d4f7820: v4f32 = insert_subvector 0x47547d8b43a8, 0x47547d8b6000, Constant:i32<0>
  0x47547d8b43a8: v4f32 = fmul nnan ninf nsz contract afn reassoc 0x47547d8b6068, 0x47547d8b6d00
    0x47547d8b6068: v4f32 = ARMISD::VDUP 0x47547d8b6478
      0x47547d8b6478: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8ba958, 0x47547d8b64e0
        0x47547d8ba958: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba270, 0x47547d8ba208
          0x47547d8ba270: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8b4000
            0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
              0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199
                  0x47547d8ba340: i32 = Register %199
              0x47547d7e0f08: i32 = undef
            0x47547d8b4000: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8b68f0, 0x47547d8b6888
              0x47547d8b68f0: f32,ch = load<(load (s32) from %ir.lsr.iv902906, !tbaa !48)> 0x47547fd271e8, 0x47547d4d8888, undef:i32
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d7e0f08: i32 = undef
              0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                0x47547d8b4068: f32 = Register %1015
          0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
            0x47547d8ba9c0: f32 = Register %1016
        0x47547d8b64e0: f32 = sint_to_fp 0x47547d8b6f70
          0x47547d8b6f70: i32 = fp_to_sint 0x47547d4e53a8
            0x47547d4e53a8: f32 = bitcast 0x47547d4d86e8
              0x47547d4d86e8: i32,ch,glue = CopyFromReg 0x47547d7d6bc8, Register:i32 $r0, 0x47547d7d6bc8:1
                0x47547d8bae38: i32 = Register $r0
                0x47547d7d6bc8: ch,glue = callseq_end 0x47547d4e5750, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d4e5750:1
                  0x47547d8bc820: i32 = TargetConstant<0>
                  0x47547d8b8ea0: i32 = TargetConstant<-1>
                  0x47547d4e5750: ch,glue = ARMISD::CALL 0x47547d7d60d0, TargetExternalSymbol:i32'floorf', Register:i32 $r0, RegisterMask:Untyped, 0x47547d7d60d0:1




    0x47547d8b6d00: v4f32 = ARMISD::BUILD_VECTOR undef:f32, undef:f32, 0x47547d8bad68, 0x47547d8ba618
      0x47547d8bca90: f32 = undef
      0x47547d8bca90: f32 = undef
      0x47547d8bad68: f32 = fsub nnan ninf nsz contract afn reassoc ConstantFP:f32<1.000000e+00>, 0x47547d8ba618
        0x47547d8bac98: f32 = ConstantFP<1.000000e+00>
        0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
          0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
            0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
              0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
                0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199

                0x47547d7e0f08: i32 = undef
              0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
                0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                  0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08


                  0x47547d7e0f08: i32 = undef
                0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                  0x47547d8b4068: f32 = Register %1015
            0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
              0x47547d8ba9c0: f32 = Register %1016
          0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
            0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
              0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
                0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                  0x47547d8bae38: i32 = Register $r0
                  0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1



      0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
        0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
          0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
            0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
              0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199
                  0x47547d8ba340: i32 = Register %199
              0x47547d7e0f08: i32 = undef
            0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
              0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8b6f08: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %198

                0x47547d7e0f08: i32 = undef
              0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                0x47547d8b4068: f32 = Register %1015
          0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
            0x47547d8ba9c0: f32 = Register %1016
        0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
          0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
            0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
              0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                0x47547d8bae38: i32 = Register $r0
                0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1
                  0x47547d8bc820: i32 = TargetConstant<0>
                  0x47547d8b8ea0: i32 = TargetConstant<-1>
                  0x47547d8b6138: ch,glue = ARMISD::CALL 0x47547d8bcd00, TargetExternalSymbol:i32'floorf', Register:i32 $r0, RegisterMask:Untyped, 0x47547d8bcd00:1




  0x47547d8b6000: v2f32 = fmul nnan ninf nsz contract afn reassoc 0x47547d4e3b60, 0x47547d7e02d8
    0x47547d4e3b60: v2f32 = ARMISD::VDUP 0x47547d8bcaf8
      0x47547d8bcaf8: f32 = fsub nnan ninf nsz contract afn reassoc ConstantFP:f32<1.000000e+00>, 0x47547d8b6478
        0x47547d8bac98: f32 = ConstantFP<1.000000e+00>
        0x47547d8b6478: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8ba958, 0x47547d8b64e0
          0x47547d8ba958: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba270, 0x47547d8ba208
            0x47547d8ba270: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8b4000
              0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
                0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199

                0x47547d7e0f08: i32 = undef
              0x47547d8b4000: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8b68f0, 0x47547d8b6888
                0x47547d8b68f0: f32,ch = load<(load (s32) from %ir.lsr.iv902906, !tbaa !48)> 0x47547fd271e8, 0x47547d4d8888, undef:i32
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d7e0f08: i32 = undef
                0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                  0x47547d8b4068: f32 = Register %1015
            0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
              0x47547d8ba9c0: f32 = Register %1016
          0x47547d8b64e0: f32 = sint_to_fp 0x47547d8b6f70
            0x47547d8b6f70: i32 = fp_to_sint 0x47547d4e53a8
              0x47547d4e53a8: f32 = bitcast 0x47547d4d86e8
                0x47547d4d86e8: i32,ch,glue = CopyFromReg 0x47547d7d6bc8, Register:i32 $r0, 0x47547d7d6bc8:1
                  0x47547d8bae38: i32 = Register $r0
                  0x47547d7d6bc8: ch,glue = callseq_end 0x47547d4e5750, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d4e5750:1



    0x47547d7e02d8: v2f32 = ARMISD::BUILD_VECTOR 0x47547d8bad68, 0x47547d8ba618
      0x47547d8bad68: f32 = fsub nnan ninf nsz contract afn reassoc ConstantFP:f32<1.000000e+00>, 0x47547d8ba618
        0x47547d8bac98: f32 = ConstantFP<1.000000e+00>
        0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
          0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
            0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
              0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
                0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199

                0x47547d7e0f08: i32 = undef
              0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
                0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                  0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08


                  0x47547d7e0f08: i32 = undef
                0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                  0x47547d8b4068: f32 = Register %1015
            0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
              0x47547d8ba9c0: f32 = Register %1016
          0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
            0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
              0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
                0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                  0x47547d8bae38: i32 = Register $r0
                  0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1



      0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
        0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
          0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
            0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
              0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199
                  0x47547d8ba340: i32 = Register %199
              0x47547d7e0f08: i32 = undef
            0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
              0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8b6f08: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %198

                0x47547d7e0f08: i32 = undef
              0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                0x47547d8b4068: f32 = Register %1015
          0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
            0x47547d8ba9c0: f32 = Register %1016
        0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
          0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
            0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
              0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                0x47547d8bae38: i32 = Register $r0
                0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1
                  0x47547d8bc820: i32 = TargetConstant<0>
                  0x47547d8b8ea0: i32 = TargetConstant<-1>
                  0x47547d8b6138: ch,glue = ARMISD::CALL 0x47547d8bcd00, TargetExternalSymbol:i32'floorf', Register:i32 $r0, RegisterMask:Untyped, 0x47547d8bcd00:1

Do you have a .ll file we can try as a reproducer?

It looks like ARM is missing the equivalent combine to concat_vectors as AArch64 (or maybe it'll be easier to add isel patterns for neon - I'll have to look). But a test case would be useful, please!

Test case attached. Crashes with

$ llc < bugpoint-reduced-simplified.ll -mtriple=armv7a-linux-gnu
LLVM ERROR: Cannot select: t73: v4i32 = insert_subvector t71, t69, Constant:i32<0>

t71: v4i32 = ARMISD::VMOVIMM TargetConstant:i32<0>
  t28: i32 = TargetConstant<0>
t69: v2i32 = ARMISD::VMOVIMM TargetConstant:i32<0>
  t28: i32 = TargetConstant<0>
t3: i32 = Constant<0>

- {F18392564, layout=link}

Cheers - looking at this now

RKSimon mentioned this in rGdbce6a8d9d7c: [ARM] Fold insert_subvector to concat_vectors. Aug 6 2021, 3:22 AM

@saugustine rGdbce6a8d9d7c78e6aee989db8a9f29ad8b7cf0a7 should solve the regression

RKSimon mentioned this in rGd6fe8d37c68d: [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) ->…. Aug 16 2021, 8:07 AM

Revision Contents

Path                                                          Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp                 64 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp               45 lines
llvm/lib/Target/X86/X86ISelLowering.cpp                       23 lines
llvm/test/CodeGen/AArch64/arm64-neon-copy.ll                  2 lines
llvm/test/CodeGen/X86/2012-04-26-sdglue.ll                    2 lines
llvm/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll           2 lines
llvm/test/CodeGen/X86/avx-vperm2x128.ll                       6 lines
llvm/test/CodeGen/X86/pr34592.ll                              28 lines
llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll               11 lines

Diff 364466

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 21,293 Lines • ▼ Show 20 Lines	if (N0.getOpcode() == ISD::CONCAT_VECTORS && N1.isUndef() &&
if (TLI.isShuffleMaskLegal(NewMask, VT)) {		if (TLI.isShuffleMaskLegal(NewMask, VT)) {
SDValue UndefVec = DAG.getUNDEF(N0.getOperand(0).getValueType());		SDValue UndefVec = DAG.getUNDEF(N0.getOperand(0).getValueType());
SDValue NewCat = DAG.getNode(ISD::CONCAT_VECTORS, SDLoc(N), VT,		SDValue NewCat = DAG.getNode(ISD::CONCAT_VECTORS, SDLoc(N), VT,
N0.getOperand(0), UndefVec);		N0.getOperand(0), UndefVec);
return DAG.getVectorShuffle(VT, SDLoc(N), NewCat, N1, NewMask);		return DAG.getVectorShuffle(VT, SDLoc(N), NewCat, N1, NewMask);
}		}
}		}

		// See if we can replace a shuffle with an insert_subvector.
		// e.g. v2i32 into v8i32:
		// shuffle(lhs,concat(rhs0,rhs1,rhs2,rhs3),0,1,2,3,10,11,6,7).
		// --> insert_subvector(lhs,rhs1,4).
		if (Level < AfterLegalizeVectorOps && TLI.isTypeLegal(VT) &&
		TLI.isOperationLegalOrCustom(ISD::INSERT_SUBVECTOR, VT)) {
		auto ShuffleToInsert = [&](SDValue LHS, SDValue RHS, ArrayRef<int> Mask) {
		// Ensure RHS subvectors are legal.
		assert(RHS.getOpcode() == ISD::CONCAT_VECTORS && "Can't find subvectors");
		lebedev.riUnsubmitted Not Done Reply Inline Actions + `Undef's are okay.` I believe we can't have `-2` (zero) mask elements yet here? lebedev.ri: +` Undef's are okay.` I believe we can't have `-2` (zero) mask elements yet here?
		RKSimonAuthorUnsubmitted Not Done Reply Inline Actions Yeah - SelectionDAG::getVectorShuffle asserts: "M < (NElts * 2) && M >= -1;" I'll add a comment making to clear undefs are ok RKSimon: Yeah - SelectionDAG::getVectorShuffle asserts: "M < (NElts * 2) && M >= -1;" I'll add a…
		EVT SubVT = RHS.getOperand(0).getValueType();
		int NumSubVecs = RHS.getNumOperands();
		int NumSubElts = SubVT.getVectorNumElements();
		assert((NumElts % NumSubElts) == 0 && "Subvector mismatch");
		if (!TLI.isTypeLegal(SubVT))
		return SDValue();

		// Don't bother if we have an unary shuffle (matches undef + LHS elts).
		if (all_of(Mask, [NumElts](int M) { return M < (int)NumElts; }))
		return SDValue();

		// Search [NumSubElts] spans for RHS sequence.
		// TODO: Can we avoid nested loops to increase performance?
		SmallVector<int> InsertionMask(NumElts);
		for (int SubVec = 0; SubVec != NumSubVecs; ++SubVec) {
		for (int SubIdx = 0; SubIdx != (int)NumElts; SubIdx += NumSubElts) {
		// Reset mask to identity.
		std::iota(InsertionMask.begin(), InsertionMask.end(), 0);

		// Add subvector insertion.
		std::iota(InsertionMask.begin() + SubIdx,
		InsertionMask.begin() + SubIdx + NumSubElts,
		NumElts + (SubVec * NumSubElts));

		// See if the shuffle mask matches the reference insertion mask.
		bool MatchingShuffle = true;
		for (int i = 0; i != (int)NumElts; ++i) {
		int ExpectIdx = InsertionMask[i];
		int ActualIdx = Mask[i];
		if (0 <= ActualIdx && ExpectIdx != ActualIdx) {
		MatchingShuffle = false;
		break;
		craig.topperUnsubmitted Not Done Reply Inline Actions Why can't this be after the loop and all of the breaks replaced with return SDValue()? craig.topper: Why can't this be after the loop and all of the breaks replaced with return SDValue()?
		RKSimonAuthorUnsubmitted Done Reply Inline Actions Because we have an outer loop at the top trying all possible positions for SubIdx RKSimon: Because we have an outer loop at the top trying all possible positions for SubIdx
		craig.topperUnsubmitted Not Done Reply Inline Actions Oh, right forgot we were in a nested loop. craig.topper: Oh, right forgot we were in a nested loop.
		}
		craig.topperUnsubmitted Done Reply Inline Actions What is the last index in the span is -1, we won't reach this. craig.topper: What is the last index in the span is -1, we won't reach this.
		RKSimonAuthorUnsubmitted Done Reply Inline Actions Nice catch - I'll fix this tomorrow RKSimon: Nice catch - I'll fix this tomorrow
		}

		if (MatchingShuffle)
		return DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), VT, LHS,
		RHS.getOperand(SubVec),
		DAG.getVectorIdxConstant(SubIdx, SDLoc(N)));
		}
		}
		return SDValue();
		};
		ArrayRef<int> Mask = SVN->getMask();
		if (N1.getOpcode() == ISD::CONCAT_VECTORS)
		if (SDValue InsertN1 = ShuffleToInsert(N0, N1, Mask))
		return InsertN1;
		if (N0.getOpcode() == ISD::CONCAT_VECTORS) {
		SmallVector<int> CommuteMask(Mask.begin(), Mask.end());
		ShuffleVectorSDNode::commuteMask(CommuteMask);
		if (SDValue InsertN0 = ShuffleToInsert(N1, N0, CommuteMask))
		return InsertN0;
		}
		}

// Attempt to combine a shuffle of 2 inputs of 'scalar sources' -		// Attempt to combine a shuffle of 2 inputs of 'scalar sources' -
// BUILD_VECTOR or SCALAR_TO_VECTOR into a single BUILD_VECTOR.		// BUILD_VECTOR or SCALAR_TO_VECTOR into a single BUILD_VECTOR.
if (Level < AfterLegalizeDAG && TLI.isTypeLegal(VT))		if (Level < AfterLegalizeDAG && TLI.isTypeLegal(VT))
if (SDValue Res = combineShuffleOfScalars(SVN, DAG, TLI))		if (SDValue Res = combineShuffleOfScalars(SVN, DAG, TLI))
return Res;		return Res;

// If this shuffle only has a single input that is a bitcasted shuffle,		// If this shuffle only has a single input that is a bitcasted shuffle,
// attempt to merge the 2 shuffles and suitably bitcast the inputs/output		// attempt to merge the 2 shuffles and suitably bitcast the inputs/output

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 899 Lines • ▼ Show 20 Lines	#undef LCALLNAME5

setTargetDAGCombine(ISD::ANY_EXTEND);		setTargetDAGCombine(ISD::ANY_EXTEND);
setTargetDAGCombine(ISD::ZERO_EXTEND);		setTargetDAGCombine(ISD::ZERO_EXTEND);
setTargetDAGCombine(ISD::SIGN_EXTEND);		setTargetDAGCombine(ISD::SIGN_EXTEND);
setTargetDAGCombine(ISD::VECTOR_SPLICE);		setTargetDAGCombine(ISD::VECTOR_SPLICE);
setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);		setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);
setTargetDAGCombine(ISD::TRUNCATE);		setTargetDAGCombine(ISD::TRUNCATE);
setTargetDAGCombine(ISD::CONCAT_VECTORS);		setTargetDAGCombine(ISD::CONCAT_VECTORS);
		setTargetDAGCombine(ISD::INSERT_SUBVECTOR);
setTargetDAGCombine(ISD::STORE);		setTargetDAGCombine(ISD::STORE);
if (Subtarget->supportsAddressTopByteIgnored())		if (Subtarget->supportsAddressTopByteIgnored())
setTargetDAGCombine(ISD::LOAD);		setTargetDAGCombine(ISD::LOAD);

setTargetDAGCombine(ISD::MUL);		setTargetDAGCombine(ISD::MUL);

setTargetDAGCombine(ISD::SELECT);		setTargetDAGCombine(ISD::SELECT);
setTargetDAGCombine(ISD::VSELECT);		setTargetDAGCombine(ISD::VSELECT);
▲ Show 20 Lines • Show All 12,696 Lines • ▼ Show 20 Lines	static SDValue performConcatVectorsCombine(SDNode *N,
MVT ConcatTy = MVT::getVectorVT(RHSTy.getVectorElementType(),		MVT ConcatTy = MVT::getVectorVT(RHSTy.getVectorElementType(),
RHSTy.getVectorNumElements() * 2);		RHSTy.getVectorNumElements() * 2);
return DAG.getNode(ISD::BITCAST, dl, VT,		return DAG.getNode(ISD::BITCAST, dl, VT,
DAG.getNode(ISD::CONCAT_VECTORS, dl, ConcatTy,		DAG.getNode(ISD::CONCAT_VECTORS, dl, ConcatTy,
DAG.getNode(ISD::BITCAST, dl, RHSTy, N0),		DAG.getNode(ISD::BITCAST, dl, RHSTy, N0),
RHS));		RHS));
}		}

		static SDValue
		performInsertSubvectorCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
		SelectionDAG &DAG) {
		SDValue Vec = N->getOperand(0);
		SDValue SubVec = N->getOperand(1);
		uint64_t IdxVal = N->getConstantOperandVal(2);
		EVT VecVT = Vec.getValueType();
		EVT SubVT = SubVec.getValueType();

		// Only do this for legal fixed vector types.
		if (!VecVT.isFixedLengthVector() ||
		!DAG.getTargetLoweringInfo().isTypeLegal(VecVT) ||
		!DAG.getTargetLoweringInfo().isTypeLegal(SubVT))
		return SDValue();

		// Ignore widening patterns.
		if (IdxVal == 0 && Vec.isUndef())
		return SDValue();

		// Subvector must be half the width and an "aligned" insertion.
		unsigned NumSubElts = SubVT.getVectorNumElements();
		if ((SubVT.getSizeInBits() * 2) != VecVT.getSizeInBits() ||
		(IdxVal != 0 && IdxVal != NumSubElts))
		return SDValue();

		// Fold insert_subvector -> concat_vectors
		// insert_subvector(Vec,Sub,lo) -> concat_vectors(Sub,extract(Vec,hi))
		// insert_subvector(Vec,Sub,hi) -> concat_vectors(extract(Vec,lo),Sub)
		RKSimon: 'subector' typo - I'll fix
		SDLoc DL(N);
		SDValue Lo, Hi;
		if (IdxVal == 0) {
		Lo = SubVec;
		Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, Vec,
		DAG.getVectorIdxConstant(NumSubElts, DL));
		} else {
		Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, Vec,
		DAG.getVectorIdxConstant(0, DL));
		Hi = SubVec;
		}
		return DAG.getNode(ISD::CONCAT_VECTORS, DL, VecVT, Lo, Hi);
		}
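The fold in `performInsertSubvectorCombine` relies on an aligned half-width insertion being equivalent to a concatenation of the new half with the untouched half of the base vector. A minimal sketch of that equivalence on plain `std::vector<int>` values follows; all function names here are hypothetical models of the DAG node semantics, not LLVM APIs, and the legality and undef-widening checks from the patch are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Vec = std::vector<int>;

// Reference semantics: overwrite Base[Idx .. Idx + Sub.size()) with Sub.
Vec insertSubvector(Vec Base, const Vec &Sub, std::size_t Idx) {
  for (std::size_t I = 0; I != Sub.size(); ++I)
    Base[Idx + I] = Sub[I];
  return Base;
}

// Copy Len elements of Src starting at Idx.
Vec extractSubvector(const Vec &Src, std::size_t Idx, std::size_t Len) {
  const auto First = Src.begin() + static_cast<std::ptrdiff_t>(Idx);
  return Vec(First, First + static_cast<std::ptrdiff_t>(Len));
}

Vec concatVectors(const Vec &Lo, const Vec &Hi) {
  Vec Result(Lo);
  Result.insert(Result.end(), Hi.begin(), Hi.end());
  return Result;
}

// Model of the fold: only applies when Sub is exactly half of Base and the
// insertion index is either 0 (low half) or Sub.size() (high half).
std::optional<Vec> foldInsertToConcat(const Vec &Base, const Vec &Sub,
                                      std::size_t Idx) {
  const std::size_t NumSubElts = Sub.size();
  if (NumSubElts * 2 != Base.size() || (Idx != 0 && Idx != NumSubElts))
    return std::nullopt;
  if (Idx == 0) // insert(Vec, Sub, lo) -> concat(Sub, extract(Vec, hi))
    return concatVectors(Sub, extractSubvector(Base, NumSubElts, NumSubElts));
  // insert(Vec, Sub, hi) -> concat(extract(Vec, lo), Sub)
  return concatVectors(extractSubvector(Base, 0, NumSubElts), Sub);
}
```

For any input where the fold applies, the result equals `insertSubvector(Base, Sub, Idx)`; unaligned indices and subvectors of other widths are rejected, mirroring the early returns in the patch.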

static SDValue tryCombineFixedPointConvert(SDNode *N,		static SDValue tryCombineFixedPointConvert(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
// Wait until after everything is legalized to try this. That way we have		// Wait until after everything is legalized to try this. That way we have
// legal vector types and such.		// legal vector types and such.
if (DCI.isBeforeLegalizeOps())		if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();
// Transform a scalar conversion of a value from a lane extract into a		// Transform a scalar conversion of a value from a lane extract into a
▲ Show 20 Lines • Show All 3,040 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
return performExtendCombine(N, DCI, DAG);		return performExtendCombine(N, DCI, DAG);
case ISD::SIGN_EXTEND_INREG:		case ISD::SIGN_EXTEND_INREG:
return performSignExtendInRegCombine(N, DCI, DAG);		return performSignExtendInRegCombine(N, DCI, DAG);
case ISD::TRUNCATE:		case ISD::TRUNCATE:
return performVectorTruncateCombine(N, DCI, DAG);		return performVectorTruncateCombine(N, DCI, DAG);
case ISD::CONCAT_VECTORS:		case ISD::CONCAT_VECTORS:
return performConcatVectorsCombine(N, DCI, DAG);		return performConcatVectorsCombine(N, DCI, DAG);
		case ISD::INSERT_SUBVECTOR:
		return performInsertSubvectorCombine(N, DCI, DAG);
case ISD::SELECT:		case ISD::SELECT:
return performSelectCombine(N, DCI);		return performSelectCombine(N, DCI);
case ISD::VSELECT:		case ISD::VSELECT:
return performVSelectCombine(N, DCI.DAG);		return performVSelectCombine(N, DCI.DAG);
case ISD::SETCC:		case ISD::SETCC:
return performSETCCCombine(N, DAG);		return performSETCCCombine(N, DAG);
case ISD::LOAD:		case ISD::LOAD:
if (performTBISimplification(N->getOperand(1), DCI, DAG))		if (performTBISimplification(N->getOperand(1), DCI, DAG))

llvm/lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 6,200 Lines • ▼ Show 20 Lines	if (Vec.isUndef()) {
assert(IdxVal != 0 && "Unexpected index");		assert(IdxVal != 0 && "Unexpected index");
SubVec = DAG.getNode(X86ISD::KSHIFTL, dl, WideOpVT, SubVec,		SubVec = DAG.getNode(X86ISD::KSHIFTL, dl, WideOpVT, SubVec,
DAG.getTargetConstant(IdxVal, dl, MVT::i8));		DAG.getTargetConstant(IdxVal, dl, MVT::i8));
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, OpVT, SubVec, ZeroIdx);		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, OpVT, SubVec, ZeroIdx);
}		}

if (ISD::isBuildVectorAllZeros(Vec.getNode())) {		if (ISD::isBuildVectorAllZeros(Vec.getNode())) {
assert(IdxVal != 0 && "Unexpected index");		assert(IdxVal != 0 && "Unexpected index");
		// If upper elements of Vec are known undef, then just shift into place.
		if (llvm::all_of(Vec->ops().slice(IdxVal + SubVecNumElems),
		[](SDValue V) { return V.isUndef(); })) {
		SubVec = DAG.getNode(X86ISD::KSHIFTL, dl, WideOpVT, SubVec,
		DAG.getTargetConstant(IdxVal, dl, MVT::i8));
		} else {
NumElems = WideOpVT.getVectorNumElements();		NumElems = WideOpVT.getVectorNumElements();
unsigned ShiftLeft = NumElems - SubVecNumElems;		unsigned ShiftLeft = NumElems - SubVecNumElems;
unsigned ShiftRight = NumElems - SubVecNumElems - IdxVal;		unsigned ShiftRight = NumElems - SubVecNumElems - IdxVal;
SubVec = DAG.getNode(X86ISD::KSHIFTL, dl, WideOpVT, SubVec,		SubVec = DAG.getNode(X86ISD::KSHIFTL, dl, WideOpVT, SubVec,
DAG.getTargetConstant(ShiftLeft, dl, MVT::i8));		DAG.getTargetConstant(ShiftLeft, dl, MVT::i8));
if (ShiftRight != 0)		if (ShiftRight != 0)
SubVec = DAG.getNode(X86ISD::KSHIFTR, dl, WideOpVT, SubVec,		SubVec = DAG.getNode(X86ISD::KSHIFTR, dl, WideOpVT, SubVec,
DAG.getTargetConstant(ShiftRight, dl, MVT::i8));		DAG.getTargetConstant(ShiftRight, dl, MVT::i8));
		}
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, OpVT, SubVec, ZeroIdx);		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, OpVT, SubVec, ZeroIdx);
}		}
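The two shift strategies in the all-zeros case above can be compared on an ordinary integer. This is a rough model only, assuming a 16-lane mask register represented as a `std::uint16_t` whose bits above the subvector hold arbitrary leftover values after widening; `insertWithShiftPair` and `insertWithSingleShift` are hypothetical helper names, and the real lowering operates on `X86ISD::KSHIFTL`/`KSHIFTR` nodes instead.

```cpp
#include <cassert>
#include <cstdint>

// Assumed lane count of the widened mask type in this model.
constexpr unsigned NumElems = 16;

// Shift left to discard everything above the subvector, then shift right so
// that it lands at Idx. All other result bits are guaranteed to be zero.
std::uint16_t insertWithShiftPair(std::uint16_t Sub, unsigned SubNumElems,
                                  unsigned Idx) {
  const unsigned ShiftLeft = NumElems - SubNumElems;
  const unsigned ShiftRight = NumElems - SubNumElems - Idx;
  auto V = static_cast<std::uint16_t>(static_cast<std::uint32_t>(Sub)
                                      << ShiftLeft);
  if (ShiftRight != 0)
    V = static_cast<std::uint16_t>(V >> ShiftRight);
  return V;
}

// Single shift into place. Bits below Idx are zero, but bits above the
// subvector keep whatever was there, so this is only valid when those upper
// lanes of the base vector are undef.
std::uint16_t insertWithSingleShift(std::uint16_t Sub, unsigned Idx) {
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(Sub) << Idx);
}
```

Both variants agree on the lanes up to the end of the inserted subvector; they differ only in the upper lanes, which is why the cheaper single shift is restricted to the case where those lanes are known undef.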

// Simple case when we put subvector in the upper part		// Simple case when we put subvector in the upper part
if (IdxVal + SubVecNumElems == NumElems) {		if (IdxVal + SubVecNumElems == NumElems) {
SubVec = DAG.getNode(X86ISD::KSHIFTL, dl, WideOpVT, SubVec,		SubVec = DAG.getNode(X86ISD::KSHIFTL, dl, WideOpVT, SubVec,
DAG.getTargetConstant(IdxVal, dl, MVT::i8));		DAG.getTargetConstant(IdxVal, dl, MVT::i8));
if (SubVecNumElems * 2 == NumElems) {		if (SubVecNumElems * 2 == NumElems) {

llvm/test/CodeGen/AArch64/arm64-neon-copy.ll

Show First 20 Lines • Show All 1,788 Lines • ▼ Show 20 Lines	entry:
%vecinit2 = shufflevector <2 x i64> %vecinit, <2 x i64> %y, <2 x i32> <i32 0, i32 2>		%vecinit2 = shufflevector <2 x i64> %vecinit, <2 x i64> %y, <2 x i32> <i32 0, i32 2>
ret <2 x i64> %vecinit2		ret <2 x i64> %vecinit2
}		}

define <2 x i64> @test_concat_v2i64_v2i64_v1i64(<2 x i64> %x, <1 x i64> %y) #0 {		define <2 x i64> @test_concat_v2i64_v2i64_v1i64(<2 x i64> %x, <1 x i64> %y) #0 {
; CHECK-LABEL: test_concat_v2i64_v2i64_v1i64:		; CHECK-LABEL: test_concat_v2i64_v2i64_v1i64:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1		; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
; CHECK-NEXT: zip1 v0.2d, v0.2d, v1.2d		; CHECK-NEXT: mov v0.d[1], v1.d[0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%vecext = extractelement <2 x i64> %x, i32 0		%vecext = extractelement <2 x i64> %x, i32 0
%vecinit = insertelement <2 x i64> undef, i64 %vecext, i32 0		%vecinit = insertelement <2 x i64> undef, i64 %vecext, i32 0
%vecext1 = extractelement <1 x i64> %y, i32 0		%vecext1 = extractelement <1 x i64> %y, i32 0
%vecinit2 = insertelement <2 x i64> %vecinit, i64 %vecext1, i32 1		%vecinit2 = insertelement <2 x i64> %vecinit, i64 %vecext1, i32 1
ret <2 x i64> %vecinit2		ret <2 x i64> %vecinit2
}		}

llvm/test/CodeGen/X86/2012-04-26-sdglue.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core-avx2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core-avx2 | FileCheck %s

	; rdar://11314175: SD Scheduler, BuildSchedUnits assert:			; rdar://11314175: SD Scheduler, BuildSchedUnits assert:
	; N->getNodeId() == -1 && "Node already inserted!			; N->getNodeId() == -1 && "Node already inserted!

	define void @func(<4 x float> %a, <16 x i8> %b, <16 x i8> %c, <8 x float> %d, <8 x float> %e, <8 x float>* %f) nounwind ssp {			define void @func(<4 x float> %a, <16 x i8> %b, <16 x i8> %c, <8 x float> %d, <8 x float> %e, <8 x float>* %f) nounwind ssp {
	; CHECK-LABEL: func:			; CHECK-LABEL: func:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: vmovdqu 0, %xmm0			; CHECK-NEXT: vmovdqu 0, %xmm0
	; CHECK-NEXT: vpalignr {{.*#+}} xmm1 = xmm0[4,5,6,7,8,9,10,11,12,13,14,15],xmm1[0,1,2,3]			; CHECK-NEXT: vpalignr {{.*#+}} xmm1 = xmm0[4,5,6,7,8,9,10,11,12,13,14,15],xmm1[0,1,2,3]
	; CHECK-NEXT: vmulps %xmm1, %xmm1, %xmm1			; CHECK-NEXT: vmulps %xmm1, %xmm1, %xmm1
	; CHECK-NEXT: vmulps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vmulps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: vaddps %xmm1, %xmm0, %xmm0			; CHECK-NEXT: vaddps %xmm1, %xmm0, %xmm0
	; CHECK-NEXT: vaddps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vaddps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: vmulps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vmulps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: vperm2f128 {{.*#+}} ymm0 = zero,zero,ymm0[0,1]
	; CHECK-NEXT: vxorps %xmm1, %xmm1, %xmm1			; CHECK-NEXT: vxorps %xmm1, %xmm1, %xmm1
				; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; CHECK-NEXT: vaddps %ymm0, %ymm0, %ymm0			; CHECK-NEXT: vaddps %ymm0, %ymm0, %ymm0
	; CHECK-NEXT: vhaddps %ymm4, %ymm0, %ymm0			; CHECK-NEXT: vhaddps %ymm4, %ymm0, %ymm0
	; CHECK-NEXT: vsubps %ymm0, %ymm0, %ymm0			; CHECK-NEXT: vsubps %ymm0, %ymm0, %ymm0
	; CHECK-NEXT: vhaddps %ymm0, %ymm1, %ymm0			; CHECK-NEXT: vhaddps %ymm0, %ymm1, %ymm0
	; CHECK-NEXT: vmovaps %ymm0, (%rdi)			; CHECK-NEXT: vmovaps %ymm0, (%rdi)
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%tmp = load <4 x float>, <4 x float>* null, align 1			%tmp = load <4 x float>, <4 x float>* null, align 1

llvm/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll

	; Verify that high bits of the immediate are masked off. This should be the equivalent			; Verify that high bits of the immediate are masked off. This should be the equivalent
	; of a vinsertf128 $0 which should be optimized into a blend, so just check that it's			; of a vinsertf128 $0 which should be optimized into a blend, so just check that it's
	; not a vinsertf128 $1.			; not a vinsertf128 $1.
	define <8 x i32> @test_x86_avx_vinsertf128_si_256_2(<8 x i32> %a0, <4 x i32> %a1) {			define <8 x i32> @test_x86_avx_vinsertf128_si_256_2(<8 x i32> %a0, <4 x i32> %a1) {
	; CHECK-LABEL: test_x86_avx_vinsertf128_si_256_2:			; CHECK-LABEL: test_x86_avx_vinsertf128_si_256_2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1			; CHECK-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1
	; CHECK-NEXT: vblendps $240, %ymm0, %ymm1, %ymm0 # encoding: [0xc4,0xe3,0x75,0x0c,0xc0,0xf0]			; CHECK-NEXT: vblendps $15, %ymm1, %ymm0, %ymm0 # encoding: [0xc4,0xe3,0x7d,0x0c,0xc1,0x0f]
	; CHECK-NEXT: # ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]			; CHECK-NEXT: # ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]
	; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]			; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
	%res = call <8 x i32> @llvm.x86.avx.vinsertf128.si.256(<8 x i32> %a0, <4 x i32> %a1, i8 2)			%res = call <8 x i32> @llvm.x86.avx.vinsertf128.si.256(<8 x i32> %a0, <4 x i32> %a1, i8 2)
	ret <8 x i32> %res			ret <8 x i32> %res
	}			}
	declare <8 x i32> @llvm.x86.avx.vinsertf128.si.256(<8 x i32>, <4 x i32>, i8) nounwind readnone			declare <8 x i32> @llvm.x86.avx.vinsertf128.si.256(<8 x i32>, <4 x i32>, i8) nounwind readnone

	; We don't check any vextractf128 variant with immediate 0 because that's just a move.			; We don't check any vextractf128 variant with immediate 0 because that's just a move.

llvm/test/CodeGen/X86/avx-vperm2x128.ll

Show First 20 Lines • Show All 689 Lines • ▼ Show 20 Lines	entry:
%res = add <8 x i32> %shuffle, <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>		%res = add <8 x i32> %shuffle, <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
ret <8 x i32> %res		ret <8 x i32> %res
}		}

define void @PR50053(<4 x i64>* nocapture %0, <4 x i64>* nocapture readonly %1) {		define void @PR50053(<4 x i64>* nocapture %0, <4 x i64>* nocapture readonly %1) {
; ALL-LABEL: PR50053:		; ALL-LABEL: PR50053:
; ALL: # %bb.0:		; ALL: # %bb.0:
; ALL-NEXT: vmovaps (%rsi), %ymm0		; ALL-NEXT: vmovaps (%rsi), %ymm0
; ALL-NEXT: vmovaps 32(%rsi), %xmm1		; ALL-NEXT: vinsertf128 $1, 32(%rsi), %ymm0, %ymm1
; ALL-NEXT: vmovaps 48(%rsi), %xmm2		; ALL-NEXT: vinsertf128 $0, 48(%rsi), %ymm0, %ymm0
; ALL-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[0,1],ymm1[0,1]
; ALL-NEXT: vmovaps %ymm1, (%rdi)		; ALL-NEXT: vmovaps %ymm1, (%rdi)
; ALL-NEXT: vblendps {{.*#+}} ymm0 = ymm2[0,1,2,3],ymm0[4,5,6,7]
; ALL-NEXT: vmovaps %ymm0, 32(%rdi)		; ALL-NEXT: vmovaps %ymm0, 32(%rdi)
; ALL-NEXT: vzeroupper		; ALL-NEXT: vzeroupper
; ALL-NEXT: retq		; ALL-NEXT: retq
%3 = load <4 x i64>, <4 x i64>* %1, align 32		%3 = load <4 x i64>, <4 x i64>* %1, align 32
%4 = getelementptr inbounds <4 x i64>, <4 x i64>* %1, i64 1		%4 = getelementptr inbounds <4 x i64>, <4 x i64>* %1, i64 1
%5 = bitcast <4 x i64>* %4 to <2 x i64>*		%5 = bitcast <4 x i64>* %4 to <2 x i64>*
%6 = load <2 x i64>, <2 x i64>* %5, align 16		%6 = load <2 x i64>, <2 x i64>* %5, align 16
%7 = getelementptr inbounds <2 x i64>, <2 x i64>* %5, i64 1		%7 = getelementptr inbounds <2 x i64>, <2 x i64>* %5, i64 1

llvm/test/CodeGen/X86/pr34592.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 -O0 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 -O0 | FileCheck %s

	define <16 x i64> @pluto(<16 x i64> %arg, <16 x i64> %arg1, <16 x i64> %arg2, <16 x i64> %arg3, <16 x i64> %arg4) {			define <16 x i64> @pluto(<16 x i64> %arg, <16 x i64> %arg1, <16 x i64> %arg2, <16 x i64> %arg3, <16 x i64> %arg4) {
	; CHECK-LABEL: pluto:			; CHECK-LABEL: pluto:
	; CHECK: # %bb.0: # %bb			; CHECK: # %bb.0: # %bb
	; CHECK-NEXT: pushq %rbp			; CHECK-NEXT: pushq %rbp
	; CHECK-NEXT: .cfi_def_cfa_offset 16			; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: .cfi_offset %rbp, -16			; CHECK-NEXT: .cfi_offset %rbp, -16
	; CHECK-NEXT: movq %rsp, %rbp			; CHECK-NEXT: movq %rsp, %rbp
	; CHECK-NEXT: .cfi_def_cfa_register %rbp			; CHECK-NEXT: .cfi_def_cfa_register %rbp
	; CHECK-NEXT: andq $-32, %rsp			; CHECK-NEXT: andq $-32, %rsp
	; CHECK-NEXT: subq $32, %rsp			; CHECK-NEXT: subq $32, %rsp
	; CHECK-NEXT: vmovaps %ymm4, %ymm10			; CHECK-NEXT: vmovaps %ymm4, %ymm10
	; CHECK-NEXT: vmovaps %ymm3, %ymm9			; CHECK-NEXT: vmovaps %ymm3, %ymm9
	; CHECK-NEXT: vmovaps %ymm1, %ymm8			; CHECK-NEXT: vmovaps %ymm1, %ymm8
	; CHECK-NEXT: vmovaps %ymm0, %ymm3			; CHECK-NEXT: vmovaps %ymm0, %ymm4
	; CHECK-NEXT: vmovaps 240(%rbp), %ymm1			; CHECK-NEXT: vmovaps 240(%rbp), %ymm1
	; CHECK-NEXT: vmovaps 208(%rbp), %ymm4			; CHECK-NEXT: vmovaps 208(%rbp), %ymm3
	; CHECK-NEXT: vmovaps 176(%rbp), %ymm0			; CHECK-NEXT: vmovaps 176(%rbp), %ymm0
	; CHECK-NEXT: vmovaps 144(%rbp), %ymm0			; CHECK-NEXT: vmovaps 144(%rbp), %ymm0
	; CHECK-NEXT: vmovaps 112(%rbp), %ymm11			; CHECK-NEXT: vmovaps 112(%rbp), %ymm11
	; CHECK-NEXT: vmovaps 80(%rbp), %ymm11			; CHECK-NEXT: vmovaps 80(%rbp), %ymm11
	; CHECK-NEXT: vmovaps 48(%rbp), %ymm11			; CHECK-NEXT: vmovaps 48(%rbp), %ymm11
	; CHECK-NEXT: vmovaps 16(%rbp), %ymm11			; CHECK-NEXT: vmovaps 16(%rbp), %ymm11
	; CHECK-NEXT: vpblendd {{.*#+}} ymm3 = ymm6[0,1,2,3,4,5],ymm2[6,7]			; CHECK-NEXT: vpblendd {{.*#+}} ymm4 = ymm6[0,1,2,3,4,5],ymm2[6,7]
	; CHECK-NEXT: vmovaps %xmm4, %xmm6			; CHECK-NEXT: vmovaps %xmm3, %xmm6
	; CHECK-NEXT: # implicit-def: $ymm2			; CHECK-NEXT: # implicit-def: $ymm2
	; CHECK-NEXT: vinserti128 $1, %xmm6, %ymm2, %ymm2			; CHECK-NEXT: vinserti128 $1, %xmm6, %ymm2, %ymm2
	; CHECK-NEXT: vpalignr {{.*#+}} ymm0 = ymm3[8,9,10,11,12,13,14,15],ymm0[0,1,2,3,4,5,6,7],ymm3[24,25,26,27,28,29,30,31],ymm0[16,17,18,19,20,21,22,23]			; CHECK-NEXT: vpalignr {{.*#+}} ymm0 = ymm4[8,9,10,11,12,13,14,15],ymm0[0,1,2,3,4,5,6,7],ymm4[24,25,26,27,28,29,30,31],ymm0[16,17,18,19,20,21,22,23]
	; CHECK-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,2,0]			; CHECK-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,2,0]
	; CHECK-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm2[4,5],ymm0[6,7]			; CHECK-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm2[4,5],ymm0[6,7]
	; CHECK-NEXT: vextracti128 $1, %ymm7, %xmm2			; CHECK-NEXT: vextracti128 $1, %ymm7, %xmm2
	; CHECK-NEXT: vmovq {{.*#+}} xmm6 = xmm2[0],zero			; CHECK-NEXT: vmovq {{.*#+}} xmm6 = xmm2[0],zero
	; CHECK-NEXT: # implicit-def: $ymm2			; CHECK-NEXT: # implicit-def: $ymm2
	; CHECK-NEXT: vmovaps %xmm6, %xmm2			; CHECK-NEXT: vmovaps %xmm6, %xmm2
	; CHECK-NEXT: # kill: def $xmm3 killed $xmm3 killed $ymm3			; CHECK-NEXT: # kill: def $xmm4 killed $xmm4 killed $ymm4
	; CHECK-NEXT: vinserti128 $1, %xmm3, %ymm2, %ymm2			; CHECK-NEXT: vinserti128 $1, %xmm4, %ymm2, %ymm2
	; CHECK-NEXT: vmovaps %xmm7, %xmm3			; CHECK-NEXT: vmovaps %xmm7, %xmm4
	; CHECK-NEXT: vpslldq {{.*#+}} xmm6 = zero,zero,zero,zero,zero,zero,zero,zero,xmm3[0,1,2,3,4,5,6,7]			; CHECK-NEXT: vpslldq {{.*#+}} xmm6 = zero,zero,zero,zero,zero,zero,zero,zero,xmm4[0,1,2,3,4,5,6,7]
	; CHECK-NEXT: # implicit-def: $ymm3			; CHECK-NEXT: # implicit-def: $ymm4
	; CHECK-NEXT: vmovaps %xmm6, %xmm3			; CHECK-NEXT: vmovaps %xmm6, %xmm4
	; CHECK-NEXT: vpalignr {{.*#+}} ymm4 = ymm4[8,9,10,11,12,13,14,15],ymm5[0,1,2,3,4,5,6,7],ymm4[24,25,26,27,28,29,30,31],ymm5[16,17,18,19,20,21,22,23]			; CHECK-NEXT: vpalignr {{.*#+}} ymm3 = ymm3[8,9,10,11,12,13,14,15],ymm5[0,1,2,3,4,5,6,7],ymm3[24,25,26,27,28,29,30,31],ymm5[16,17,18,19,20,21,22,23]
	; CHECK-NEXT: vpermq {{.*#+}} ymm4 = ymm4[0,1,0,3]			; CHECK-NEXT: vpermq {{.*#+}} ymm3 = ymm3[0,1,0,3]
	; CHECK-NEXT: vpblendd {{.*#+}} ymm3 = ymm3[0,1,2,3],ymm4[4,5,6,7]			; CHECK-NEXT: vblendps {{.*#+}} ymm3 = ymm4[0,1,2,3],ymm3[4,5,6,7]
	; CHECK-NEXT: vpblendd {{.*#+}} ymm1 = ymm7[0,1],ymm1[2,3],ymm7[4,5,6,7]			; CHECK-NEXT: vpblendd {{.*#+}} ymm1 = ymm7[0,1],ymm1[2,3],ymm7[4,5,6,7]
	; CHECK-NEXT: vpermq {{.*#+}} ymm1 = ymm1[2,1,1,3]			; CHECK-NEXT: vpermq {{.*#+}} ymm1 = ymm1[2,1,1,3]
	; CHECK-NEXT: vpshufd {{.*#+}} ymm4 = ymm5[0,1,0,1,4,5,4,5]			; CHECK-NEXT: vpshufd {{.*#+}} ymm4 = ymm5[0,1,0,1,4,5,4,5]
	; CHECK-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3,4,5],ymm4[6,7]			; CHECK-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3,4,5],ymm4[6,7]
	; CHECK-NEXT: movq %rbp, %rsp			; CHECK-NEXT: movq %rbp, %rsp
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
	; CHECK-NEXT: .cfi_def_cfa %rsp, 8			; CHECK-NEXT: .cfi_def_cfa %rsp, 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	bb:			bb:
	%tmp = select <16 x i1> <i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false>, <16 x i64> %arg, <16 x i64> %arg1			%tmp = select <16 x i1> <i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false>, <16 x i64> %arg, <16 x i64> %arg1
	%tmp5 = select <16 x i1> <i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x i64> %arg2, <16 x i64> zeroinitializer			%tmp5 = select <16 x i1> <i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x i64> %arg2, <16 x i64> zeroinitializer
	%tmp6 = select <16 x i1> <i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 true, i1 true>, <16 x i64> %arg3, <16 x i64> %tmp5			%tmp6 = select <16 x i1> <i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 true, i1 true>, <16 x i64> %arg3, <16 x i64> %tmp5
	%tmp7 = shufflevector <16 x i64> %tmp, <16 x i64> %tmp6, <16 x i32> <i32 11, i32 18, i32 24, i32 9, i32 14, i32 29, i32 29, i32 6, i32 14, i32 28, i32 8, i32 9, i32 22, i32 12, i32 25, i32 6>			%tmp7 = shufflevector <16 x i64> %tmp, <16 x i64> %tmp6, <16 x i32> <i32 11, i32 18, i32 24, i32 9, i32 14, i32 29, i32 29, i32 6, i32 14, i32 28, i32 8, i32 9, i32 22, i32 12, i32 25, i32 6>
	ret <16 x i64> %tmp7			ret <16 x i64> %tmp7
	}			}

llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll

Show First 20 Lines • Show All 557 Lines • ▼ Show 20 Lines	; ALL-NEXT: retq
%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x float> %res		ret <16 x float> %res
}		}

define <16 x float> @insert_sub1_12(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {		define <16 x float> @insert_sub1_12(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {
; ALL-LABEL: insert_sub1_12:		; ALL-LABEL: insert_sub1_12:
; ALL: # %bb.0:		; ALL: # %bb.0:
; ALL-NEXT: vinsertf32x4 $1, %xmm2, %zmm0, %zmm1		; ALL-NEXT: vinsertf32x4 $3, %xmm2, %zmm0, %zmm0
; ALL-NEXT: vmovapd {{.*#+}} zmm2 = [0,1,2,3,4,5,10,11]
; ALL-NEXT: vpermt2pd %zmm1, %zmm2, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 20, i32 21, i32 22, i32 23>		%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 20, i32 21, i32 22, i32 23>
ret <16 x float> %res		ret <16 x float> %res
}		}

define <16 x float> @insert_sub2_4(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {		define <16 x float> @insert_sub2_4(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {
; ALL-LABEL: insert_sub2_4:		; ALL-LABEL: insert_sub2_4:
; ALL: # %bb.0:		; ALL: # %bb.0:
; ALL-NEXT: vinsertf32x4 $2, %xmm3, %zmm0, %zmm1		; ALL-NEXT: vinsertf32x4 $2, %xmm3, %zmm0, %zmm1
; ALL-NEXT: vmovapd {{.*#+}} zmm2 = [0,1,12,13,4,5,6,7]		; ALL-NEXT: vmovapd {{.*#+}} zmm2 = [0,1,12,13,4,5,6,7]
; ALL-NEXT: vpermt2pd %zmm1, %zmm2, %zmm0		; ALL-NEXT: vpermt2pd %zmm1, %zmm2, %zmm0
		RKSimon: we still miss this test case - we need to add some form of concat(concat(x,y),concat(z,w)) -> concat(x,y,z,w) fold.
; ALL-NEXT: retq		; ALL-NEXT: retq
%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 24, i32 25, i32 26, i32 27, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 24, i32 25, i32 26, i32 27, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x float> %res		ret <16 x float> %res
}		}

define <16 x float> @insert_sub01_8(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {		define <16 x float> @insert_sub01_8(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {
; ALL-LABEL: insert_sub01_8:		; ALL-LABEL: insert_sub01_8:
; ALL: # %bb.0:		; ALL: # %bb.0:
; ALL-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1		; ALL-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1
; ALL-NEXT: vinsertf32x4 $1, %xmm2, %zmm1, %zmm1		; ALL-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; ALL-NEXT: vinsertf64x4 $1, %ymm1, %zmm0, %zmm0		; ALL-NEXT: vinsertf64x4 $1, %ymm1, %zmm0, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>		%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
ret <16 x float> %res		ret <16 x float> %res
}		}

define <16 x float> @insert_sub23_0(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {		define <16 x float> @insert_sub23_0(<16 x float> %base, <4 x float> %sub1, <4 x float> %sub2, <4 x float> %sub3, <4 x float> %sub4) {
; ALL-LABEL: insert_sub23_0:		; ALL-LABEL: insert_sub23_0:
; ALL: # %bb.0:		; ALL: # %bb.0:
; ALL-NEXT: # kill: def $xmm3 killed $xmm3 def $ymm3		; ALL-NEXT: # kill: def $xmm3 killed $xmm3 def $ymm3
; ALL-NEXT: vinsertf128 $1, %xmm4, %ymm3, %ymm1		; ALL-NEXT: vinsertf128 $1, %xmm4, %ymm3, %ymm1
; ALL-NEXT: vinsertf64x4 $1, %ymm1, %zmm0, %zmm1		; ALL-NEXT: vinsertf64x4 $0, %ymm1, %zmm0, %zmm0
; ALL-NEXT: vshuff64x2 {{.*#+}} zmm0 = zmm1[4,5,6,7],zmm0[4,5,6,7]
; ALL-NEXT: retq		; ALL-NEXT: retq
%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub12 = shufflevector <4 x float> %sub1, <4 x float> %sub2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%sub34 = shufflevector <4 x float> %sub3, <4 x float> %sub4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%sub1234 = shufflevector <8 x float> %sub12, <8 x float> %sub34, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%res = shufflevector <16 x float> %base, <16 x float> %sub1234, <16 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x float> %res		ret <16 x float> %res
}		}


Revision Contents

Diff 364466
