This is an archive of the discontinued LLVM Phabricator instance.

[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns.
ClosedPublic

Authored by RKSimon on Jul 29 2021, 6:31 AM.

Details

Summary

IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.
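This standalone C++ sketch (not the SelectionDAG API) writes the two IR steps as shuffle masks. It assumes LLVM's mask convention: -1 marks an undef lane, values in [0, N) select from the first operand, and values in [N, 2N) select from the second. `widenMask` and `insertMask` are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Mask that widens a SubElts-wide vector to NumElts lanes, padding the
// upper lanes with undef (-1).
std::vector<int> widenMask(int SubElts, int NumElts) {
  assert(SubElts > 0 && SubElts <= NumElts);
  std::vector<int> Mask(NumElts, -1);
  for (int I = 0; I != SubElts; ++I)
    Mask[I] = I;
  return Mask;
}

// Mask for shuffle(Base, WidenedSub) that places the SubElts lanes of the
// widened subvector at lanes [Idx, Idx + SubElts) and keeps every other lane
// of Base in place (an "otherwise identity" shuffle).
std::vector<int> insertMask(int NumElts, int SubElts, int Idx) {
  assert(Idx >= 0 && SubElts > 0 && Idx + SubElts <= NumElts);
  std::vector<int> Mask(NumElts);
  for (int I = 0; I != NumElts; ++I)
    Mask[I] = I;                    // identity of the base vector
  for (int I = 0; I != SubElts; ++I)
    Mask[Idx + I] = NumElts + I;    // lanes read from the widened subvector
  return Mask;
}
```

Example: a 2-element subvector inserted at lane 4 of an 8-element vector. Here `widenMask(2, 8)` is `{0, 1, -1, -1, -1, -1, -1, -1}` and `insertMask(8, 2, 4)` is `{0, 1, 2, 3, 8, 9, 6, 7}`.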

This combine attempts to recognize these patterns by looking for a shuffle that inserts a subvector (from a CONCAT_VECTORS), at an index that is a multiple of the subvector's size, into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Event Timeline

RKSimon created this revision.Jul 29 2021, 6:31 AM
RKSimon requested review of this revision.Jul 29 2021, 6:31 AM
Herald added a project: Restricted Project. Jul 29 2021, 6:31 AM
craig.topper added inline comments.Aug 1 2021, 10:48 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21413

What if the last index in the span is -1? Then we won't reach this.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
7919

Why does AArch64 have a mix of index types? Shouldn't everything be using getVectorIdxTy?

Vector element mov is generally the same cost as zip1 on aarch64; zip1 is preferable because we can specify the destination register.

https://github.com/llvm/llvm-project/blob/56e7b6c3924d7ba8db70c38235a77ed8208795eb/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L10552 is supposed to ensure that only INSERT_SUBVECTOR with index 0 is legal, I think? Maybe it broke somehow.

That only seems to be set up for scalable vectors - would it be better if I added an AArch64 combine that folds these to a CONCAT_VECTORS pattern?

RKSimon added inline comments.Aug 1 2021, 12:44 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21413

Nice catch - I'll fix this tomorrow

RKSimon updated this revision to Diff 363479.Aug 2 2021, 8:02 AM

Match subvectors with trailing undef shuffle mask elements
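This hypothetical sketch addresses craig.topper's point about trailing undefs. When deciding which subvector a span of the mask reads, it skips undef lanes wherever they occur instead of relying on the span's last element, so a trailing -1 cannot defeat the match. `spanSubvector` is an illustrative name, not a function from the patch. The model assumes the second shuffle operand is a concatenation of `SubElts`-wide subvectors.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Returns which subvector of the concatenated second operand (lanes
// NumElts..2*NumElts-1) the SubElts lanes starting at Pos read, in order.
// Undef (-1) lanes are ignored wherever they appear. Returns nullopt if the
// span reads the first operand, is out of order, mixes subvectors, or is
// entirely undef.
std::optional<int> spanSubvector(const std::vector<int> &Mask, int Pos,
                                 int SubElts) {
  const int NumElts = static_cast<int>(Mask.size());
  assert(SubElts > 0 && Pos >= 0 && Pos + SubElts <= NumElts);
  std::optional<int> SubIdx;
  for (int I = 0; I != SubElts; ++I) {
    int M = Mask[Pos + I];
    if (M < 0)
      continue;                     // undef lane: compatible with anything
    int Rel = M - NumElts;          // lane index within the concatenation
    if (Rel < 0 || Rel % SubElts != I)
      return std::nullopt;          // first operand, or not in order
    int Candidate = Rel / SubElts;
    if (SubIdx && *SubIdx != Candidate)
      return std::nullopt;          // mixes two different subvectors
    SubIdx = Candidate;
  }
  return SubIdx;
}
```

With mask `{0, 1, 2, 3, 10, -1, 6, 7}`, the span at lane 4 still resolves to subvector 1 even though its last lane is undef.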

Any thoughts on what we should do for the missing aarch64 insert_subvector isel?

RKSimon marked an inline comment as done.Aug 2 2021, 8:02 AM
craig.topper added inline comments.Aug 2 2021, 9:14 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21342

Why can't this be after the loop and all of the breaks replaced with return SDValue()?

RKSimon added inline comments.Aug 2 2021, 10:32 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21342

Because we have an outer loop at the top trying all possible positions for SubIdx

craig.topper added inline comments.Aug 2 2021, 11:03 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21342

Oh, right, I forgot we were in a nested loop.

That only seems to be set up for scalable vectors - would it be better if I added an AArch64 combine that folds these to a CONCAT_VECTORS pattern?

Oh, right.

I guess saying NEON INSERT_SUBVECTOR is legal is fine. We definitely do want to optimize to CONCAT_VECTORS where applicable, though.

OK, I'll drop the isel patterns and add an AArch64 combine to CONCAT_VECTORS instead.

RKSimon updated this revision to Diff 363706.Aug 3 2021, 5:27 AM
RKSimon edited the summary of this revision. (Show Details)

Replaced AArch64 isel patterns with an insert_subvector -> concat_vectors combine
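This toy model shows the kind of insert_subvector -> concat_vectors fold being discussed. It assumes the base vector is already known to be a CONCAT_VECTORS of subvector-sized operands, where undef pieces count as operands. `Concat` and `foldInsertToConcat` are hypothetical names, and the AArch64 combine in the patch may accept a different set of cases.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// A vector value modelled as equal-width pieces, each named by a string
// ("undef" for an undefined piece).
struct Concat {
  int PieceElts;                    // lanes per piece
  std::vector<std::string> Pieces;  // operands of the CONCAT_VECTORS
};

// insert_subvector(Base, Sub, Idx) -> concat_vectors, valid only when Sub is
// exactly one piece wide and Idx is aligned to a piece boundary.
std::optional<Concat> foldInsertToConcat(const Concat &Base,
                                         const std::string &Sub, int SubElts,
                                         int Idx) {
  assert(Base.PieceElts > 0 && SubElts > 0 && Idx >= 0);
  if (SubElts != Base.PieceElts || Idx % SubElts != 0)
    return std::nullopt;            // widths or alignment do not line up
  const std::size_t Piece = static_cast<std::size_t>(Idx / SubElts);
  if (Piece >= Base.Pieces.size())
    return std::nullopt;            // index past the end of Base
  Concat Result = Base;
  Result.Pieces[Piece] = Sub;       // replace exactly one operand
  return Result;
}
```

Example: inserting `s` at lane 2 of `concat(a, b)` with 2-lane pieces yields `concat(a, s)`. Inserting at lane 0 of an undef base yields `concat(s, undef)`. A misaligned index such as lane 1 is rejected.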

What about insertion of multiple subvectors?

This looks complicated. Can't we make this simpler?
If we want to keep the current general logic,
can't you just generate the shuffle mask you'd need
when inserting the n'th subvector of RHS into an identity LHS,
and check that it matches the actual mask? (modulo undefs)

Otherwise, can't you just go through the mask, record the elements that are identity,
then go through the elements that aren't identity, see that they are sequential,
that there are only enough of them for a single subvector insertion,
and decode which subvector is being inserted?
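This is a rough sketch of the scan-based alternative, in standalone C++ rather than DAGCombiner code. It assumes both operands have as many lanes as the mask, that the second operand is a concatenation of `SubElts`-wide subvectors, and that -1 is the only undef encoding. `matchInsertSubvectorScan` is a hypothetical name. The sketch makes one pass over the mask plus one pass over a single slot, and reports no insertion when every lane is identity or undef.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Returns {subvector number, first lane of the insertion}, or nullopt.
std::optional<std::pair<int, int>>
matchInsertSubvectorScan(const std::vector<int> &Mask, int SubElts) {
  const int NumElts = static_cast<int>(Mask.size());
  assert(SubElts > 0 && NumElts % SubElts == 0);
  // Record the first and last lanes that are neither undef nor identity.
  int First = -1, Last = -1;
  for (int I = 0; I != NumElts; ++I) {
    if (Mask[I] < 0 || Mask[I] == I)
      continue;
    if (First < 0)
      First = I;
    Last = I;
  }
  if (First < 0)
    return std::nullopt;                     // nothing is inserted
  const int Pos = First - First % SubElts;   // align down to a slot boundary
  if (Last >= Pos + SubElts)
    return std::nullopt;                     // changes span more than one slot
  // Every defined lane in the slot must read one subvector, in order.
  int Sub = -1;
  for (int I = 0; I != SubElts; ++I) {
    int M = Mask[Pos + I];
    if (M < 0)
      continue;
    int Rel = M - NumElts;                   // lane within the concatenation
    if (Rel < 0 || Rel % SubElts != I)
      return std::nullopt;                   // not an in-order lane of RHS
    if (Sub >= 0 && Rel / SubElts != Sub)
      return std::nullopt;                   // mixes two subvectors
    Sub = Rel / SubElts;
  }
  return std::make_pair(Sub, Pos);
}
```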

What about insertion of multiple subvectors?

Do you have an example? visitINSERT_SUBVECTOR already does some canonicalizations and can fold to CONCAT_VECTORS

This looks complicated. Can't we make this simpler?
If we want to keep the current general logic, can't you just generate the shuffle mask you'd need when inserting the n'th subvector of RHS into an identity LHS, and check that it matches the actual mask? (modulo undefs)

I'll look at this - but an earlier prototype tried something similar and it ended up being rather messy.

RKSimon updated this revision to Diff 363755.Aug 3 2021, 8:41 AM

use candidate shuffle mask creation + comparison - this is a lot easier now that I've moved the getNode() inside the nested for-loops.

What about insertion of multiple subvectors?

Do you have an example? visitINSERT_SUBVECTOR already does some canonicalizations and can fold to CONCAT_VECTORS

I mean something along the lines of

// e.g. v2i32 into v8i32:
// shuffle(lhs,concat(rhs0,rhs1,rhs2,rhs3),0,1,2,3,10,11,12,13).
// --> insert_subvector(insert_subvector(lhs,rhs1,4),rhs2,6).

and

// e.g. v2i32 into v8i32:
// shuffle(lhs,concat(rhs0,rhs1,rhs2,rhs3),8,9,2,3,4,5,12,13).
// --> insert_subvector(insert_subvector(lhs,rhs0,0),rhs2,6).

I haven't checked what happens currently; I only saw that the patch handles just a single insertion, and only if the shuffle goes away entirely.
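This hypothetical slot-by-slot generalization covers both example masks above. It is not part of the patch, which handles a single insertion. `decomposeInsertions` is an illustrative name, and the same mask assumptions apply: -1 is undef, and the second operand is a concatenation of `SubElts`-wide subvectors.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Walk the mask one SubElts-wide slot at a time. A slot that is identity
// (modulo undef) keeps the first operand; any other slot must read one whole,
// in-order subvector of the second operand. The result lists
// {subvector, lane} pairs, i.e. a chain of insert_subvector nodes applied to
// the first operand. Returns nullopt if some slot fits neither form.
std::optional<std::vector<std::pair<int, int>>>
decomposeInsertions(const std::vector<int> &Mask, int SubElts) {
  const int NumElts = static_cast<int>(Mask.size());
  assert(SubElts > 0 && NumElts % SubElts == 0);
  std::vector<std::pair<int, int>> Inserts;
  for (int Pos = 0; Pos != NumElts; Pos += SubElts) {
    bool Identity = true;
    for (int I = 0; I != SubElts; ++I) {
      int M = Mask[Pos + I];
      if (M >= 0 && M != Pos + I)
        Identity = false;
    }
    if (Identity)
      continue;                     // slot keeps operand 0 (or is undef)
    int Sub = -1;
    for (int I = 0; I != SubElts; ++I) {
      int M = Mask[Pos + I];
      if (M < 0)
        continue;
      int Rel = M - NumElts;        // lane within the concatenation
      if (Rel < 0 || Rel % SubElts != I)
        return std::nullopt;        // not an in-order lane of operand 1
      if (Sub >= 0 && Rel / SubElts != Sub)
        return std::nullopt;        // mixes two subvectors
      Sub = Rel / SubElts;
    }
    Inserts.emplace_back(Sub, Pos);
  }
  return Inserts;
}
```

For the first example mask the sketch returns `{{1, 4}, {2, 6}}`, which corresponds to `insert_subvector(insert_subvector(lhs, rhs1, 4), rhs2, 6)`. For the second mask it returns `{{0, 0}, {2, 6}}`.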

This looks complicated. Can't we make this simpler?
If we want to keep the current general logic, can't you just generate the shuffle mask you'd need when inserting the n'th subvector of RHS into an identity LHS, and check that it matches the actual mask? (modulo undefs)

I'll look at this - but an earlier prototype tried something similar and it ended up being rather messy.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21308–21310

+ Undefs are okay.

I believe we can't have -2 (zero) mask elements yet here?

I like this version much more! :)
I do wonder if the other could be less computationally intensive, but this is good too.

RKSimon added inline comments.Aug 3 2021, 9:46 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21308–21310

Yeah - SelectionDAG::getVectorShuffle asserts: "M < (NElts * 2) && M >= -1;"
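The quoted assertion amounts to a range check on each mask element. This minimal sketch uses a hypothetical helper name, `isValidShuffleMask`, and assumes both operands have as many lanes as the mask. Under this check, -1 (undef) is the only accepted negative value, so a -2 "zero" sentinel would be rejected.

```cpp
#include <cassert>
#include <vector>

// Accept only elements M with -1 <= M < 2 * NElts, mirroring the condition
// quoted from SelectionDAG::getVectorShuffle.
bool isValidShuffleMask(const std::vector<int> &Mask) {
  const int NElts = static_cast<int>(Mask.size());
  for (int M : Mask)
    if (!(M < NElts * 2 && M >= -1))
      return false;
  return true;
}
```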

I'll add a comment making it clear that undefs are ok

RKSimon updated this revision to Diff 363785.Aug 3 2021, 9:55 AM

Update unary shuffle match comment to mention undef elts

(I'm waiting on a reply regarding multiple subvector insertions before providing any further feedback)

Looking at this now - rG14b71efd979ce3dacf6b3d9913df8e4f063224c5

RKSimon updated this revision to Diff 363843.Aug 3 2021, 12:54 PM

rebase with nested concat_vectors tests

RKSimon added inline comments.Aug 3 2021, 12:56 PM
llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
580 (On Diff #363843)

we still miss this test case - we need to add some form of concat(concat(x,y),concat(z,w)) -> concat(x,y,z,w) fold.
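This toy sketch models the concat(concat(x,y),concat(z,w)) -> concat(x,y,z,w) fold with a hypothetical `Node` type rather than SDNodes. It only models operand order. A real DAG fold would also need to check that all the gathered operands share one type: for example, nested concats with different operand counts would not qualify.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Either a named leaf value (Ops empty) or a CONCAT_VECTORS of Ops.
struct Node {
  std::string Leaf;
  std::vector<std::shared_ptr<Node>> Ops;
};

std::shared_ptr<Node> leaf(std::string Name) {
  auto N = std::make_shared<Node>();
  N->Leaf = std::move(Name);
  return N;
}

std::shared_ptr<Node> concat(std::vector<std::shared_ptr<Node>> Ops) {
  auto N = std::make_shared<Node>();
  N->Ops = std::move(Ops);
  return N;
}

// Collect the leaves of (possibly nested) concats, in order, into a single
// flat operand list.
void flattenConcat(const Node &N, std::vector<std::string> &Out) {
  if (N.Ops.empty()) {
    Out.push_back(N.Leaf);
    return;
  }
  for (const auto &Op : N.Ops)
    flattenConcat(*Op, Out);
}
```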

Do you want to deal with that first, or afterwards?

The concat(concat,concat) fold is orthogonal to this patch, it just helps expose more opportunities for it, so I can work on it in parallel.

I think this patch is mainly waiting on feedback from aarch64 gurus - @efriedma @dmgreen @t.p.northover any comments?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13648 (On Diff #363843)

'subector' typo - I'll fix

RKSimon updated this revision to Diff 364031.Aug 4 2021, 3:47 AM

fix typo

lebedev.ri accepted this revision.Aug 4 2021, 3:50 PM

Ignoring AArch64 change (@efriedma @dmgreen @t.p.northover ?), this looks fine to me.
I might experiment with a partial/iterative approach, but later.
Thanks.

This revision is now accepted and ready to land.Aug 4 2021, 3:50 PM

I was trying out some SVE fixed length lowering, which appears to be fine as we don't mark INSERT_SUBREG as legal/custom at the moment. Seems OK from what I can tell.

This revision was landed with ongoing or failed builds.Aug 5 2021, 7:41 AM
This revision was automatically updated to reflect the committed changes.

I have traced an assertion failure in a Halide pipeline targeting armeabi-v7a back to this change. I don't think it is Halide per se, but rather codegen that happens to be exposed by this pipeline. I'm working on extracting it. My sense is that this target needs work similar to the others, but to get started, here is the assertion:

LLVM ERROR: Cannot select: 0x47547d4f7820: v4f32 = insert_subvector 0x47547d8b43a8, 0x47547d8b6000, Constant:i32<0>
  0x47547d8b43a8: v4f32 = fmul nnan ninf nsz contract afn reassoc 0x47547d8b6068, 0x47547d8b6d00
    0x47547d8b6068: v4f32 = ARMISD::VDUP 0x47547d8b6478
      0x47547d8b6478: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8ba958, 0x47547d8b64e0
        0x47547d8ba958: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba270, 0x47547d8ba208
          0x47547d8ba270: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8b4000
            0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
              0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199
                  0x47547d8ba340: i32 = Register %199
              0x47547d7e0f08: i32 = undef
            0x47547d8b4000: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8b68f0, 0x47547d8b6888
              0x47547d8b68f0: f32,ch = load<(load (s32) from %ir.lsr.iv902906, !tbaa !48)> 0x47547fd271e8, 0x47547d4d8888, undef:i32
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d7e0f08: i32 = undef
              0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                0x47547d8b4068: f32 = Register %1015
          0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
            0x47547d8ba9c0: f32 = Register %1016
        0x47547d8b64e0: f32 = sint_to_fp 0x47547d8b6f70
          0x47547d8b6f70: i32 = fp_to_sint 0x47547d4e53a8
            0x47547d4e53a8: f32 = bitcast 0x47547d4d86e8
              0x47547d4d86e8: i32,ch,glue = CopyFromReg 0x47547d7d6bc8, Register:i32 $r0, 0x47547d7d6bc8:1
                0x47547d8bae38: i32 = Register $r0
                0x47547d7d6bc8: ch,glue = callseq_end 0x47547d4e5750, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d4e5750:1
                  0x47547d8bc820: i32 = TargetConstant<0>
                  0x47547d8b8ea0: i32 = TargetConstant<-1>
                  0x47547d4e5750: ch,glue = ARMISD::CALL 0x47547d7d60d0, TargetExternalSymbol:i32'floorf', Register:i32 $r0, RegisterMask:Untyped, 0x47547d7d60d0:1




    0x47547d8b6d00: v4f32 = ARMISD::BUILD_VECTOR undef:f32, undef:f32, 0x47547d8bad68, 0x47547d8ba618
      0x47547d8bca90: f32 = undef
      0x47547d8bca90: f32 = undef
      0x47547d8bad68: f32 = fsub nnan ninf nsz contract afn reassoc ConstantFP:f32<1.000000e+00>, 0x47547d8ba618
        0x47547d8bac98: f32 = ConstantFP<1.000000e+00>
        0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
          0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
            0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
              0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
                0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199

                0x47547d7e0f08: i32 = undef
              0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
                0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                  0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08


                  0x47547d7e0f08: i32 = undef
                0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                  0x47547d8b4068: f32 = Register %1015
            0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
              0x47547d8ba9c0: f32 = Register %1016
          0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
            0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
              0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
                0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                  0x47547d8bae38: i32 = Register $r0
                  0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1



      0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
        0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
          0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
            0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
              0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199
                  0x47547d8ba340: i32 = Register %199
              0x47547d7e0f08: i32 = undef
            0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
              0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8b6f08: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %198

                0x47547d7e0f08: i32 = undef
              0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                0x47547d8b4068: f32 = Register %1015
          0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
            0x47547d8ba9c0: f32 = Register %1016
        0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
          0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
            0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
              0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                0x47547d8bae38: i32 = Register $r0
                0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1
                  0x47547d8bc820: i32 = TargetConstant<0>
                  0x47547d8b8ea0: i32 = TargetConstant<-1>
                  0x47547d8b6138: ch,glue = ARMISD::CALL 0x47547d8bcd00, TargetExternalSymbol:i32'floorf', Register:i32 $r0, RegisterMask:Untyped, 0x47547d8bcd00:1




  0x47547d8b6000: v2f32 = fmul nnan ninf nsz contract afn reassoc 0x47547d4e3b60, 0x47547d7e02d8
    0x47547d4e3b60: v2f32 = ARMISD::VDUP 0x47547d8bcaf8
      0x47547d8bcaf8: f32 = fsub nnan ninf nsz contract afn reassoc ConstantFP:f32<1.000000e+00>, 0x47547d8b6478
        0x47547d8bac98: f32 = ConstantFP<1.000000e+00>
        0x47547d8b6478: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8ba958, 0x47547d8b64e0
          0x47547d8ba958: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba270, 0x47547d8ba208
            0x47547d8ba270: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8b4000
              0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
                0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199

                0x47547d7e0f08: i32 = undef
              0x47547d8b4000: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8b68f0, 0x47547d8b6888
                0x47547d8b68f0: f32,ch = load<(load (s32) from %ir.lsr.iv902906, !tbaa !48)> 0x47547fd271e8, 0x47547d4d8888, undef:i32
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d7e0f08: i32 = undef
                0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                  0x47547d8b4068: f32 = Register %1015
            0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
              0x47547d8ba9c0: f32 = Register %1016
          0x47547d8b64e0: f32 = sint_to_fp 0x47547d8b6f70
            0x47547d8b6f70: i32 = fp_to_sint 0x47547d4e53a8
              0x47547d4e53a8: f32 = bitcast 0x47547d4d86e8
                0x47547d4d86e8: i32,ch,glue = CopyFromReg 0x47547d7d6bc8, Register:i32 $r0, 0x47547d7d6bc8:1
                  0x47547d8bae38: i32 = Register $r0
                  0x47547d7d6bc8: ch,glue = callseq_end 0x47547d4e5750, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d4e5750:1



    0x47547d7e02d8: v2f32 = ARMISD::BUILD_VECTOR 0x47547d8bad68, 0x47547d8ba618
      0x47547d8bad68: f32 = fsub nnan ninf nsz contract afn reassoc ConstantFP:f32<1.000000e+00>, 0x47547d8ba618
        0x47547d8bac98: f32 = ConstantFP<1.000000e+00>
        0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
          0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
            0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
              0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
                0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199

                0x47547d7e0f08: i32 = undef
              0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
                0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                  0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08


                  0x47547d7e0f08: i32 = undef
                0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                  0x47547d8b4068: f32 = Register %1015
            0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
              0x47547d8ba9c0: f32 = Register %1016
          0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
            0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
              0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
                0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                  0x47547d8bae38: i32 = Register $r0
                  0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1



      0x47547d8ba618: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8b6750, 0x47547d8bcea0
        0x47547d8b6750: f32 = fdiv nnan ninf nsz contract afn reassoc 0x47547d8ba888, 0x47547d8ba208
          0x47547d8ba888: f32 = fsub nnan ninf nsz contract afn reassoc 0x47547d8baa28, 0x47547d8ba138
            0x47547d8baa28: f32,ch = load<(load (s32) from %ir.scevgep907908, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba2d8, undef:i32
              0x47547d8ba2d8: i32 = add 0x47547d4d8888, 0x47547d8baa90
                0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278
                  0x47547da92bc8: i32 = Register %278
                0x47547d8baa90: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %199
                  0x47547d8ba340: i32 = Register %199
              0x47547d7e0f08: i32 = undef
            0x47547d8ba138: f32 = fadd nnan ninf nsz contract afn reassoc 0x47547d8ba8f0, 0x47547d8b6888
              0x47547d8ba8f0: f32,ch = load<(load (s32) from %ir.scevgep904905, !tbaa !48)> 0x47547fd271e8, 0x47547d8ba1a0, undef:i32
                0x47547d8ba1a0: i32 = add 0x47547d4d8888, 0x47547d8b6f08
                  0x47547d4d8888: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %278

                  0x47547d8b6f08: i32,ch = CopyFromReg 0x47547fd271e8, Register:i32 %198

                0x47547d7e0f08: i32 = undef
              0x47547d8b6888: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1015
                0x47547d8b4068: f32 = Register %1015
          0x47547d8ba208: f32,ch = CopyFromReg 0x47547fd271e8, Register:f32 %1016
            0x47547d8ba9c0: f32 = Register %1016
        0x47547d8bcea0: f32 = sint_to_fp 0x47547d8b66e8
          0x47547d8b66e8: i32 = fp_to_sint 0x47547d6d02d8
            0x47547d6d02d8: f32 = bitcast 0x47547d8b8f70
              0x47547d8b8f70: i32,ch,glue = CopyFromReg 0x47547d8b8f08, Register:i32 $r0, 0x47547d8b8f08:1
                0x47547d8bae38: i32 = Register $r0
                0x47547d8b8f08: ch,glue = callseq_end 0x47547d8b6138, TargetConstant:i32<0>, TargetConstant:i32<-1>, 0x47547d8b6138:1
                  0x47547d8bc820: i32 = TargetConstant<0>
                  0x47547d8b8ea0: i32 = TargetConstant<-1>
                  0x47547d8b6138: ch,glue = ARMISD::CALL 0x47547d8bcd00, TargetExternalSymbol:i32'floorf', Register:i32 $r0, RegisterMask:Untyped, 0x47547d8bcd00:1

Do you have a .ll file we can try as a reproducer?

RKSimon added a comment.EditedAug 6 2021, 1:08 AM

It looks like ARM is missing the equivalent of the AArch64 combine to concat_vectors (or maybe it'll be easier to add isel patterns for NEON - I'll have to look), but a test case would be useful, please!

Test case attached. Crashes with

$ llc < bugpoint-reduced-simplified.ll -mtriple=armv7a-linux-gnu
LLVM ERROR: Cannot select: t73: v4i32 = insert_subvector t71, t69, Constant:i32<0>

t71: v4i32 = ARMISD::VMOVIMM TargetConstant:i32<0>
  t28: i32 = TargetConstant<0>
t69: v2i32 = ARMISD::VMOVIMM TargetConstant:i32<0>
  t28: i32 = TargetConstant<0>
t3: i32 = Constant<0>

- Attached file: F18392564

Cheers - looking at this now