This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
ClosedPublic

Authored by sdesmalen on Jun 30 2020, 1:19 PM.

Details

Summary

There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).

When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:

extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

This patch fixes both issues.

Diff Detail

Event Timeline

sdesmalen created this revision.Jun 30 2020, 1:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2020, 1:19 PM
efriedma added inline comments.Jun 30 2020, 1:51 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19221–19222

While we're here, do we need to change these getVectorNumElements() calls to getVectorElementCount()?

19238–19239

Does this math work correctly if we're extracting a fixed vector from a scalable vector?

sdesmalen updated this revision to Diff 274824.Jul 1 2020, 8:26 AM
sdesmalen marked an inline comment as done.
  • Removed uses of getVectorNumElements in favour of getVectorMinNumElements
  • Removed another warning from IRTranslator, so the test can have a CHECK line for no warnings (coming from TypeSize)
sdesmalen added inline comments.Jul 1 2020, 8:29 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19238–19239

Yes. To be sure, I tested this with some intrinsic that maps to EXTRACT_SUBVECTOR:

define <2 x i64> @extract_2i64_nxv16i8(<vscale x 16 x i8> %z0) {
  %z0_bc = bitcast <vscale x 16 x i8> %z0 to <vscale x 2 x i64>
  %ext = call <2 x i64> @llvm.experimental.extractsubvec.v2i64.nxv2i64(<vscale x 2 x i64> %z0_bc, i32 2)
  ret <2 x i64> %ext
}
=>
Optimized lowered selection DAG: %bb.0 'extract_2i64_nxv16i8:'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: nxv16i8,ch = CopyFromReg t0, Register:nxv16i8 %0
      t10: v16i8 = extract_subvector t2, Constant:i64<16>
    t11: v2i64 = bitcast t10
  t7: ch,glue = CopyToReg t0, Register:v2i64 $q0, t11
  t8: ch = AArch64ISD::RET_FLAG t7, Register:v2i64 $q0, t7:1

and for the other:

define <16 x i8> @extract_16i8_nxv2i64(<vscale x 2 x i64> %z0) {
  %z0_bc = bitcast <vscale x 2 x i64> %z0 to <vscale x 16 x i8>
  %ext = call <16 x i8> @llvm.experimental.extractsubvec.v16i8.nxv16i8(<vscale x 16 x i8> %z0_bc, i32 16)
  ret <16 x i8> %ext
}
=>
Optimized lowered selection DAG: %bb.0 'extract_16i8_nxv2i64:'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: nxv2i64,ch = CopyFromReg t0, Register:nxv2i64 %0
      t10: v2i64 = extract_subvector t2, Constant:i64<2>
    t11: v16i8 = bitcast t10
  t7: ch,glue = CopyToReg t0, Register:v16i8 $q0, t11
  t8: ch = AArch64ISD::RET_FLAG t7, Register:v16i8 $q0, t7:1

I wasn't planning to add intrinsic as part of this patch to test the behaviour.

efriedma added inline comments.Jul 1 2020, 12:11 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19238–19239

I'm specifically concerned about cases where the number of lanes in the output fixed vector is greater than the number of lanes in the input scalable vector.

efriedma accepted this revision.Jul 1 2020, 12:16 PM

LGTM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19238–19239

Actually, hmm, I think that's fine; the math operations doesn't care about the total number of elements in the output.

This revision is now accepted and ready to land.Jul 1 2020, 12:16 PM
This revision was automatically updated to reflect the committed changes.