This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fix issues with subvector intrinsic index types
ClosedPublic

Authored by frasercrmck on Feb 25 2021, 3:39 AM.

Details

Summary

This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Diff Detail

Event Timeline

frasercrmck created this revision.Feb 25 2021, 3:39 AM
frasercrmck requested review of this revision.Feb 25 2021, 3:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2021, 3:39 AM

Btw, there's a line in AArch64InstrInfo.td which might now be redundant: defm : InsertSubvectorUndef<i32>; since I take it with this patch there's not going to be i32 indices being thrown around.

Hi @frasercrmck thanks for fixing this, this seems like an oversight when adding the intrinsic.

llvm/include/llvm/CodeGen/ISDOpcodes.h
1049

nit: unrelated change.

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll
104

Probably not something that's caused by your patch, but I would have expected that elements v1.b[12], v1.b[14] and v1.b[15] to also be set to -1?

frasercrmck added inline comments.Feb 25 2021, 4:44 AM
llvm/include/llvm/CodeGen/ISDOpcodes.h
1049

True. It's something arcanist prompted me about when creating the patch. Should probably just be pre-committed?

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll
104

Perhaps. I'll admit I don't know much about AArch64 but that makes sense. My patch has taken us from:

t0: ch = EntryToken
t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
t29: v8i8 = extract_subvector t21, Constant:i64<0>
t84: v16i8 = insert_subvector undef:v16i8, t29, Constant:i32<0>
t86: i32 = extract_vector_elt t84, Constant:i64<6>
// and other extracts from t84 of indices < 8
// use t21 elsewhere

to

t0: ch = EntryToken
t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
t86: i32 = extract_vector_elt t2, Constant:i64<6>
// and other extracts from t2 of indices < 8
// use t21 elsewhere

Which I think is a good thing, since there was previously a redundant extract/insert going on, and a dependency on a vector whose elements different from t2 weren't demanded.

Now as to *why* this is happening, there's this code in SelectionDAG.cpp when creating an ISD::INSERT_SUBVECTOR:

// If this is an insert of an extracted vector into an undef vector, we
// can just use the input to the extract.
if (N1.isUndef() && N2.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
    N2.getOperand(1) == N3 && N2.getOperand(0).getValueType() == VT)
  return N2.getOperand(0);

And I think it would be here N2.getOperand(1) == N3 failed because you were mixing i32 and i64 types between insertion and extraction.

frasercrmck added inline comments.Feb 25 2021, 4:46 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8018

I've just noticed that you may now be able to use SelectionDAG::WidenVector here? That's definitely outside the scope of this patch, though.

frasercrmck added inline comments.Feb 25 2021, 5:06 AM
llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll
104

Why the insert to element 12 is being removed, I'm not sure.

Replacing.2 t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>

With: t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>

That's pretty odd, but then again there's nothing in elements 8, 14, or 15. It seems to be an extension of what is happening to the other 3 elements before my patch:

Replacing.2 t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>

With: t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>

Since the test is doing v8i8 and of the low half of a vector with this upper half filled with -1s, presumably it can tell at some point that it's redundant and it can just use the low half. I don't know why it doesn't remove all of the -1s though.

sdesmalen accepted this revision.Feb 25 2021, 7:27 AM

That one test is either already broken, or I'm missing some important detail. In any case, the changes in this patch look good to me.

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll
104

If I visualise what SelectionDAG was/is trying to do here:

reduce.and(<e0, e1, e2, e3, e4, e5, e6, e7, e8, ??, ??, ??, ??, ??, ??, ??>)
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
                                           must each be -1 for the AND reduction
=>
           <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, -1, -1, -1, -1>

=> this is somehow optimized to:
           <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, ??, -1, ??, ??>

it then extracts the Lo/Hi part of the vector,:
           Lo = <e0, e1, e2, e3, e4, e5, e6, e7>
           Hi = <e8, -1, -1, -1, ??, -1, ??, ??>

Performs a vector-AND:
       AND(<e0, e1, e2, e3, e4, e5, e6, e7>,
           <e8, -1, -1, -1, ??, -1, ??, ??>)
       <=>
           <aa, bb, cc, dd, ??, ff, ??, ??>

And then does an element-wise reduction:
  ...
    AND(??,       
      AND(dd,
        AND(cc,
          AND(aa, bb)

The ?? elements could still be anything, so if any of these elements is 0, then the reduction result is also 0. So unless I'm missing something, there seems to be something broken here, already before your patch.

The inserts of -1 into element 14 and 15 were removed by https://github.com/llvm/llvm-project/commit/fc2b4a02b1a82c40ac1459cd15b9911ebfc78acc.

(the output from selectiondag before this patch:

Vector/type-legalized selection DAG: %bb.0 'test_v9i8:'
SelectionDAG has 48 nodes:
  t0: ch = EntryToken
                t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
              t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
            t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
          t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
        t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
      t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
    t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>
  t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>
                  t56: i32 = extract_vector_elt t32, Constant:i64<0>
                  t57: i32 = extract_vector_elt t32, Constant:i64<1>
                t58: i32 = and t56, t57
                t59: i32 = extract_vector_elt t32, Constant:i64<2>
              t60: i32 = and t58, t59
              t61: i32 = extract_vector_elt t32, Constant:i64<3>
            t62: i32 = and t60, t61
            t63: i32 = extract_vector_elt t32, Constant:i64<4>
          t64: i32 = and t62, t63
          t65: i32 = extract_vector_elt t32, Constant:i64<5>
        t66: i32 = and t64, t65
        t67: i32 = extract_vector_elt t32, Constant:i64<6>
      t68: i32 = and t66, t67
      t69: i32 = extract_vector_elt t32, Constant:i64<7>
    t70: i32 = and t68, t69
  t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
    t29: v8i8 = extract_subvector t25, Constant:i64<0>
    t31: v8i8 = extract_subvector t25, Constant:i64<8>
  t32: v8i8 = and t29, t31
  t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1


Optimized vector-legalized selection DAG: %bb.0 'test_v9i8:'
SelectionDAG has 44 nodes:
  t0: ch = EntryToken
            t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
          t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
        t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
      t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
    t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
  t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
                  t56: i32 = extract_vector_elt t32, Constant:i64<0>
                  t57: i32 = extract_vector_elt t32, Constant:i64<1>
                t58: i32 = and t56, t57
                t59: i32 = extract_vector_elt t32, Constant:i64<2>
              t60: i32 = and t58, t59
              t61: i32 = extract_vector_elt t32, Constant:i64<3>
            t62: i32 = and t60, t61
            t75: i32 = extract_vector_elt t29, Constant:i64<4>
          t64: i32 = and t62, t75
          t65: i32 = extract_vector_elt t32, Constant:i64<5>
        t66: i32 = and t64, t65
        t72: i32 = extract_vector_elt t29, Constant:i64<6>
      t68: i32 = and t66, t72
      t71: i32 = extract_vector_elt t29, Constant:i64<7>
    t70: i32 = and t68, t71
  t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
  t29: v8i8 = extract_subvector t21, Constant:i64<0>
    t74: v8i8 = extract_subvector t21, Constant:i64<8>
  t32: v8i8 = and t29, t74
  t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1
This revision is now accepted and ready to land.Feb 25 2021, 7:27 AM
sdesmalen added inline comments.Feb 25 2021, 7:44 AM
llvm/include/llvm/CodeGen/ISDOpcodes.h
1049

Either that, or you can just ignore it.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8018

Yes, you're probably right, good catch.

  • rebase on pre-committed unrelated format change
frasercrmck marked 2 inline comments as done.Feb 25 2021, 8:06 AM
frasercrmck edited the summary of this revision. (Show Details)Feb 26 2021, 8:43 AM
This revision was landed with ongoing or failed builds.Mar 1 2021, 2:34 AM
This revision was automatically updated to reflect the committed changes.