This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add support loads, stores, and splats of vXi1 fixed vectors.
ClosedPublic

Authored by craig.topper on Feb 10 2021, 4:32 PM.

Details

Summary

This refines how we determine which mask types are legal and adds
support for loads, stores, and all-ones/all-zeros splats.

I left a FIXME in the store handling where I think we need to zero the
extra bits if the type isn't a multiple of a byte. If I remember
right from X86, there was some case where we could have a store of a
1, 2, or 4 bit mask and a scalar zextload that then expected the
bits to be 0. It's tricky to zero the bits with RVV. We need to do
something like round VL up, zero a register, lower the VL back down,
then do a tail-undisturbed move into the zeroed register. Another
option might be to generate a mask with 1/2/4 bits set at a VL of 8
and use that to mask off the bits.
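
As a rough illustration of the second option, here is a scalar C++ model of masking off the extra bits of a stored mask byte. This is not RVV code and not what the patch does; the helper names `lowBitsMask` and `zeroExtraMaskBits` are hypothetical, and the model assumes the whole mask fits in one byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model. Builds a byte with the low `numElts` bits set,
// analogous to "a mask with 1/2/4 bits set at a VL of 8".
inline uint8_t lowBitsMask(unsigned numElts) {
  assert(numElts >= 1 && numElts <= 8 && "model covers a single byte only");
  return static_cast<uint8_t>((1u << numElts) - 1u);
}

// `raw` is the byte a vXi1 store would write; only its low `numElts` bits
// are defined by the mask value. The result has every other bit cleared.
inline uint8_t zeroExtraMaskBits(uint8_t raw, unsigned numElts) {
  return static_cast<uint8_t>(raw & lowBitsMask(numElts));
}
```

For a v2i1 value, `zeroExtraMaskBits(raw, 2)` keeps bits 0 and 1 and clears the remaining six, which is the state a later scalar zextload would expect.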

Event Timeline

craig.topper created this revision. Feb 10 2021, 4:32 PM
craig.topper requested review of this revision. Feb 10 2021, 4:32 PM
Herald added a project: Restricted Project. Feb 10 2021, 4:32 PM
Herald added a subscriber: MaskRay.
frasercrmck accepted this revision. Feb 11 2021, 8:08 AM

LGTM. I'm a little surprised by this possible requirement to zero the bytes though, unless I'm not understanding the exact conditions. Are you able to find a testcase?

This revision is now accepted and ready to land. Feb 11 2021, 8:08 AM
This revision was landed with ongoing or failed builds. Feb 11 2021, 9:17 AM
This revision was automatically updated to reflect the committed changes.

> LGTM. I'm a little surprised by this possible requirement to zero the bytes though, unless I'm not understanding the exact conditions. Are you able to find a testcase?

define zeroext i2 @seteq_vv_v16i8(<2 x i8>* %x, <2 x i8>* %y) {
  %a = load <2 x i8>, <2 x i8>* %x
  %b = load <2 x i8>, <2 x i8>* %y
  %c = icmp eq <2 x i8> %a, %b
  %d = bitcast <2 x i1> %c to i2
  ret i2 %d
}

This becomes the following after type legalization. The bitcast is turned into a store to the stack followed by a load.

SelectionDAG has 18 nodes:
  t0: ch = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t7: v2i8,ch = load<(load 2 from %ir.x)> t0, t2, undef:i64
              t4: i64,ch = CopyFromReg t0, Register:i64 %1
            t8: v2i8,ch = load<(load 2 from %ir.y)> t0, t4, undef:i64
          t10: v2i1 = setcc t7, t8, seteq:ch
        t17: ch = store<(store 1 into %stack.0, align 2)> t0, t10, FrameIndex:i64<0>, undef:i64
      t22: i64,ch = load<(load 1 from %stack.0, align 2), anyext from i2> t17, FrameIndex:i64<0>, undef:i64
    t21: i64 = and t22, Constant:i64<3>
  t14: ch,glue = CopyToReg t0, Register:i64 $x10, t21
  t15: ch = RISCVISD::RET_FLAG t14, Register:i64 $x10, t14:1

After DAG combine, the anyext load becomes a zextload and the t21 'and' is removed:

SelectionDAG has 16 nodes:
  t0: ch = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t7: v2i8,ch = load<(load 2 from %ir.x)> t0, t2, undef:i64
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t8: v2i8,ch = load<(load 2 from %ir.y)> t0, t4, undef:i64
        t10: v2i1 = setcc t7, t8, seteq:ch
      t17: ch = store<(store 1 into %stack.0, align 2)> t0, t10, FrameIndex:i64<0>, undef:i64
    t23: i64,ch = load<(load 1 from %stack.0, align 2), zext from i2> t17, FrameIndex:i64<0>, undef:i64
  t14: ch,glue = CopyToReg t0, Register:i64 $x10, t23
  t15: ch = RISCVISD::RET_FLAG t14, Register:i64 $x10, t14:1
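
To make the combine concrete, here is a small scalar C++ model of why folding the 'and' into the load is valid as long as "zext from i2" really means the upper bits are zero. The function names and the `junk` parameter are illustrative assumptions, not LLVM APIs.

```cpp
#include <cstdint>

// Model of "anyext from i2": only the low 2 bits are meaningful. The model
// lets the caller supply arbitrary `junk` for every other bit.
inline uint64_t anyextLoadI2(uint8_t memByte, uint64_t junk) {
  return (junk & ~0x3ull) | (memByte & 0x3u);
}

// Before the combine: anyext load followed by `and ..., 3` (t22 and t21).
inline uint64_t beforeCombine(uint8_t memByte, uint64_t junk) {
  return anyextLoadI2(memByte, junk) & 0x3ull;
}

// After the combine: a single "zext from i2" load (t23), modelled here by
// its defined result, i.e. the low 2 bits zero-extended.
inline uint64_t afterCombine(uint8_t memByte) {
  return memByte & 0x3u;
}
```

In this model the two forms agree for every input, so the combine itself is sound. The question in this thread is what happens once the i2 load is later widened to a byte load.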

After op legalization, the zext from i2 load becomes a zext from i8 load, but no additional code is added to put zeros in the other 6 bits. They are just assumed to be zero.

SelectionDAG has 20 nodes:
  t0: ch = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t40: nxv8i8,ch = RISCVISD::VLE_VL<(load 2 from %ir.x)> t0, t2, Constant:i64<2>
              t4: i64,ch = CopyFromReg t0, Register:i64 %1
            t37: nxv8i8,ch = RISCVISD::VLE_VL<(load 2 from %ir.y)> t0, t4, Constant:i64<2>
            t29: nxv8i1 = RISCVISD::VMSET_VL Constant:i64<2>
          t30: nxv8i1 = RISCVISD::SETCC_VL t40, t37, seteq:ch, t29, Constant:i64<2>
        t36: ch = RISCVISD::VSE_VL<(store 1 into %stack.0, align 2)> t0, t30, FrameIndex:i64<0>, Constant:i64<2>
      t32: i64,ch = load<(load 1 from %stack.0, align 2), zext from i8> t36, FrameIndex:i64<0>, undef:i64
    t34: i64 = AssertZext t32, ValueType:ch:i2
  t14: ch,glue = CopyToReg t0, Register:i64 $x10, t34
  t15: ch = RISCVISD::RET_FLAG t14, Register:i64 $x10, t14:1

Legalization of an i2 truncstore does put zeros in those 6 bits, which is why the zextload has this expectation.
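
The mismatch can be sketched with a scalar C++ model. All names here are hypothetical, and `tailBits` stands in for whatever the vector mask store might leave in the bits beyond the two mask elements; whether those bits can actually be nonzero on a given implementation is exactly the open question above.

```cpp
#include <cstdint>

// Model of a legalized scalar i2 truncstore: the value is masked to 2 bits
// before the byte is written, so the upper 6 bits in memory are zero.
inline uint8_t scalarTruncStoreI2(uint64_t value) {
  return static_cast<uint8_t>(value & 0x3u);
}

// Model of storing a v2i1 mask as one byte. Only bits 0 and 1 come from the
// mask elements; `tailBits` supplies the remaining 6 bits (an assumption).
inline uint8_t vectorMaskStoreV2I1(bool elt0, bool elt1, uint8_t tailBits) {
  const uint8_t defined =
      static_cast<uint8_t>((elt0 ? 1u : 0u) | (elt1 ? 2u : 0u));
  return static_cast<uint8_t>((tailBits & 0xFCu) | defined);
}

// Model of the legalized load: "zext from i8" with AssertZext i2, i.e. a
// plain byte load with no masking.
inline uint64_t legalizedZextLoad(uint8_t memByte) {
  return memByte;
}
```

Pairing `legalizedZextLoad` with `scalarTruncStoreI2` always yields a value in the range 0 to 3. Pairing it with `vectorMaskStoreV2I1` only does so when `tailBits` contributes zeros, which is the case the FIXME is concerned with.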