This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use vslide1up for inserting bottom element into splat vector
AbandonedPublic

Authored by reames on Oct 12 2022, 10:24 AM.

Details

Summary

This patch adds a DAG combine to replace a vmv.s.x into a splat vector with a vslide1up instead. This relies on the fact that we can shift a splat without changing any of the active lanes, and vslide1up has separate source and destination vector registers. This allows vslide1up to be tail agnostic whereas vmv.s.x has to be tail undisturbed. This in turn avoids the need for a vsetvli toggle.

One downside to this conversation is that vslide1up has a restriction that the source and destination vector registers can't overlap. This increases register pressure locally, and particularly at very high LMUL, could force an additional spill for a value live over the vslide1up. I think this is net worthwhile, but I'm curious what others think.

There are several TODOs noted in the patch. I plan on implementing the vmv.s.f and narrower element types in a follow up patch. I don't plan to bother with the wider VL one.

Diff Detail

Event Timeline

reames created this revision.Oct 12 2022, 10:24 AM
reames requested review of this revision.Oct 12 2022, 10:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2022, 10:24 AM
reames added inline comments.Oct 12 2022, 2:14 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9735

Noticed when glancing through other code that I hadn't handled the vmv.v.i case here. Consider that added to the todo list above.

The slide1up may be more expensive than vmv.s.x as LMUL increases. The upper elements will get shifted even though they are all the same. Despite vmv.s.x having an LMUL typed result in SelectionDAG and MachineIR it only reads and writes one LMUL==1 vector register.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9735

There is no VMV_V_I_VL. So doesn't this already handle vmv.v.i?

9736

I think the VT match is guaranteed by the rules we have defined for RISCVISD::VSLIDE1UP_VL.

def SDTRVVSlide1 : SDTypeProfile<1, 5, [                                         
  SDTCisVec<0>, SDTCisSameAs<1, 0>, SDTCisSameAs<2, 0>, SDTCisInt<0>,            
  SDTCisVT<3, XLenVT>, SDTCVecEltisVT<4, i1>, SDTCisSameNumEltsAs<0, 4>,         
  SDTCisVT<5, XLenVT>                                                            
]>;

The slide1up may be more expensive than vmv.s.x as LMUL increases. The upper elements will get shifted even though they are all the same. Despite vmv.s.x having an LMUL typed result in SelectionDAG and MachineIR it only reads and writes one LMUL==1 vector register.

Had not considered this point. Would it be reasonable to restrict this to LMUL1 types?

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9735

Well, empirically no. I see examples of vmv.v.i patterns which would match this conceptually, but aren't being caught by this one. Haven't looked into why yet.

The slide1up may be more expensive than vmv.s.x as LMUL increases. The upper elements will get shifted even though they are all the same. Despite vmv.s.x having an LMUL typed result in SelectionDAG and MachineIR it only reads and writes one LMUL==1 vector register.

Had not considered this point. Would it be reasonable to restrict this to LMUL1 types?

I think that's going to depend on the microarchitecture. If the ALU width is less than VLEN bits, a vslide1up could still require more ALU cycles than vmv.s.x.

The cases that replace a vmv and a vmv.s.x are interesting. On an architecture without renaming or move elimination, the vmv might be considered an ALU op.

reames abandoned this revision.Oct 24 2022, 9:51 AM

Going to set this aside for now. I'd been thinking of this as a broadly applicable canonicalization to address usage of TU, but as Craig has pointed out, this is likely to be much narrower in scope due to micro-architectural differences, and the lmul=1 restriction. Given that, likely impact isn't enough to justify pushing this right now, will return to it once larger items are addressed.