This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases.
ClosedPublic

Authored by craig.topper on Jul 22 2021, 2:51 PM.

Details

Summary

For unit-stride and strided loads/stores, we set the SEW operand of
the pseudo instruction equal to the EEW in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.

These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL ratio of the previous
vtype matches the EEW/EMUL ratio we need for the instruction.
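
The ratio condition above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `VType`, `sameSEWLMULRatio`, and `canSkipVSETVLI` are hypothetical names, not the types or functions used in RISCVInsertVSETVLI.cpp, and LMUL is modelled as a fraction so that mf2/mf4/mf8 can be represented without floating point. Since EMUL = EEW / (SEW/LMUL), a previous vtype with the same SEW/LMUL ratio as the wanted (EEW, LMUL) pair yields EMUL equal to the wanted LMUL.

```cpp
#include <cassert>

// Hypothetical model of a vtype setting, for illustration only.
struct VType {
  unsigned SEW;     // selected element width in bits (8, 16, 32, 64)
  unsigned LMulNum; // LMUL numerator   (e.g. 2 for m2, 1 for mf2)
  unsigned LMulDen; // LMUL denominator (e.g. 1 for m2, 2 for mf2)
};

// True when SEW/LMUL is identical for both settings.
// SEW_a / (N_a/D_a) == SEW_b / (N_b/D_b)
//   <=>  SEW_a * D_a * N_b == SEW_b * D_b * N_a  (cross-multiplied)
bool sameSEWLMULRatio(const VType &A, const VType &B) {
  assert(A.LMulNum && A.LMulDen && B.LMulNum && B.LMulDen);
  return A.SEW * A.LMulDen * B.LMulNum == B.SEW * B.LMulDen * A.LMulNum;
}

// A unit-stride or strided load/store with element width EEW whose pseudo
// asks for LMUL = WantNum/WantDen (with SEW == EEW) can reuse Prev when the
// ratios agree. The AVL must match as well; that check is not modelled here.
bool canSkipVSETVLI(const VType &Prev, unsigned EEW, unsigned WantNum,
                    unsigned WantDen) {
  VType Needed{EEW, WantNum, WantDen};
  return sameSEWLMULRatio(Prev, Needed);
}
```

For instance, with a previous vtype of e32, mf2, a vle64.v that wants LMUL=1 passes the check because 32/(1/2) = 64/1, while a previous vtype of e32, m1 does not.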

Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the load/store. We need to know in the first phase whether
the vsetvli will be inserted so that we can propagate information to
the successors correctly in the second phase. This means we can't
depend on predecessors.


Event Timeline

craig.topper created this revision. Jul 22 2021, 2:51 PM
craig.topper requested review of this revision. Jul 22 2021, 2:51 PM
Herald added a project: Restricted Project. Jul 22 2021, 2:51 PM
Herald added a subscriber: MaskRay.

I'm curious: why can't we apply a similar approach to loads as well? Don't they compute the EEW and EMUL in a similar way?

Also I think this could be applied to non-unit stride. I understand indexed memory accesses are the odd ones here.

vsetivli zero, 2, e32, mf2, ta, mu
vle32.v v25, (a0)
vfwcvt.rtz.xu.f.v v26, v25
vsetvli zero, zero, e64, m1, ta, mu
# the previous vsetvli can be removed because
# a vle64 under SEW=32 and LMUL=1/2
# will be executed as
# EEW=64
# EMUL=(EEW/SEW) * LMUL=(64/32)*(1/2)=1
vle64.v v26, (a1)

-Add loads
-Add strided loads/stores
-Add missing check for AVL being the same.

craig.topper retitled this revision from [RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit strided stores in some cases. to [RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. Jul 28 2021, 9:54 AM
craig.topper edited the summary of this revision.

I'm curious: why can't we apply a similar approach to loads as well? Don't they compute the EEW and EMUL in a similar way?

I initially didn't do it because I figured in most cases you'd still end up with a vsetvli after the load anyway. But there do seem to be some improvements in the test changes so I've added it.

Also I think this could be applied to non-unit stride. I understand indexed memory accesses are the odd ones here.

You're right. I've added those now. I think we could also do segment load/stores, but I think the switch statement would become even more ridiculous and we should move to TSFlags. So I'd like to look at that as a follow up.

vsetivli zero, 2, e32, mf2, ta, mu
vle32.v v25, (a0)
vfwcvt.rtz.xu.f.v v26, v25
vsetvli zero, zero, e64, m1, ta, mu
# the previous vsetvli can be removed because
# a vle64 under SEW=32 and LMUL=1/2
# will be executed as
# EEW=64
# EMUL=(EEW/SEW) * LMUL=(64/32)*(1/2)=1
vle64.v v26, (a1)
rogfer01 accepted this revision. Aug 11 2021, 11:15 PM

Sorry for the delay. LGTM.

You're right. I've added those now. I think we could also do segment load/stores, but I think the switch statement would become even more ridiculous and we should move to TSFlags. So I'd like to look at that as a follow up.

Sure, thanks!

This revision is now accepted and ready to land. Aug 11 2021, 11:15 PM
rogfer01 added inline comments. Aug 11 2021, 11:18 PM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
752

I think this comment block may need updating: store → load/store.

908

Ditto.