This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Reduce LMUL for vector extracts
ClosedPublic

Authored by reames on Aug 21 2023, 2:52 PM.

Details

Summary

If we have a known (or bounded) index which definitely fits in a smaller LMUL register group size, we can reduce the LMUL of the slide and extract instructions. This loosens constraints on register allocation, and allows the hardware to do less work, at the potential cost of some additional VTYPE toggles. In practice, we appear (after prior patches) to do a decent job of eliminating the additional VTYPE toggles in most cases.

A couple of side notes:

  1. I stopped at m1 here. For machines with a DLEN < VLEN, we should probably be doing mf2, but we need to make that change a bit more globally as well.
  2. Arguably, we should be narrowing the LMUL of *most* operations which provably don't care about the upper portion of their inputs and outputs. We've got a few selected cases, but maybe it's time to build something more general? (Definitely future work!)
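
The index check described in the summary can be sketched roughly as follows. This is an illustration only, not the code from this patch: the helper name `smallestLMULForIndex` and its parameters are hypothetical, and it assumes integer LMULs (m1 through m8) with a known lower bound on VLEN. It stops at m1, matching side note 1.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): returns the smallest integer LMUL,
// strictly below curLMUL, whose register group is guaranteed to contain
// element maxIdx. Returns std::nullopt if no smaller LMUL is provably enough.
//   maxIdx      - known or upper-bounded element index being extracted
//   minVLenBits - guaranteed minimum VLEN in bits (assumed to be known)
//   eltBits     - element width (SEW) in bits
//   curLMUL     - current integer LMUL of the source vector (1, 2, 4 or 8)
std::optional<unsigned> smallestLMULForIndex(uint64_t maxIdx,
                                             unsigned minVLenBits,
                                             unsigned eltBits,
                                             unsigned curLMUL) {
  if (eltBits == 0 || minVLenBits < eltBits)
    return std::nullopt;

  // Elements guaranteed to fit in a single vector register (LMUL = 1).
  const uint64_t eltsPerReg = minVLenBits / eltBits;

  // Try m1, m2, m4 in turn; stop before reaching the current LMUL.
  for (unsigned lmul = 1; lmul < curLMUL; lmul *= 2) {
    if (maxIdx < eltsPerReg * lmul)
      return lmul;
  }
  return std::nullopt;
}
```

For instance, under these assumptions, with a minimum VLEN of 128 and 32-bit elements, an extract of element 3 from an m8 vector only needs the first m1 register, whereas element 16 lies beyond the m4 group and permits no reduction.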

Event Timeline

reames created this revision. Aug 21 2023, 2:52 PM
reames requested review of this revision. Aug 21 2023, 2:52 PM
Herald added a project: Restricted Project. Aug 21 2023, 2:52 PM
This revision is now accepted and ready to land. Aug 21 2023, 11:51 PM
luke accepted this revision. Aug 22 2023, 3:21 AM

LGTM

This revision was landed with ongoing or failed builds. Aug 22 2023, 7:36 AM
This revision was automatically updated to reflect the committed changes.