This is consistent with the RVV intrinsic patterns. This has been
shown to prevent some "ran out of registers" errors in our internal
testing.
Unfortunately, there are some regressions on LMUL=8 tests in here.
I think the lack of registers with LMUL=8 just makes it very hard
to schedule correctly.