This is a follow up to patch discussion on D139656. As noted there, M2/M4/M8 versions of these instructions don't actually exist, and using them results in overly constrained register allocation.
In that review, we'd talked about moving towards a variant of the instructions which ignored LMUL. I decided to see what happened if we just stopped generating the high LMUL variants, and the results are surprisingly neutral. I only see one minor thing which looks like a real regression among all the churn. I think this is worth doing now to loosen register allocation constraints, and avoid digging our hole around these instructions deeper while thinking about the right model change.
Hoist XLenVT above the if to share?