This patch legalizes the `v256i1` and `v512i1` types that will be used for MMA.
It implements loads and stores of these types.
`v256i1` is a pair of VSX registers, so for this type, we load/store the two underlying registers.
`v512i1` is used for MMA accumulators. In addition to loading and storing the 4 associated VSX registers, we therefore generate instructions to prime the accumulator (copy the VSX registers to the accumulator) after loading and to unprime it (copy the accumulator back to the VSX registers) before storing.
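The ordering of these steps can be illustrated with a small toy model. This is only a sketch and not code from the patch: `VsxReg`, `Accumulator`, `prime`, `unprime`, `loadV512` and `storeV512` are hypothetical names, and the in-memory order of the four registers is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model only: these types are hypothetical and are not LLVM classes.
using VsxReg = std::array<std::uint8_t, 16>; // one 128-bit VSX register

struct Accumulator {
  std::array<VsxReg, 4> vsx{}; // the 4 associated VSX registers
  std::array<VsxReg, 4> acc{}; // the accumulator contents
  bool primed = false;
};

// Prime: copy the VSX registers to the accumulator.
void prime(Accumulator &a) {
  a.acc = a.vsx;
  a.primed = true;
}

// Unprime: copy the accumulator back to the VSX registers.
void unprime(Accumulator &a) {
  assert(a.primed && "only a primed accumulator can be unprimed");
  a.vsx = a.acc;
  a.primed = false;
}

// Model of a v512i1 load: load the 4 VSX registers, then prime.
// Assumption: register i occupies bytes [16*i, 16*i + 16) of the buffer.
void loadV512(Accumulator &a, const std::uint8_t *mem) {
  for (std::size_t i = 0; i < 4; ++i)
    std::memcpy(a.vsx[i].data(), mem + 16 * i, 16);
  prime(a);
}

// Model of a v512i1 store: unprime, then store the 4 VSX registers.
void storeV512(Accumulator &a, std::uint8_t *mem) {
  unprime(a);
  for (std::size_t i = 0; i < 4; ++i)
    std::memcpy(mem + 16 * i, a.vsx[i].data(), 16);
}
```

In this model, a `loadV512` followed by a `storeV512` round-trips all 64 bytes and leaves the accumulator unprimed, so it would have to be primed again before further accumulator use.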
We also add the UACC register class, which represents accumulators in their unprimed form. This class is needed to distinguish primed from unprimed accumulators, which avoids invalid copies of the VSX registers associated with primed accumulators.
Depends on D84847