This adds sign/zero-extending scalar loads/stores to the MVE instructions added in D77813, allowing us to pick up more post-inc situations. These are simple compared to LDR/STR (which may be better turned into an LDRD/LDM), but still require some additions over the MVE instructions. Because there are i12 and i8 variants of the offset loads/stores dealing with different signs, we may need to convert an i12 address to an i8 negative instruction.
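To illustrate the i12-to-i8 conversion, here is a minimal standalone sketch, not the code from the patch. It assumes the usual Thumb2 ranges (0..4095 for the positive imm12 offset form, -255..-1 for the negative imm8 offset form, and ±255 for a post-inc amount). The names `OffsetForm`, `selectOffsetForm` and `rebaseOffset` are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical names for illustration only; this is not LLVM's API.
enum class OffsetForm { PosImm12, NegImm8 };

// Pick the plain-offset encoding that can hold Offset, if any.
// Assumed ranges: imm12 form covers 0..4095, imm8 form covers -255..-1.
std::optional<OffsetForm> selectOffsetForm(int32_t Offset) {
  if (Offset >= 0 && Offset <= 4095)
    return OffsetForm::PosImm12;
  if (Offset < 0 && Offset >= -255)
    return OffsetForm::NegImm8;
  return std::nullopt; // Neither form can encode this offset.
}

// Once an earlier access has been turned into a post-inc by Inc, a later
// access that used [Base, #Offset] sees Base + Inc, so its offset becomes
// Offset - Inc. That can flip the sign and force a different encoding.
std::optional<OffsetForm> rebaseOffset(int32_t Offset, int32_t Inc,
                                       int32_t &NewOffset) {
  // Assumed post-inc immediate range of +/-255.
  assert(Inc >= -255 && Inc <= 255 && "post-inc amount out of range");
  NewOffset = Offset - Inc;
  return selectOffsetForm(NewOffset);
}
```

For example, an access at offset 4 that follows a post-inc of 8 ends up at offset -4, which no longer fits the i12 form and has to use the i8 negative form. If neither form fits, the sketch returns `std::nullopt`, which a caller would treat as "do not apply the transform here".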
LGTM. Have you tested whether this approach is faster than doing the pre-indexing LSR method? I certainly like the idea of this transform instead of being beholden to the filtering gods.
I think we are still at the mercy of LSR's cost modeling, I'm afraid. This can just do slightly better at fixing up the results afterwards, so it can slightly improve things in case ISel comes up with something suboptimal.