This tweaks the TTI hook isLegalMaskedLoad, which has two use cases: it is
queried for vector types, which is what the current implementation supports,
but the vectorizer also queries it with scalar types. The latter wasn't taken
into account, so scalar types get rejected as legal masked loads because their
size is not 128 bits.
On MVE, with most instructions being VPT block compatible, you could argue that
everything can be legally masked (modulo the exceptions), but this new
implementation rejects double-word (64-bit) ints and floats, which seems
reasonable for now. This seems to improve codegen for some existing cases, and
I will follow this up with loop vectorization tests that also query this hook.
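
To make the before/after behaviour concrete, here is a minimal standalone sketch of the legality decision as described above. It is not the actual ARM TTI code: TypeDesc, isLegalMaskedLoadOld and isLegalMaskedLoadNew are hypothetical names, a scalar is modelled as a one-element type, and alignment and subtarget checks are left out. It also assumes that "double-word" means 64-bit elements and that the new check looks only at element width, which the description does not spell out.

```cpp
#include <cassert>

// Hypothetical stand-in for an LLVM type: element width plus lane count.
// A scalar type is modelled as a single lane. (Assumption, not LLVM API.)
struct TypeDesc {
  unsigned elementBits; // width of one element: 8, 16, 32, 64, ...
  unsigned numElements; // 1 for scalar types
};

// Old behaviour as described: only types whose total size is 128 bits are
// accepted, so every scalar query from the vectorizer is rejected.
bool isLegalMaskedLoadOld(TypeDesc ty) {
  assert(ty.numElements >= 1 && "a type has at least one element");
  return ty.elementBits * ty.numElements == 128;
}

// New behaviour as sketched here: decide on the element width, so scalar and
// vector queries agree, and reject 64-bit (double-word) elements.
bool isLegalMaskedLoadNew(TypeDesc ty) {
  assert(ty.numElements >= 1 && "a type has at least one element");
  return ty.elementBits == 8 || ty.elementBits == 16 || ty.elementBits == 32;
}
```

With this model, a scalar i32 query (TypeDesc{32, 1}) is rejected by the old check and accepted by the new one, while a scalar i64 or f64 query (TypeDesc{64, 1}) is rejected by both.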
If I'm reading this correctly, this seems to be doing a 4xi32 cmp, followed by a 16xi8 load using that predicate, and then sign extending the wrong registers in place. It should be sign extending the first 4 values, not the values at indices 0, 4, 8 and 12.
I'm surprised it's doing this exactly though. You may be finding some other bug here.
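
To illustrate the lane mix-up suspected above, here is a small sketch that contrasts the two selections on a 16-byte value. The helper names are hypothetical, and it assumes a little-endian lane layout in which an in-place sign extension of each 32-bit lane reads the byte at index 4*i; it models only the data movement, not the generated MVE code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::int8_t, 16>;
using Lanes4 = std::array<std::int32_t, 4>;

// Expected result: sign extend the first four i8 values (indices 0..3).
Lanes4 extendFirstFour(const Bytes16 &v) {
  Lanes4 out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = static_cast<std::int32_t>(v[i]);
  return out;
}

// Suspected result: sign extend in place, which picks the bottom byte of
// each 32-bit lane, i.e. indices 0, 4, 8 and 12 (little-endian assumption).
Lanes4 extendEveryFourth(const Bytes16 &v) {
  Lanes4 out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = static_cast<std::int32_t>(v[4 * i]);
  return out;
}
```

The two functions agree only on lane 0, so any input whose bytes at indices 1 to 3 differ from those at indices 4, 8 and 12 would expose the difference.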