We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases. This patch uses it for ext-loads of vXi1 types as well: the load is changed into an iX integer load, which is then bitcast and extended so that combineToExtendBoolVectorInReg can handle it.
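For reference, here is a minimal scalar sketch of what the rewritten pattern computes. It is not the DAG combine itself. The function name `sextLoadV8i1` is hypothetical, and the choice of a v8i1 source sign-extended to v8i16 is an assumption made only for illustration; it also assumes the little-endian lane order used on x86, where lane I of the bitcast vector is bit I of the integer.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of: sext(<8 x i1> load) -> <8 x i16>,
// expressed as an i8 integer load + bitcast to v8i1 + sign extension.
std::array<int16_t, 8> sextLoadV8i1(const void *Ptr) {
  // Step 1: load the mask as a plain i8 integer instead of a v8i1 vector.
  uint8_t Mask;
  std::memcpy(&Mask, Ptr, sizeof(Mask));

  std::array<int16_t, 8> Result{};
  for (unsigned I = 0; I != 8; ++I) {
    // Step 2: bitcast i8 -> v8i1, so lane I is bit I (little-endian).
    bool Bit = (Mask >> I) & 1;
    // Step 3: sign-extend i1 -> i16, giving all-ones or all-zeros per lane.
    Result[I] = Bit ? int16_t(-1) : int16_t(0);
  }
  return Result;
}
```

A zero-extending variant would produce 1 instead of -1 in each set lane.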
What's curious is how often we use MOVD (VMOVDI2PDIrm) scalar_to_vector loads directly for smaller (<i32) memory sources. I don't think this is something I've introduced, but it looks potentially concerning. @craig.topper any ideas?