Try to combine N short vector cast ops into 1 wide vector cast op:
concat (cast X), (cast Y)... -> cast (concat X, Y...)
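This is a toy model of why the fold preserves values. An elementwise cast commutes with concatenation, so N narrow casts and one wide cast produce identical lanes. It is a sketch, not DAGCombiner code: std::vector stands in for vector SDValues, and the cast is assumed to be a truncating float-to-int32 conversion (fptosi-like). The names castVec, narrowForm and wideForm are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical elementwise cast: truncating float -> int32 (fptosi-like).
// Assumes every input lane is in int32 range (out-of-range is UB in C++).
std::vector<int32_t> castVec(const std::vector<float>& v) {
  std::vector<int32_t> out;
  out.reserve(v.size());
  for (float f : v) out.push_back(static_cast<int32_t>(f));
  return out;
}

// Concatenate a list of vectors end to end.
template <typename T>
std::vector<T> concat(const std::vector<std::vector<T>>& parts) {
  std::vector<T> out;
  for (const auto& p : parts) out.insert(out.end(), p.begin(), p.end());
  return out;
}

// Before the fold: concat (cast X), (cast Y)...  -> N narrow casts.
std::vector<int32_t> narrowForm(const std::vector<std::vector<float>>& srcs) {
  std::vector<std::vector<int32_t>> casted;
  for (const auto& s : srcs) casted.push_back(castVec(s));
  return concat(casted);
}

// After the fold: cast (concat X, Y...)  -> 1 wide cast.
std::vector<int32_t> wideForm(const std::vector<std::vector<float>>& srcs) {
  // Like CONCAT_VECTORS, all operands are expected to share one type/width.
  for (const auto& s : srcs) assert(s.size() == srcs.front().size());
  return castVec(concat(srcs));
}
```

The lane values never differ between the two forms. Whether the wide form is cheaper, or can be lowered at all, is a separate per-target legality question.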
This is part of solving PR45794:
https://bugs.llvm.org/show_bug.cgi?id=45794
As noted in the code comment, this is uglier than I was hoping because the opcode determines whether we pass the source or the destination type to isOperationLegalOrCustom(). Also, IIUC, that query gives us no way to validate the other (dest or src) type. Without an extra legality check on that other type, the ARM regression test test/CodeGen/ARM/isel-v8i32-crash.ll crashes while trying to lower an unsupported v8f32 to v8i16.
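The legality asymmetry can be modelled with a toy target. This is a hedged sketch, not LLVM's TargetLowering. CastOp, ToyTarget and canFormWideCast are hypothetical, and the legality table is invented. The model assumes that int-to-fp casts are keyed on the source type and fp-to-int casts on the destination type. Only the keyed type reaches the opcode query, so the other type needs its own plain type-legality check.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins for the ISD cast opcodes (not LLVM's real enums).
enum class CastOp { SIntToFP, UIntToFP, FPToSInt, FPToUInt };

// Toy target description; any table contents are invented for illustration.
struct ToyTarget {
  std::set<std::pair<CastOp, std::string>> legalOps;  // (opcode, keyed type)
  std::set<std::string> legalTypes;

  bool isOperationLegalOrCustom(CastOp op, const std::string& vt) const {
    return legalOps.count({op, vt}) != 0;
  }
  bool isTypeLegal(const std::string& vt) const {
    return legalTypes.count(vt) != 0;
  }
};

// Assumption: the opcode picks which type the operation query sees; the
// other type is only covered by the extra isTypeLegal() check.
bool canFormWideCast(const ToyTarget& t, CastOp op,
                     const std::string& srcVT, const std::string& dstVT) {
  switch (op) {
  case CastOp::SIntToFP:
  case CastOp::UIntToFP:
    // Keyed on the source type; validate the destination separately.
    return t.isOperationLegalOrCustom(op, srcVT) && t.isTypeLegal(dstVT);
  case CastOp::FPToSInt:
  case CastOp::FPToUInt:
    // Keyed on the destination type; validate the source separately.
    return t.isOperationLegalOrCustom(op, dstVT) && t.isTypeLegal(srcVT);
  }
  return false;
}
```

Consider a toy target where FPToSInt is marked legal for v8i16 but v8f32 is not a legal type. There, canFormWideCast(t, CastOp::FPToSInt, "v8f32", "v8i16") returns false, even though the opcode query on v8i16 alone would have allowed the fold.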
Do we do the best thing if the shl is used by another operation that needs to be split? Do we keep the vcvttps2dq split?