This adds support for generating VHADD (vector halving add) and VHSUB (vector halving subtract) instructions from a VSHR (vector shift right) applied to the result of a VADD/VSUB (vector add/subtract).
This pattern arises when auto-vectorising a loop such as C[i] = (A[i] + B[i])/2.
Both instructions support signed and unsigned integers.
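For illustration, a minimal sketch of the kind of scalar loop described above. The function name and the choice of unsigned 8-bit elements are assumptions made for this example, not taken from the patch or its tests. Whether a given compiler actually selects a halving add here depends on the target, the element type, and the optimisation flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: element-wise average of two unsigned byte arrays.
 * a[i] and b[i] are promoted to int before the addition, so the sum
 * cannot wrap; for non-negative values, dividing by 2 is the same as a
 * right shift by one, which is the add + shift shape described above. */
void halving_add_u8(uint8_t *c, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = (uint8_t)((a[i] + b[i]) / 2);
}
```

Calling this with a = {10, 7} and b = {20, 8} stores {15, 7} in c, since the result is truncated rather than rounded.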
I don't think you will need the casts if you just use the instruction names directly.