Under the softfp calling convention, we are often left with VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value of the function. This can be simplified to a,b or c,d directly, depending on the index of the extract.
Big endian is a little different because the bitcast switches the lanes around, meaning we end up with b,a or d,c.
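The lane selection described above can be sketched as a small standalone model. This is not the actual combine code: `foldVMOVRRD`, its parameters, and the use of plain `int32_t` values in place of DAG operands are all hypothetical, chosen only to illustrate which pair of build_vector operands is picked for each extract index and endianness.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical model, not LLVM code. `ops` stands in for the four i32
// operands of build_vector(a, b, c, d); `extractIdx` is the index (0 or 1)
// of the 64-bit element taken by the extract.
// Returns the two 32-bit values that the VMOVRRD would yield.
std::pair<int32_t, int32_t> foldVMOVRRD(const std::array<int32_t, 4> &ops,
                                        unsigned extractIdx, bool bigEndian) {
  // Extract index 0 covers operands 0 and 1; index 1 covers operands 2 and 3.
  unsigned base = extractIdx * 2;
  if (bigEndian) {
    // The bitcast switches the lanes around, so the pair comes out swapped.
    return {ops[base + 1], ops[base]};
  }
  return {ops[base], ops[base + 1]};
}
```

With operands {a, b, c, d}, the little-endian case yields a,b for index 0 and c,d for index 1, while the big-endian case yields b,a and d,c respectively.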
Do we need a big-endian (BE) run here too?