SPE passes doubles the same way as soft-float does: in register pairs, as i32
types. This is all handled by the target-independent layer. However, this is
not optimal when splitting or re-forming the doubles, since in either
direction the value takes a round trip through the stack (a store followed by
loads, or stores followed by a load).
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
  evstdd 5, X(1)
  lwz 3, X(1)
  lwz 4, X+4(1)

Likewise, to form a double into r5 from args in r3 and r4:
  stw 3, X(1)
  stw 4, X+4(1)
  evldd 5, X(1)
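
As a rough illustration (not code from the patch), the following C model
mirrors what those two sequences do: the double takes a round trip through a
stack slot so it can be reinterpreted as two 32-bit words. The hi/lo word
order shown assumes big endian, as on the SPE-capable e500 cores.

  #include <stdint.h>
  #include <string.h>

  struct split { uint32_t hi, lo; };

  /* Caller side: spill the double, then reload it as two words. */
  static struct split split_double(double d) {
      unsigned char slot[8];        /* stands in for the stack slot X(1) */
      struct split s;
      memcpy(slot, &d, 8);          /* evstdd 5, X(1) */
      memcpy(&s.hi, slot, 4);       /* lwz 3, X(1)    */
      memcpy(&s.lo, slot + 4, 4);   /* lwz 4, X+4(1)  */
      return s;
  }

  /* Callee side: store the two words, then reload them as a double. */
  static double build_double(uint32_t hi, uint32_t lo) {
      unsigned char slot[8];
      double d;
      memcpy(slot, &hi, 4);         /* stw 3, X(1)    */
      memcpy(slot + 4, &lo, 4);     /* stw 4, X+4(1)  */
      memcpy(&d, slot, 8);          /* evldd 5, X(1)  */
      return d;
  }
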
This change optimizes these sequences to use SPE instructions directly. Now,
to pass a double to a function:
  mr 4, 5
  evmergehi 3, 5, 5

And to form a double into r5 from args in r3 and r4:
  evmergelo 5, 3, 4
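
Again purely as an illustration (assuming the same big-endian word order),
the optimized sequences amount to register shuffling with no memory traffic;
in C terms, the split is a shift plus a truncation, and the merge packs the
word from r3 as the high half and the word from r4 as the low half:

  #include <stdint.h>
  #include <string.h>

  /* Split: high word to r3 (evmergehi 3, 5, 5), low word to r4 (mr 4, 5). */
  static void split_double_regs(double d, uint32_t *hi, uint32_t *lo) {
      uint64_t bits;
      memcpy(&bits, &d, 8);
      *hi = (uint32_t)(bits >> 32);
      *lo = (uint32_t)bits;
  }

  /* Merge: evmergelo 5, 3, 4 packs r3 as the high word, r4 as the low word. */
  static double build_double_regs(uint32_t hi, uint32_t lo) {
      uint64_t bits = ((uint64_t)hi << 32) | lo;
      double d;
      memcpy(&d, &bits, 8);
      return d;
  }
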
This is comparable to the way that gcc generates the double splits.
This also fixes the expansion of builtins to libcalls, where the LowerCallTo()
code path was generating intermediate illegal-type nodes.
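
A hypothetical example (not taken from the patch) of the kind of code that
exercises that path: __builtin_powi becomes the llvm.powi intrinsic, which
has no instruction and is expanded by the backend into a call to the
__powidf2 libcall, so the double argument and result have to be split into
and re-formed from a GPR pair around the generated call.

  /* Hypothetical test case: the powi intrinsic is expanded to the __powidf2
     libcall, with the double passed and returned in a GPR pair as above. */
  double wrap_powi(double x, int n) {
      return __builtin_powi(x, n);
  }
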
Why not just have EXTRACT_SPE_OP and have it take a constant operand that determines Hi/Lo? Also, for both build and extract, it would be good to add a comment noting that these correspond almost exactly to BUILD_PAIR and EXTRACT_VECTOR_ELT nodes, except that the input types are floating point, since i64 isn't a legal type for the target.