We had special-case handling here, but it used a scalar any_extend for the
promotion and then bitcast to the final type. That doesn't split the input data
into multiple promoted elements like we need.
This patch falls back to doing the conversion through memory.
Fixes PR41594, which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.
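To make the failure mode concrete, here is a rough Python model, not taken from the patch. The types are assumptions chosen only for illustration: an i8 input bitcast to <2 x i4>, with the result promoted to <2 x i8>, little-endian lane order, and the undefined upper bits of any_extend modeled as zero. The function names are hypothetical.

```python
def any_extend_then_bitcast(x):
    """Model of the old path: any_extend i8 -> i16, then bitcast to <2 x i8>."""
    wide = x & 0xFF  # upper 8 bits are undefined in reality; modeled as 0 here
    return [wide & 0xFF, (wide >> 8) & 0xFF]


def split_into_promoted_lanes(x):
    """What is actually needed: each i4 element in its own promoted i8 lane."""
    return [x & 0xF, (x >> 4) & 0xF]


def low_nibbles(lanes):
    """Only the low 4 bits of each promoted lane are meaningful as an i4."""
    return [lane & 0xF for lane in lanes]


if __name__ == "__main__":
    x = 0xA5
    print(low_nibbles(any_extend_then_bitcast(x)))
    print(low_nibbles(split_into_promoted_lanes(x)))
```

In this model the first lane agrees, but the second lane of the old path never sees the high nibble of the input, which is the sense in which the data is not split across the promoted elements.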
I wonder if there's anything else which would be reasonable to do on a target-independent basis...
I guess on x86, for a bitcast like bitcast <64 x i1> %1 to <2 x i32>, the optimal lowering actually involves splitting the operation: you then have two bitcasts of the form bitcast <32 x i1> %1 to i32, each of which has an existing efficient custom lowering. Then you use a BUILD_VECTOR to turn the two results into a <2 x i32>. But I'm not sure that generalizes in a useful way.
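As a sanity check on the splitting idea, here is a small Python sketch of the data movement only, not of any SelectionDAG code. It assumes little-endian layout, where element 0 of the i1 vector is the least significant bit; the helper names are made up for illustration.

```python
def pack_bits(bits):
    """Model of bitcast <N x i1> to iN: element i becomes bit i."""
    value = 0
    for i, b in enumerate(bits):
        if b:
            value |= 1 << i
    return value


def bitcast_v64i1_to_v2i32_split(bits):
    """Split into two <32 x i1> halves, bitcast each to i32, then build the vector."""
    assert len(bits) == 64
    lo = pack_bits(bits[:32])   # first bitcast <32 x i1> to i32
    hi = pack_bits(bits[32:])   # second bitcast <32 x i1> to i32
    return [lo, hi]             # stands in for BUILD_VECTOR


def bitcast_v64i1_to_v2i32_reference(bits):
    """Reference: treat all 64 bits as one integer and read it as two i32 lanes."""
    assert len(bits) == 64
    whole = pack_bits(bits)
    return [whole & 0xFFFFFFFF, (whole >> 32) & 0xFFFFFFFF]


if __name__ == "__main__":
    bits = [i % 3 == 0 for i in range(64)]
    print(bitcast_v64i1_to_v2i32_split(bits) == bitcast_v64i1_to_v2i32_reference(bits))
```

The two functions agree on every input under that layout assumption, which is why the split is a valid lowering; whether it is profitable elsewhere depends on whether the target has a cheap <32 x i1> to i32 bitcast in the first place.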