Fixes PR23464: one way to use the broadcast intrinsics is something like:
_mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));
We don't currently fold the load into the broadcast for this pattern, but we can look through one bitcast to find the broadcast scalar.
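For reviewers who want the semantics spelled out, here is a minimal scalar sketch of what the intrinsic sequence above computes. It is a model only, not the lowering code and not the intrinsics themselves. The names broadcastw_model, broadcastw_folded_model and LANES are hypothetical, and the second function assumes that the folded form reads only the 16-bit scalar from memory, which matches the wide form on a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 16 }; /* a 256-bit vector holds sixteen 16-bit lanes */

/* Hypothetical scalar model of
 *   _mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));
 * src must point to at least 4 readable bytes. */
static void broadcastw_model(const void *src, uint16_t out[LANES])
{
    uint32_t wide;
    uint16_t scalar;

    assert(src != NULL && out != NULL);

    /* The 32-bit load, *(int*)src, written with memcpy so the model
     * has no alignment or aliasing problems. */
    memcpy(&wide, src, sizeof wide);

    /* The word broadcast only uses the low 16 bits of the vector,
     * so the upper half of the loaded value is discarded. */
    scalar = (uint16_t)wide;

    for (int i = 0; i < LANES; ++i)
        out[i] = scalar;
}

/* Assumed shape of the folded form: load just the 16-bit scalar and
 * splat it. src must point to at least 2 readable bytes. */
static void broadcastw_folded_model(const void *src, uint16_t out[LANES])
{
    uint16_t scalar;

    assert(src != NULL && out != NULL);

    memcpy(&scalar, src, sizeof scalar);

    for (int i = 0; i < LANES; ++i)
        out[i] = scalar;
}
```

In this model the bitcast is the step that reinterprets the loaded 32-bit value as 16-bit lanes; looking through it is what exposes the scalar that actually gets broadcast.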
Note that I added a separate file specifically for these tests. I could instead add them to the various vector-shuffle-* files; I have no strong preference either way.