In the test based on PR46586:
https://bugs.llvm.org/show_bug.cgi?id=46586
...we are inserting 16-bits into the high element of the vector, shuffling it to element 0, and extracting 32-bits. But xmm1 was never initialized, so the top 16-bits of the extract are undef without this patch.
(It seems like we could do better than this by recognizing that we only demand a subsection of the build vector, but I want to make sure we fix the miscompile 1st.)
This path is only used for pre-SSE4.1, and simpler patterns get squashed somewhere along the way, so the test still includes a 'urem' as it did in the original test from the bug report.