[resending as new patch because I forgot to add llvm-commits initially]
As shown in InstCombine: sext ( zext ( x ) ) -> zext ( x ).
The code in X86ISelLowering that this patch proposes to remove mistakenly implemented the transform as:
sext ( zext ( x ) ) -> sext ( x )
I.e., what should be a zext output was turned into a sext.
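To make the difference concrete, here is a small bit-level sketch of the two folds. It is not LLVM code: the helper names zext and sext and the chosen widths (i8 -> i16 -> i32) are assumptions for illustration only. It models the extensions on plain integers and shows that the two results disagree whenever the sign bit of x is set.

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend: keep the low from_bits bits; all higher bits are 0."""
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits bits to to_bits (result as a bit pattern)."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    signed = (value ^ sign_bit) - sign_bit   # reinterpret as a signed value
    return signed & ((1 << to_bits) - 1)     # back to an unsigned bit pattern


x = 0xFF                        # an i8 value with its sign bit set
wide = zext(x, 8)               # i8 -> i16, upper byte is zero
correct = sext(wide, 16, 32)    # sext(zext(x)): sign bit of the i16 is 0
wrong = sext(x, 8, 32)          # sext(x): the mistaken fold

print(hex(correct), hex(wrong))
```

Because the zext guarantees that the top bit of the intermediate value is clear, the outer sext can only ever copy zeros, which is why the fold must produce a zext. For inputs whose sign bit is clear (0x7F, for example) both forms agree, so the bug only shows up for values with the high bit set.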
I want to believe that the logic in the original patch has some value as an optimization for some other case and it's just not in the right place here. But the test cases from:
don't provide any evidence.
The test cases that I've added here confirm that we (1) don't remove a zext op that is necessary and (2) generate a pmovzx instead of a punpck if SSE4.1 is available. Although pmovzx is 1 byte longer, it allows folding of the load, and so saves 3 bytes overall.
We don't need to call LowerVectorIntExtend() from LowerSIGN_EXTEND_INREG() to get this codegen either - that is already handled in NormalizeVectorShuffle().