This patch fixes the SSE2/AVX2 vector shift-by-constant InstCombine fold so that it correctly handles the fact that the shift amount is formed from the entire lower 64 bits of the count operand, and not just from the lowest element as the code currently assumes.
e.g.
%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
In this case, (V)PSRLD doesn't perform an lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15), which is out of range for a 32-bit element and so gives a zero result.
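To make the count semantics concrete, here is a minimal scalar C++ sketch of how (V)PSRLD treats its count operand. It is only an illustration and not code from the patch; the names V4, shift_count and psrld_model are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> value.
using V4 = std::array<std::uint32_t, 4>;

// The count is the whole low 64 bits of the second operand,
// i.e. elements 0 and 1 taken together, not element 0 alone.
inline std::uint64_t shift_count(const V4 &amt) {
  return (static_cast<std::uint64_t>(amt[1]) << 32) | amt[0];
}

// Scalar model of a logical right shift of each 32-bit element.
inline V4 psrld_model(const V4 &v, const V4 &amt) {
  const std::uint64_t count = shift_count(amt);
  V4 out{};                    // all elements start as zero
  if (count > 31)
    return out;                // out-of-range logical shift gives zero
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = v[i] >> count;
  return out;
}
```

With the splat <15, 15, 15, 15> from the example above, shift_count returns (15 << 32) | 15, so the model returns all zeros. A count operand of <15, 0, x, y> is what actually shifts every element right by 15.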
In addition, this review adds support for the SSE2/AVX2 ashr shift-by-constant and also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821). I can commit these changes separately if necessary.
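For the arithmetic case, a similar scalar C++ sketch may help show how it differs from the logical one. It models the instruction's behaviour as I understand it (an out-of-range count fills each element with its sign bit, which is the same as shifting by 31, and a zero count leaves the input unchanged). It is not the patch's implementation, and the names V4 and psrad_model are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> value, stored as raw bit patterns.
using V4 = std::array<std::uint32_t, 4>;

// Scalar model of an arithmetic right shift of each 32-bit element.
inline V4 psrad_model(const V4 &v, const V4 &amt) {
  // The count is again the whole low 64 bits of the second operand.
  std::uint64_t count = (static_cast<std::uint64_t>(amt[1]) << 32) | amt[0];
  if (count > 31)
    count = 31;                // out of range: every bit becomes the sign bit
  V4 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    const std::uint32_t u = v[i];
    const bool negative = (u >> 31) != 0;
    // Sign-extending shift done on unsigned values to stay portable.
    out[i] = negative ? ~(~u >> count) : (u >> count);
  }
  return out;
}
```

An all-zero count operand, such as one built from a ConstantAggregateZero, gives a count of 0, so the model returns its input unchanged.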
I would probably be more specific and explicitly quote the architecture manual which says: "only the first 64-bits of a 128-bit count operand are checked to compute the count".
But it is up to you. That comment is probably already good enough :-).