Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this.
I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code. it means that SimplifyX86immshift now (re)decodes the type of shift.
Could this code also be replaced by a call to SimplifyDemandedVectorElts(), to remove duplication? Or does it do something smarter?