As-is, this conflicts with D78728, but I'm posting because we get some test diffs from using shouldSinkOperands() instead of the custom CGP code that was in optimizeShuffleVectorInst(). Assuming D78728 gets pushed first, I'll update the CGP part of this patch.
The last codegen/IR test diff shows what I suspected could happen: we were sinking every splat operand of a shift into the loop, including the value being shifted. That's not what we want in general; we only want to sink the *shift amount* operand, and only if it is a splat.
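
To make the intended rule concrete, here is a minimal standalone sketch of the decision. It is not the code in this patch and does not use LLVM's types: `Opcode`, `Operand`, `Inst`, and `operandsToSink` are hypothetical stand-ins invented for illustration, and the operand layout (index 0 is the shifted value, index 1 is the shift amount) is assumed to match IR shifts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for IR concepts; these are NOT LLVM classes.
enum class Opcode { Shl, LShr, AShr, Add };

struct Operand {
  bool IsSplat; // true if this operand is a splat (one lane broadcast to all lanes)
};

struct Inst {
  Opcode Op;
  std::vector<Operand> Operands; // shifts: [0] = shifted value, [1] = shift amount
};

static bool isShift(Opcode Op) {
  return Op == Opcode::Shl || Op == Opcode::LShr || Op == Opcode::AShr;
}

// Hypothetical helper: returns the indices of the operands worth sinking
// next to their user. Only the shift-amount operand is ever a candidate.
std::vector<std::size_t> operandsToSink(const Inst &I) {
  std::vector<std::size_t> Result;
  if (!isShift(I.Op))
    return Result;
  assert(I.Operands.size() == 2 && "shifts are binary");
  const std::size_t AmountIdx = 1;
  // A splat in operand 0 (the shifted value) is deliberately ignored.
  if (I.Operands[AmountIdx].IsSplat)
    Result.push_back(AmountIdx);
  return Result;
}
```

With this rule, a shift whose operands are both splats reports only index 1, and a shift whose amount is not a splat reports nothing, even if the shifted value is a splat.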