This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen.