Based on a discussion on D89281, where the AArch64 implementations were being replaced to use funnel shifts.
Any target with efficient funnel shift lowering can share a single shift parts expansion, avoiding a lot of duplication.
I've generalized the x86 implementation and moved it to TargetLowering. So far I've found that AArch64 and AMDGPU benefit, but many other targets (ARM, PowerPC and RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or to the folding of the target ops that their funnel shifts lower to).
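For readers unfamiliar with the idea, here is a rough standalone sketch of how a double-width left shift can be built from a funnel shift on the two halves. It is not the TargetLowering code from this patch: it works on plain 32-bit integers rather than SelectionDAG nodes, and the names `fshl32`, `Parts` and `shlParts` are hypothetical, chosen only for illustration. It assumes the shift amount is in the range [0, 63].

```cpp
#include <cstdint>

// Funnel shift left: concatenate Hi:Lo, shift left by (Amt mod 32),
// and return the high 32 bits of the result.
static uint32_t fshl32(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  unsigned S = Amt & 31;
  if (S == 0)
    return Hi; // avoid the undefined shift by 32 below
  return (Hi << S) | (Lo >> (32 - S));
}

// Hypothetical holder for the two halves of a 64-bit value.
struct Parts {
  uint32_t Lo;
  uint32_t Hi;
};

// Shift the 64-bit value Hi:Lo left by Amt (assumed to be 0..63),
// using only 32-bit operations.
static Parts shlParts(uint32_t Lo, uint32_t Hi, unsigned Amt) {
  unsigned SafeAmt = Amt & 31;               // in-range amount for the plain shift
  uint32_t Funnel = fshl32(Hi, Lo, Amt);     // high half when Amt < 32
  uint32_t Shifted = Lo << SafeAmt;          // low half when Amt < 32
  bool Wide = (Amt & 32) != 0;               // true when Amt >= 32
  Parts R;
  R.Hi = Wide ? Shifted : Funnel;            // low bits move into the high half
  R.Lo = Wide ? 0u : Shifted;                // low half is fully shifted out
  return R;
}
```

The right-shift variants would follow the same shape with a funnel shift right, with the arithmetic form filling the vacated half with the sign bits instead of zero.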
clang-tidy: warning: invalid case style for variable 'dl' [readability-identifier-naming]
not useful