Based on a discussion on D89281, where the aarch64 implementations were being rewritten to use funnel shifts.
Any target with efficient funnel shift lowering can expand the shift parts nodes (SHL_PARTS/SRA_PARTS/SRL_PARTS) in the same way, which avoids a lot of duplicated code.
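As a rough illustration of the idea, here is a minimal Python model of a SHL_PARTS-style expansion built on a funnel shift. It assumes 32-bit halves of a 64-bit value and a shift amount in [0, 64). The helpers `fshl` and `shl_parts` are hypothetical models of the FSHL and SHL_PARTS semantics, not LLVM's actual code.

```python
BW = 32                  # assumed width of one part (i32 halves of an i64)
MASK = (1 << BW) - 1


def fshl(hi: int, lo: int, amt: int) -> int:
    """Funnel shift left: concatenate hi:lo, shift left by amt % BW, keep the top BW bits."""
    amt %= BW
    wide = (hi << BW) | lo
    return ((wide << amt) >> BW) & MASK


def shl_parts(lo: int, hi: int, amt: int) -> tuple[int, int]:
    """Shift the 2*BW-bit value hi:lo left by amt, returning the new (lo, hi) parts."""
    assert 0 <= amt < 2 * BW
    safe_amt = amt & (BW - 1)             # amount a single-part shift can handle
    funnel = fshl(hi, lo, amt)            # new Hi when amt < BW
    lo_shifted = (lo << safe_amt) & MASK  # new Lo when amt < BW, new Hi when amt >= BW
    if amt & BW:                          # amt >= BW: all of lo's bits move into hi
        return 0, lo_shifted
    return lo_shifted, funnel
```

Each part needs only one funnel shift or one plain shift plus a select on the `amt & BW` bit, which is why a target with cheap funnel shifts gets a compact expansion for free.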
I've generalized the x86 implementation and moved it to TargetLowering. So far I've found that aarch64 and amdgpu benefit, but many other targets (ARM, PowerPC and RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or to the folding of the target ops that their funnel shifts lower to).
Nit: it's not just that we "can't rely on the results of FSHL/FSHR"; when the shift amount is greater than or equal to the width of one part, both halves of the result have to be calculated differently.
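To make the nit concrete, here is a minimal Python model of an SRA_PARTS-style expansion, again assuming 32-bit halves and a shift amount in [0, 64). The helpers `fshr`, `sra` and `sra_parts` are hypothetical names for illustration only. For amounts at or above the part width, the low half no longer comes from the funnel shift (it becomes the shifted high half), and the high half becomes pure sign fill.

```python
BW = 32                  # assumed width of one part (i32 halves of an i64)
MASK = (1 << BW) - 1


def fshr(hi: int, lo: int, amt: int) -> int:
    """Funnel shift right: concatenate hi:lo, shift right by amt % BW, keep the low BW bits."""
    amt %= BW
    wide = (hi << BW) | lo
    return (wide >> amt) & MASK


def sra(v: int, amt: int) -> int:
    """Arithmetic right shift of an unsigned BW-bit pattern v by amt (< BW)."""
    if v >> (BW - 1):          # sign bit set: reinterpret as a negative value
        v -= 1 << BW
    return (v >> amt) & MASK   # Python's >> on negative ints is arithmetic


def sra_parts(lo: int, hi: int, amt: int) -> tuple[int, int]:
    """Arithmetic shift of the 2*BW-bit value hi:lo right by amt, returning (lo, hi)."""
    assert 0 <= amt < 2 * BW
    safe_amt = amt & (BW - 1)
    funnel = fshr(hi, lo, amt)          # new Lo when amt < BW
    hi_shifted = sra(hi, safe_amt)      # new Hi when amt < BW, new Lo when amt >= BW
    sign_fill = sra(hi, BW - 1)         # all zeros or all ones
    if amt & BW:                        # large amount: both halves change source
        return hi_shifted, sign_fill
    return funnel, hi_shifted
```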