The previous DAG combiner-based approach could loop infinitely between the target-dependent and target-independent combiner logic (see PR40333). Although this was worked around in rL351806, the combiner-based approach is still potentially brittle and can fail to select the 32-bit shift variant when it would be profitable, as demonstrated in the pr40333.ll test case.
This patch instead introduces target-specific SelectionDAG nodes for SHLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a good example of how this approach can improve codegen.
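To make the intent concrete, here is a minimal C++ model of what 32-bit shift nodes of this kind compute. It is a sketch, not code from the patch. It assumes RV64-style word-shift semantics: the operation uses the low 32 bits of the value and the low 5 bits of the amount, then sign-extends the 32-bit result to 64 bits. The names sext32, shlw, srlw, sraw and demo are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit result to 64 bits. The unsigned-to-signed conversion
// is modular on mainstream compilers (and guaranteed so from C++20).
static int64_t sext32(uint32_t v) {
  return static_cast<int64_t>(static_cast<int32_t>(v));
}

// Assumed semantics: only the low 32 bits of `a` and the low 5 bits of `amt`
// take part in the operation.
int64_t shlw(uint64_t a, uint64_t amt) {
  return sext32(static_cast<uint32_t>(a) << (amt & 31));
}

int64_t srlw(uint64_t a, uint64_t amt) {
  return sext32(static_cast<uint32_t>(a) >> (amt & 31));
}

int64_t sraw(uint64_t a, uint64_t amt) {
  // Right-shifting a negative int32_t is an arithmetic shift on mainstream
  // compilers (and guaranteed so from C++20).
  int32_t lo = static_cast<int32_t>(static_cast<uint32_t>(a));
  return static_cast<int64_t>(lo >> (amt & 31));
}

// A few spot checks of the model.
void demo() {
  const uint64_t a = 0xFFFFFFFF00000010ULL;  // garbage in the upper 32 bits
  // A plain 64-bit logical shift pulls the upper bits into the low word...
  assert(static_cast<uint32_t>(a >> 4) == 0xF0000001u);
  // ...whereas the 32-bit variant ignores them.
  assert(srlw(a, 4) == 1);
  assert(shlw(1, 31) == INT64_C(-2147483648));  // result is sign-extended
  assert(sraw(0x80000000ULL, 31) == -1);
}
```

The first two checks in demo show why a 64-bit shift is not a drop-in replacement when the upper half of the register is not known to be zero- or sign-extended. Selecting the 32-bit variant directly avoids the extra extension instructions.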
There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll, but the new instruction sequences are semantically equivalent to the old ones.
It likely makes sense to replace the 32-bit sdiv/udiv/srem combining logic in a similar way, but that belongs in a separate patch.
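It is probably worth noting explicitly whether the shift amount is taken modulo the operand width, that is, whether only its low bits are read. If it is, you might want to implement SimplifyDemandedBitsForTargetNode so that redundant masking of the shift amount can be removed.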
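The reasoning behind that suggestion can be sketched in plain C++. This does not use the LLVM API. It assumes the new nodes read only the low 5 bits of the shift amount, which the patch description does not confirm. The names srlw_model, kDemandedAmountBits and can_drop_amount_mask are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit logical right shift that reads only the low
// 5 bits of its amount operand (an assumption about the node's semantics).
uint32_t srlw_model(uint32_t value, uint64_t amount) {
  return value >> (amount & 31);
}

// Bits of the amount operand that the modelled node actually reads.
constexpr uint64_t kDemandedAmountBits = 0x1F;

// An AND applied to the amount is redundant when it preserves every demanded
// bit: the bits it clears are never read by the shift.
bool can_drop_amount_mask(uint64_t mask) {
  return (mask & kDemandedAmountBits) == kDemandedAmountBits;
}
```

Under this model a mask of 31 or 63 on the shift amount can be dropped without changing the result, while a mask of 15 cannot, because it clears bit 4, which the shift does read.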