The truncated rotate-by-variable patterns elude all of the existing transforms because they involve multiple uses and require demanded-bits/known-bits information that is only available when the whole pattern is visible. So we need an unfortunately large pattern match. But by simplifying this pattern in IR, the backend is already able to generate rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic. Note that rotate-by-constant doesn't have this problem - smaller folds should already produce the narrow IR ops.
For the motivating cases from the bug report, in addition to using narrow ops, we have a net win of two fewer instructions (we kill 3 zext/trunc ops but add a mask op). I initially forgot that we need that mask, but Alive confirms that it would be wrong to leave the mask off the opposite shift amount: