This takes advantage of the work that's been done on funnel shifts to make the implementation more readable.
I think we could shave one more instruction off the funnel shift lowering: if we do the two shifts in the opposite order, the shift by one can be folded into the orr. That's mostly orthogonal to this patch, though.
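To make the reordering concrete, here is a rough C sketch of what I mean. It assumes a 32-bit fshl with a variable amount that is expanded with a shift by one plus a shift by the inverted amount; the function names are made up for illustration and this is not the actual lowering code. The two orderings compute the same value, but in the second one the shift by one is the last operation feeding the or, which is where I'd expect it to fold into the orr.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics: concatenate hi:lo, shift left by z mod 32,
   keep the high 32 bits. */
static uint32_t fshl_ref(uint32_t hi, uint32_t lo, uint32_t z) {
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((wide << (z & 31)) >> 32);
}

/* Ordering A (hypothetical): shift by one first, then by the
   inverted amount. The or sees a fully computed value. */
static uint32_t fshl_shift1_first(uint32_t hi, uint32_t lo, uint32_t z) {
    uint32_t inv = ~z & 31;
    uint32_t t = (lo >> 1) >> inv;
    return (hi << (z & 31)) | t;
}

/* Ordering B (hypothetical): shift by the inverted amount first.
   The remaining ">> 1" is a constant shift on an or operand. */
static uint32_t fshl_shift1_last(uint32_t hi, uint32_t lo, uint32_t z) {
    uint32_t inv = ~z & 31;
    uint32_t t = lo >> inv;
    return (hi << (z & 31)) | (t >> 1);
}

/* Small self-check: both orderings agree with the reference
   for every shift amount, including z == 0. */
static void check_orderings_agree(uint32_t hi, uint32_t lo) {
    for (uint32_t z = 0; z < 64; ++z) {
        assert(fshl_shift1_first(hi, lo, z) == fshl_ref(hi, lo, z));
        assert(fshl_shift1_last(hi, lo, z) == fshl_ref(hi, lo, z));
    }
}
```

The split into two shifts is what keeps z == 0 well defined (the inverted amount is 31, never 32), and that property holds for either ordering.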
The implementation here isn't really target-specific; maybe we should be doing this in target-independent code?