In that case, the cost for i32 and i64 should be 1 (a single EXTR
instruction). For v4i32 and v2i64 it should be 3 (USHR + SHL + ORR).
Integer types smaller than i32 will get legalized to i32 and share its
cost.
This recovers an SLP regression revealed by D140392.
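
For illustration only, the cost table described in the summary could be sketched as a standalone function. This is not the patch's code and does not use LLVM's TargetTransformInfo API: `TypeSketch` and `funnelShiftCostSketch` are hypothetical names, and the sketch returns no value for any type the summary does not mention rather than guessing a cost.

```cpp
#include <optional>

// Hypothetical stand-in for a type; not an LLVM class.
struct TypeSketch {
  unsigned ElementBits;  // width of one element in bits
  unsigned NumElements;  // 1 for scalars
};

// Hypothetical helper mirroring the costs stated in the summary.
// Returns std::nullopt for types the summary does not cover.
std::optional<unsigned> funnelShiftCostSketch(TypeSketch Ty) {
  if (Ty.NumElements == 1) {
    // i32 and i64 map to a single EXTR. Narrower integers are
    // legalized to i32 and share its cost.
    if (Ty.ElementBits <= 32 || Ty.ElementBits == 64)
      return 1;
    return std::nullopt;
  }

  // v4i32 and v2i64 expand to USHR + SHL + ORR.
  const bool IsV4I32 = Ty.NumElements == 4 && Ty.ElementBits == 32;
  const bool IsV2I64 = Ty.NumElements == 2 && Ty.ElementBits == 64;
  if (IsV4I32 || IsV2I64)
    return 3;

  return std::nullopt;
}
```

With this sketch, `funnelShiftCostSketch({16, 1})` yields 1 (the i16 case shares the i32 cost) and `funnelShiftCostSketch({32, 4})` yields 3.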
I think I would just combine fshr into a single patch with this one. They seem to be essentially the same.