The motivating case for this is a long way from here:
...but I think this is where we have to start.
We need to canonicalize/optimize sequences of shifts and bitwise logic ops to ease pattern matching for things like bswap and to improve performance in general. However, without the artificial '!LegalTypes' limit (which restricts this to early combining), there are a lot of test diffs, and not all of them are improvements.
In the minimal tests added for this proposal, x86 should have better throughput in all cases. AArch64 is neutral because it can fold shifts into bitwise logic ops.
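For illustration only, here is a minimal sketch of the kind of shift/logic rewrite being discussed. It assumes the fold has the shape "shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)". That shape, the constants, and the helper names are assumptions for this example, not taken from the patch or from any LLVM API. The second form has a shorter dependency chain, which is one plausible reason for the x86 throughput difference. A target that can fold a shift into the logic op's operand would see little change.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants only (assumption): any C0, C1 with C0 + C1 < 32 works.
constexpr unsigned C0 = 3;
constexpr unsigned C1 = 5;

// Before: shift -> xor -> shift, a three-deep dependency chain.
constexpr uint32_t shiftLogicShift(uint32_t x, uint32_t y) {
  return ((x << C0) ^ y) << C1;
}

// After: two independent shifts feed the xor, a two-deep chain.
constexpr uint32_t logicOfShifts(uint32_t x, uint32_t y) {
  return (x << (C0 + C1)) ^ (y << C1);
}

// Compile-time check on one pair where bits are shifted out of the top.
static_assert(shiftLogicShift(0xFFFFFFFFu, 0x80000001u) ==
                  logicOfShifts(0xFFFFFFFFu, 0x80000001u),
              "both forms compute the same value");

// Run-time spot check over a few sample values (hypothetical helper).
void checkSamples() {
  const uint32_t samples[] = {0u, 1u, 0x12345678u, 0x80000001u, 0xFFFFFFFFu};
  for (uint32_t x : samples)
    for (uint32_t y : samples)
      assert(shiftLogicShift(x, y) == logicOfShifts(x, y));
}
```

The same identity holds for 'and' and 'or' in place of 'xor', because a left shift by a constant distributes over all three bitwise ops.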