We canonicalize patterns like:
  %s = lshr i32 %a0, 1
  %t = trunc i32 %s to i1
to:
  %a = and i32 %a0, 2
  %c = icmp ne i32 %a, 0
...in IR, but the original bit-shifting sequence may be better for x86 vector codegen.
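For reference, the scalar equivalence behind that canonicalization can be sanity-checked with a short model. This is only an illustrative sketch in Python, not part of the patch; the helper names and the 32-bit masking are assumptions made for the example:

```python
# Illustrative model of the two IR forms on i32 values (not LLVM code).
MASK32 = 0xFFFFFFFF

def trunc_of_lshr(a0: int, shift: int = 1) -> bool:
    """Models: %s = lshr i32 %a0, shift ; %t = trunc i32 %s to i1."""
    s = (a0 & MASK32) >> shift   # logical shift right on a 32-bit value
    return bool(s & 1)           # trunc to i1 keeps only the low bit

def icmp_ne_of_and(a0: int, shift: int = 1) -> bool:
    """Models: %a = and i32 %a0, (1 << shift) ; %c = icmp ne i32 %a, 0."""
    a = (a0 & MASK32) & (1 << shift)
    return a != 0

# Spot-check a few values, including the high-bit and all-ones cases.
for value in (0, 1, 2, 3, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    assert trunc_of_lshr(value) == icmp_ne_of_and(value)
```

Both forms test a single bit of the input, so the choice between them is purely about which one the backend lowers more cheaply.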
I tried several variants of the transform, and it's tricky to avoid inducing regressions. In particular, I did not find a clean way to handle non-splat constants, so I've left that as a TODO item (negative tests for those cases are included here). AVX512 resulted in some diffs, but they didn't look meaningful, so I left that out too. Some of the 256-bit AVX1 diffs are questionable, but close enough that the difference is probably not meaningful.