xorl + setcc is generally preferred over setcc + movzbl, because the latter suffers from a partial register stall. It also encodes one byte smaller. llvm.org/PR28146 and the other associated PRs have more details on this.
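For reference, here is a minimal sketch of the two sequences for a comparison whose result is zero-extended to 32 bits. The function name `is_less` is hypothetical, and the assembly in the comments is hand-written for illustration rather than taken from this patch's test output; actual codegen depends on the compiler version and flags.

```c
/* A comparison whose i1 result is zero-extended to a 32-bit int.
 * (is_less is a hypothetical name used only for this illustration.)
 *
 * setcc + movzbl (AT&T syntax, SysV x86-64 argument registers):
 *     cmpl    %esi, %edi
 *     setl    %al             # writes only the low byte of %eax
 *     movzbl  %al, %eax       # 3 bytes: 0f b6 c0
 *
 * xorl + setcc:
 *     xorl    %eax, %eax      # 2 bytes: 31 c0
 *     cmpl    %esi, %edi
 *     setl    %al
 *
 * The xorl has to be placed before the cmpl, because xorl itself
 * clobbers EFLAGS. That also means the result register is live
 * across the compare.
 */
int is_less(int a, int b) {
    return a < b;
}
```

Both sequences compute the same value; the difference is only in encoding size and in how the partial write to %al interacts with the rest of %eax.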
Unfortunately, this cannot be handled in DAG ISel, because of how X86 SETCC is modeled, and changing the way SETCC is modeled does not seem too attractive. So, we need to clean this up post-ISel. Dave Kreitzer suggested using pseudos to represent a zero-extended setcc, which is cleaner in some respects but dirtier in others, and on balance, I think I prefer this patch. Dave, if you (or anyone else) feel strongly that we should be using pseudos - or some other solution - instead, we can continue the discussion here.
Note that this is not a win in every case - for example, some pcmpstri test cases below get pessimized. This happens because the register allocator is over-constrained due to the extended value being forced into %eax. So we should not see this sort of thing inside hot loops. Suggestions on how to resolve this are welcome, although I don't believe that should block this patch.
Go ahead and reorder separately?