Without SSE4.2, a v2i64 setlt needs to expand into a pcmpgtd, a pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements.
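A rough scalar model of the cheaper sequence may help reviewers follow it. It assumes the compare is against zero (my reading of "only interested in the sign bit") and that each i64 is viewed as two little-endian i32 lanes, so the sign lives in the odd lane. The helper names `pcmpgtd`, `pshufd_1133`, and `setlt_zero_v2i64` are illustrative models, not code from the patch.

```cpp
#include <array>
#include <cstdint>

// A 128-bit register viewed as four i32 lanes; lanes 1 and 3 hold the
// high halves (and therefore the sign bits) of the two i64 elements.
using V4i32 = std::array<int32_t, 4>;

// Scalar model of PCMPGTD: per-lane signed a > b, all-ones when true.
V4i32 pcmpgtd(const V4i32 &a, const V4i32 &b) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}

// Scalar model of PSHUFD with lane selection {1, 1, 3, 3}: copy each odd
// lane down into the even lane next to it.
V4i32 pshufd_1133(const V4i32 &v) { return {v[1], v[1], v[3], v[3]}; }

// Hypothetical helper: x < 0 for both i64 elements, using one compare and
// one shuffle. Only the odd-lane compare results are meaningful, so they
// are broadcast over the even lanes to form full 64-bit masks.
V4i32 setlt_zero_v2i64(const V4i32 &dwords) {
  return pshufd_1133(pcmpgtd(V4i32{0, 0, 0, 0}, dwords));
}
```

The shuffle is what discards the even-lane compare results, which can be wrong on their own: a positive i64 whose low dword has its top bit set would otherwise produce a half-set mask.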
llvm/test/CodeGen/X86/bitcast-vector-bool.ll | ||
---|---|---|
432 | The new code is simple enough that SimplifyDemandedBits was able to get through it. The movmskb only needs the sign bits from its input, and packss doesn't alter sign bits, so it was able to prove the compare unnecessary. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
21579 | Why? The code below generates 3 regular shuffles just like this. | |
21582 | I think PSRAD xmm, 31 is equivalent to the non-inverted case. I'm not sure which is better: XOR+PCMPGT or PSRAD. There were more execution resources available for pcmpgt than for psrad on SSE4.1-era CPUs like Penryn. But we might not have handled the XOR in 0 cycles. |
Just skip the invert case. I don't think it's as likely to occur. Other canonicalizations should have prevented it, I think.
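For the PSRAD remark above, here is a small scalar check of the claimed equivalence. It assumes the non-inverted case means x < 0 per i32 lane. The function names are models of the instructions, not LLVM APIs.

```cpp
#include <array>
#include <cstdint>

using Lanes = std::array<int32_t, 4>;

// Scalar model of PSRAD by 31: each lane becomes a splat of its sign bit.
// Written with an unsigned shift so the result does not depend on how the
// compiler implements right shifts of negative values.
Lanes psrad31(const Lanes &v) {
  Lanes r{};
  for (int i = 0; i < 4; ++i)
    r[i] = -static_cast<int32_t>(static_cast<uint32_t>(v[i]) >> 31);
  return r;
}

// Scalar model of PCMPGTD with a zeroed first operand: 0 > v per lane.
Lanes pcmpgtd_zero(const Lanes &v) {
  Lanes r{};
  for (int i = 0; i < 4; ++i)
    r[i] = 0 > v[i] ? -1 : 0;
  return r;
}
```

Both models produce all-ones for negative lanes and zero otherwise, so the choice between the two instructions comes down to the throughput and zeroing-XOR cost discussed above.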
Emit PSHUFD directly?