This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Convert PTEST to MOVMSK for allsign bits vector results.
Closed, Public

Authored by RKSimon on May 26 2020, 8:21 AM.

Details

Summary

If we are using PTEST to check vector elements that are known to be 'allsignbits' (each element is either all zeros or all ones), we can use MOVMSK to extract the sign bits directly and perform the comparison on the scalar value.

For vXi16 cases, as there is no MOVMSK for this element type, we must mask each element's sign bit out of a PMOVMSKB v2Xi8 result; that mask folds into the TEST comparison.

If this allows us to remove a vector op (via the SimplifyMultipleUseDemandedBits call) this is consistently faster than a PTEST (https://godbolt.org/z/ziJUst).

I'm investigating whether we ever get regressions without the SimplifyMultipleUseDemandedBits call (i.e. even when we don't remove a vector op), but that has exposed some other poor codegen issues that I'm still looking into, so it will have to wait for a later patch.

Suggested on PR42035 to avoid unnecessary ashr(x,bw-1)/pcmpgt(0,x) sign splat patterns feeding into ptest.
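
To illustrate the vXi16 case from the summary, here is a minimal scalar sketch in portable C++. It is not the code from the patch: the type V8i16 and the helpers signSplat, ptestAllZero, pmovmskb, movmskAllZero and agree are hypothetical names that only model the instruction semantics on a little-endian v8i16 value. The mask 0xAAAA selects the sign bit of the high byte of each 16-bit element from the 16-bit PMOVMSKB result.

```cpp
#include <array>
#include <cassert>  // only needed if you add assert-based checks
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 128-bit v8i16 vector.
using V8i16 = std::array<int16_t, 8>;

// Models ashr(x, 15) / pcmpgt(0, x): every element becomes 0 or -1.
V8i16 signSplat(const V8i16 &y) {
  V8i16 r{};
  for (std::size_t i = 0; i < y.size(); ++i)
    r[i] = static_cast<int16_t>(y[i] < 0 ? -1 : 0);
  return r;
}

// Models the ZF result of PTEST x, x: true when every bit of x is zero.
bool ptestAllZero(const V8i16 &x) {
  for (int16_t e : x)
    if (e != 0)
      return false;
  return true;
}

// Models PMOVMSKB on the v16i8 view of the vector (little-endian bytes):
// bit 2*i is the sign bit of the low byte of element i, bit 2*i+1 is the
// sign bit of its high byte.
uint32_t pmovmskb(const V8i16 &x) {
  uint32_t mask = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const uint16_t bits = static_cast<uint16_t>(x[i]);
    mask |= static_cast<uint32_t>((bits >> 7) & 1u) << (2 * i);
    mask |= static_cast<uint32_t>((bits >> 15) & 1u) << (2 * i + 1);
  }
  return mask;
}

// MOVMSK-style replacement: keep only the high-byte sign bits (0xAAAA)
// and compare the scalar against zero.
bool movmskAllZero(const V8i16 &x) {
  return (pmovmskb(x) & 0xAAAAu) == 0;
}

// The scalar test on the un-splatted input agrees with PTEST on the
// sign-splatted vector, so the splat no longer has to be computed.
bool agree(const V8i16 &y) {
  return ptestAllZero(signSplat(y)) == movmskAllZero(y);
}
```

In this model, an element such as 0x0080 sets a low-byte bit in the PMOVMSKB result even though the 16-bit element is non-negative, which is why the 0xAAAA mask is needed before the comparison.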

Event Timeline

RKSimon created this revision. May 26 2020, 8:21 AM
Herald added a project: Restricted Project. May 26 2020, 8:21 AM
Herald added a subscriber: hiraditya.
craig.topper added inline comments. May 26 2020, 10:53 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
40178

Do we need getPMOVMSKB here, or can we just use plain getNode? The only thing getPMOVMSKB does is handle AVX and BWI splitting, right? But if we start from PTEST, we should never need to split?

RKSimon marked an inline comment as done. May 26 2020, 11:13 AM
RKSimon added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
40178

AVX1 code can technically get here for v16i16/v32i8 cases, depending on how good a job SimplifyMultipleUseDemandedBits has managed. VPTEST is one of the rare 256-bit integer instructions that is available on AVX1!
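
As a rough illustration of the splitting discussed above, the sketch below models a 256-bit v32i8 byte mask being built from two 128-bit halves, on the assumption that a target without a 256-bit PMOVMSKB has to combine two 16-bit results. This is a scalar model only, not the code in getPMOVMSKB; V32i8, pmovmskb128 and pmovmskb256Split are hypothetical names.

```cpp
#include <array>
#include <cassert>  // only needed if you add assert-based checks
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 256-bit v32i8 vector.
using V32i8 = std::array<int8_t, 32>;

// Models a 128-bit PMOVMSKB: bit i is the sign bit of byte i (16 bytes).
uint32_t pmovmskb128(const int8_t *bytes) {
  uint32_t mask = 0;
  for (std::size_t i = 0; i < 16; ++i)
    mask |= static_cast<uint32_t>(bytes[i] < 0 ? 1u : 0u) << i;
  return mask;
}

// Models the split form: take the mask of each 128-bit half and
// concatenate the two 16-bit results into one 32-bit scalar.
uint32_t pmovmskb256Split(const V32i8 &v) {
  const uint32_t lo = pmovmskb128(v.data());
  const uint32_t hi = pmovmskb128(v.data() + 16);
  return lo | (hi << 16);
}
```

The combined scalar can then be compared against zero in the same way as the single 128-bit result.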

craig.topper accepted this revision. May 27 2020, 12:32 AM

LGTM

llvm/lib/Target/X86/X86ISelLowering.cpp
40178

OK, thanks for the clarification.

This revision is now accepted and ready to land. May 27 2020, 12:32 AM
This revision was automatically updated to reflect the committed changes.