This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer `vpternlog` instead of `blendv` for `vselect` on masks.
ClosedPublic

Authored by goldstein.w.n on Mar 2 2023, 11:43 PM.

Details

Summary

This rarely comes up because most vselect are lowered with actually
avx512 mask instructions, but is an improvement in the rare cases.

Diff Detail

Event Timeline

goldstein.w.n created this revision.Mar 2 2023, 11:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 11:43 PM
goldstein.w.n requested review of this revision.Mar 2 2023, 11:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 11:43 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1036–1037

Update comment

craig.topper added inline comments.Mar 2 2023, 11:50 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1047

Why is the PCMPGT check needed?

pengfei added inline comments.Mar 3 2023, 1:00 AM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1046

It's enough to check single hasVLX.

RKSimon added inline comments.Mar 3 2023, 2:46 AM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1047

Wouldn't a numsignbits check be better than PCMPGT?

goldstein.w.n added inline comments.Mar 3 2023, 9:12 AM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1047

Why is the PCMPGT check needed?

blendv only uses the sign-bit of the control. For vpternlog to be a replacement, the control needs to be in mask form (elements either -1/0). pcmpgt is just a check for "is the control in mask form". Maybe (sext (setcc ...)) would also work but didn't see any codegen changes from it.

1047

What is numsignbits check?

craig.topper added inline comments.Mar 3 2023, 9:29 AM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1047

CurDAG->computeNumSignBits or CurDAG->ComputeMaxSignificantBits

1047

I think an ISD::VSELECT is technically malformed if the elements of the condition aren't 0/-1, but we don't have a good way of proving we didn't mess that up.

goldstein.w.n marked 3 inline comments as done.Mar 3 2023, 2:02 PM
goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1047

I think an ISD::VSELECT is technically malformed if the elements of the condition aren't 0/-1, but we don't have a good way of proving we didn't mess that up.

Ill use CurDAG->computeNumSignBits, if we are being correct that should prove it.

Use ComputeNumSignBits instead of PCMPGT check. Fix comment

RKSimon added inline comments.Mar 14 2023, 11:00 AM
llvm/test/CodeGen/X86/vselect-pcmp.ll
37

AVX512F basically means knights landing - and even though you'd have to use the zmm variant - vpternlogq is a LOT faster than vpblendvb on KNL

goldstein.w.n added inline comments.Mar 15 2023, 2:45 PM
llvm/test/CodeGen/X86/vselect-pcmp.ll
37

AVX512F basically means knights landing - and even though you'd have to use the zmm variant - vpternlogq is a LOT faster than vpblendvb on KNL

Could maybe see for ymm->zmm, but xmm->zmm will then req a vzeroupper, will require a stall for core to prepare zmm usage (if no zmm around), and increase license.

Also could potentially be dangerous if its SSE encoding around it.

RKSimon accepted this revision.Mar 28 2023, 5:57 AM

LGTM - cheers

This revision is now accepted and ready to land.Mar 28 2023, 5:57 AM
This revision was landed with ongoing or failed builds.Apr 4 2023, 8:35 AM
This revision was automatically updated to reflect the committed changes.