This rarely comes up because most vselects are actually lowered with
AVX512 mask instructions, but it is an improvement in the rare cases where it does.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
1036–1037 | Update comment |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
1045 | Why is the PCMPGT check needed? |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
1044 | It's enough to check hasVLX alone. |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
1045 | Wouldn't a numsignbits check be better than PCMPGT? |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
1045 | blendv only uses the sign bit of the control. For vpternlog to be a replacement, the control needs to be in mask form (every element either -1 or 0). pcmpgt is just a check for "is the control in mask form". Maybe (sext (setcc ...)) would also work, but I didn't see any codegen changes from it. |
1045 | What is numsignbits check? |
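To make the mask-form requirement concrete, here is a minimal scalar sketch that models one 8-bit lane. The function names are hypothetical and not part of the patch. It assumes the ternary-logic immediate 0xCA, which is the truth table for "first operand ? second : third"; the immediate and operand order the patch actually emits are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Sign-bit select: models one 8-bit lane of a blendv-style instruction.
// Only the top bit of the control picks between the two inputs.
inline uint8_t blendvLane(uint8_t ctrl, uint8_t onTrue, uint8_t onFalse) {
  return (ctrl & 0x80) ? onTrue : onFalse;
}

// Ternary logic: for every bit position, the three input bits form an index
// (a is the high bit, c the low bit) into the 8-bit truth table `imm`.
inline uint8_t ternlogLane(uint8_t imm, uint8_t a, uint8_t b, uint8_t c) {
  uint8_t out = 0;
  for (int bit = 0; bit < 8; ++bit) {
    int idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) |
              ((c >> bit) & 1);
    out = static_cast<uint8_t>(out | (((imm >> idx) & 1) << bit));
  }
  return out;
}

// Assumed immediate for illustration: 0xCA encodes (a ? b : c) per bit.
inline uint8_t bitSelectLane(uint8_t ctrl, uint8_t onTrue, uint8_t onFalse) {
  return ternlogLane(0xCA, ctrl, onTrue, onFalse);
}
```

With a control of 0xFF or 0x00 the two functions agree. With a control such as 0x80, where only the sign bit is set, blendvLane still returns onTrue in full, while bitSelectLane mixes bits from both inputs, which is why the control has to be known to be all-ones or all-zeros per element.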
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
1045 | I'll use CurDAG->computeNumSignBits; if we are being correct, that should prove it. |
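As a rough illustration of why a sign-bit count can stand in for the pcmpgt pattern match, the sketch below counts sign bits of a concrete 8-bit value. This is only a runtime model with hypothetical helper names: the real CurDAG->computeNumSignBits reasons about DAG nodes and returns a conservative lower bound, and the exact comparison used in the patch is an assumption here (presumably against the element width).

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit, counting the sign bit itself.
// Always at least 1 and at most 8 for an 8-bit value.
inline unsigned numSignBits8(uint8_t v) {
  unsigned sign = (v >> 7) & 1u;
  unsigned n = 1;
  for (int bit = 6; bit >= 0 && ((v >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// An element is in "mask form" (all-ones or all-zeros) exactly when every
// bit is a copy of the sign bit, i.e. the count equals the element width.
inline bool isMaskForm8(uint8_t v) { return numSignBits8(v) == 8; }
```

Only 0x00 and 0xFF pass isMaskForm8. A value like 0x80 has a single sign bit, so a check of this kind would reject it and leave the blendv lowering in place.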
llvm/test/CodeGen/X86/vselect-pcmp.ll | ||
---|---|---|
37 | AVX512F basically means Knights Landing - and even though you'd have to use the zmm variant - vpternlogq is a LOT faster than vpblendvb on KNL. |
llvm/test/CodeGen/X86/vselect-pcmp.ll | ||
---|---|---|
37 | I could maybe see it for ymm->zmm, but xmm->zmm will then require a vzeroupper, will stall while the core prepares for zmm usage (if there is no zmm usage around), and will increase the license level. It could also be dangerous if there is SSE-encoded code around it. |
Update comment