On alderlake E-cores, the latency of VMOVMSKPS is 5 for both YMM and XMM. The
latency of VTESTPS is 7 for YMM and 5 for XMM. Since alderlake uses
the P-core schedule model, we can't determine which one is better based on
the latency information in the schedule model. Instead, we add a
tuning feature for alderlake and select VMOVMSKPS when the tuning
feature is set. In the case of "vmovmskps + test + jcc", the
test and jcc can be fused, while vtest and jcc can't.
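
For readers less familiar with the two instructions, here is a minimal scalar sketch of why they are interchangeable for an "are any sign bits set?" check. It is not code from this patch: `Vec8`, `modelMovmskps`, `modelVtestpsZF` and the two wrapper functions are hypothetical names used only for illustration, and the model covers only the sign-bit mask and the ZF result, not latency or fusion.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for one 8-lane YMM register of floats.
using Vec8 = std::array<float, 8>;

// Returns bit 31 (the sign bit) of the float's IEEE-754 encoding.
static bool signBit(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return (bits >> 31) != 0;
}

// Models VMOVMSKPS: packs the sign bit of each lane into an integer mask.
unsigned modelMovmskps(const Vec8 &v) {
  unsigned mask = 0;
  for (unsigned i = 0; i < v.size(); ++i)
    if (signBit(v[i]))
      mask |= 1u << i;
  return mask;
}

// Models the ZF result of VTESTPS a, b: ZF is set when no lane has the
// sign bit set in both operands.
bool modelVtestpsZF(const Vec8 &a, const Vec8 &b) {
  for (unsigned i = 0; i < a.size(); ++i)
    if (signBit(a[i]) && signBit(b[i]))
      return false;
  return true;
}

// Shape of "vmovmskps + test + jcc": extract the mask, then test it.
bool noSignBitsViaMovmsk(const Vec8 &v) { return modelMovmskps(v) == 0; }

// Shape of "vtestps + jcc": test the vector against itself.
bool noSignBitsViaVtest(const Vec8 &v) { return modelVtestpsZF(v, v); }
```

Both wrappers return the same answer for every input, so the choice between the two sequences is purely a performance decision, which is what the tuning feature encodes.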
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86.td | ||
---|---|---|
426 | Personally I think slowvtest is a confusing name, because vtest also has a perf dropoff from SnB -> HSW. |
llvm/lib/Target/X86/X86.td | ||
---|---|---|
426 | +1. TuningPreferMovmskOverVTest explains the purpose of the tuning flag better. |
llvm/test/CodeGen/X86/combine-movmsk.ll | ||
---|---|---|
6 | Add a variant that tests the tuning flag directly. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47913 | It seems this transform is good. If I revert the transform, I get the lit test change below. |
-; AVX-LABEL: movmsk_or_v2i64:
-; AVX: # %bb.0:
-; AVX-NEXT: vpxor %xmm1, %xmm0, %xmm0
-; AVX-NEXT: vptest %xmm0, %xmm0
-; AVX-NEXT: setne %al
-; AVX-NEXT: retq
+; AVX1OR2-LABEL: movmsk_or_v2i64:
+; AVX1OR2: # %bb.0:
+; AVX1OR2-NEXT: vpcmpeqq %xmm1, %xmm0, %xmm0
+; AVX1OR2-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
+; AVX1OR2-NEXT: vtestpd %xmm1, %xmm0
+; AVX1OR2-NEXT: setae %al
+; AVX1OR2-NEXT: retq
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47973 | It seems there is no lit test failure if I disable this code. |
Should be Is?