This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer vmovmsk instead of vtest for alderlake.
ClosedPublic

Authored by LuoYuanke on Jun 5 2023, 10:32 PM.

Details

Summary

On alderlake E-core, the latency of VMOVMSKPS is 5 for YMM/XMM. The
latency of VPTESTPS is 7 for YMM and is 5 for XMM. Since alderlake use
the P-core schedule model, we can't determine which one better based on
the latency information of schedule model. Alternatively we add an
tuning feature for alderlake and select VMOVMSKPS with the indication
for the tuning feature. In the case of "vmovmskps + test + jcc", the
test and jcc can be fused, while vtest and jcc can't.

Diff Detail

Event Timeline

LuoYuanke created this revision.Jun 5 2023, 10:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2023, 10:32 PM
LuoYuanke requested review of this revision.Jun 5 2023, 10:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2023, 10:32 PM
LuoYuanke edited the summary of this revision. (Show Details)Jun 5 2023, 11:03 PM
LuoYuanke edited the summary of this revision. (Show Details)
LuoYuanke retitled this revision from [X86] Prefer vtest to vmovmsk for alderlake. to [X86] Prefer vmovmsk instead of vtest for alderlake..Jun 5 2023, 11:15 PM
LuoYuanke updated this revision to Diff 528701.Jun 5 2023, 11:39 PM

Rebase and update test case.

goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86.td
426

Personally think slowvtest is a kind of confusing name b.c vtest also has a perf dropoff from SnB -> HSW.
Maybe "PreferMovmskOverVTest" would be clearer?

LuoYuanke updated this revision to Diff 528714.Jun 6 2023, 12:01 AM

Rebase and update test cases.

LuoYuanke added inline comments.Jun 6 2023, 12:05 AM
llvm/lib/Target/X86/X86.td
426

I just follow the previous naming convention. I'm open to "PreferMovmskOverVTest". @RKSimon and @pengfei, what's your opinion?

pengfei added inline comments.Jun 6 2023, 1:23 AM
llvm/lib/Target/X86/X86.td
425

Should be Is?

426

I think we should start from Tuning, but TuningPreferMovmskOverVTest looks verbose..

LuoYuanke updated this revision to Diff 528743.Jun 6 2023, 1:50 AM

Address Phoebe's comments.

RKSimon added inline comments.Jun 6 2023, 2:35 AM
llvm/lib/Target/X86/X86.td
426

+1 TuningPreferMovmskOverVTest explains the purpose of the tuning flag better

llvm/test/CodeGen/X86/combine-movmsk.ll
6

Add a variant that tests the tuning flag directly
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2,+slow-vtest | FileCheck %s --check-prefix=ADL

RKSimon added inline comments.Jun 6 2023, 2:38 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47913

What about these folds?

47973

What about these folds?

LuoYuanke updated this revision to Diff 528775.Jun 6 2023, 3:41 AM

Address Simon's comments.

LuoYuanke added inline comments.Jun 6 2023, 3:43 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47913

I'll take a look at it and I prefer to implement it in a separate patch if MOVMSK is better.

47973

Ditto

LuoYuanke added inline comments.Jun 6 2023, 4:16 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47913

It seems this transform is good. If I revert the transform, I get below lit test change.

-; AVX-LABEL: movmsk_or_v2i64:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vpxor %xmm1, %xmm0, %xmm0
-; AVX-NEXT:    vptest %xmm0, %xmm0
-; AVX-NEXT:    setne %al
-; AVX-NEXT:    retq
+; AVX1OR2-LABEL: movmsk_or_v2i64:
+; AVX1OR2:       # %bb.0:
+; AVX1OR2-NEXT:    vpcmpeqq %xmm1, %xmm0, %xmm0
+; AVX1OR2-NEXT:    vpcmpeqd %xmm1, %xmm1, %xmm1
+; AVX1OR2-NEXT:    vtestpd %xmm1, %xmm0
+; AVX1OR2-NEXT:    setae %al
+; AVX1OR2-NEXT:    retq
LuoYuanke added inline comments.Jun 6 2023, 4:41 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47973

It seems there is not lit test failure if I disable this code.

Hi,
Is there anything else that I need to update?

pengfei accepted this revision.Jun 7 2023, 6:39 PM

LGTM. Please wait for one more day for other reviewers.

This revision is now accepted and ready to land.Jun 7 2023, 6:39 PM
goldstein.w.n accepted this revision.Jun 7 2023, 8:22 PM

LGTM as well.

RKSimon accepted this revision.Jun 8 2023, 1:49 AM

LGTM - cheers

This revision was automatically updated to reflect the committed changes.