This is an archive of the discontinued LLVM Phabricator instance.

[X86][RFC] Add new option `-m[no-]evex512` to disable ZMM and 64-bit mask instructions for AVX512 features
ClosedPublic

Authored by pengfei on Aug 30 2023, 11:50 PM.

Details

Summary

This is an alternative of D157485 and a pre-feature to support AVX10.

AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661

Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting -m[no-]evex512 on existing AVX512 features.
The option -mno-evex512 can be used with -mavx512xxx to build
binaries that can run on both legacy AVX512 targets and AVX10-256.

There're still arguments about what's the expected behavior when this
option as well as -mavx512xxx used together with -mavx10.1-256. We
decided to defer the support of -mavx10.1 after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.

Diff Detail

Event Timeline

pengfei created this revision.Aug 30 2023, 11:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2023, 11:50 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
pengfei requested review of this revision.Aug 30 2023, 11:50 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 30 2023, 11:50 PM
Matt added a subscriber: Matt.Aug 31 2023, 9:07 AM

Would it be possible to add function multiversioning tests to ensure the evex512 attribute would work with it?

clang/lib/CodeGen/Targets/X86.cpp
1517

typo in Callee256?

clang/test/CodeGen/X86/avx512-error.c
10

add __mmask64 test ? _knot_mask64 or _cvtmask64_u64 maybe?

pengfei updated this revision to Diff 555283.Sep 1 2023, 12:21 AM

Address review comments.

Would it be possible to add function multiversioning tests to ensure the evex512 attribute would work with it?

Function multiversioning is orthogonal to evex512 feature.

When user uses -mno-evex512 in command line, all the code generation, including function multiversioning are limited to 256-bit vector and 32-bit mask.

User is not suggested to use avx512xxx,evex512 in function attributes for function multiversioning, because EVEX512 is a software concept and the dispatcher cannot distinguish between avx512xxx and avx512xxx,evex512.

clang/lib/CodeGen/Targets/X86.cpp
1517

Good catch!

clang/test/CodeGen/X86/avx512-error.c
10

Good point! This exposed a design problem. We cannot only check for 512-bit vector, instead, we need to add evex512 to all ZMM or 64-bit mask builtin/intrinsic attribute list.

Would it be possible to add function multiversioning tests to ensure the evex512 attribute would work with it?

Function multiversioning is orthogonal to evex512 feature.

When user uses -mno-evex512 in command line, all the code generation, including function multiversioning are limited to 256-bit vector and 32-bit mask.

User is not suggested to use avx512xxx,evex512 in function attributes for function multiversioning, because EVEX512 is a software concept and the dispatcher cannot distinguish between avx512xxx and avx512xxx,evex512.

If the dispatcher is updated to take into account AVX10.1 CPUID, it could distinguish the different hardware support.

That is:

  • to check for AVX512xxx with evex512 enabled, the dispatcher need only check for the AVX512xxx CPUID bit, since according to the doc, a CPU which implements AVX10.1 with 512-bit register size will also set the corresponding AVX512 CPUID bits. No change there.
  • to check for AVX512xxx with evex512 disabled, the dispatcher function should check that either CPUID reports the AVX512xxx bit OR that the CPUID reports AVX10.1 with support for at least 256-bit register size. (But only for the 'AVX512xxx' features which are actually included in AVX10.1, of course).

Would it be possible to add function multiversioning tests to ensure the evex512 attribute would work with it?

Function multiversioning is orthogonal to evex512 feature.

When user uses -mno-evex512 in command line, all the code generation, including function multiversioning are limited to 256-bit vector and 32-bit mask.

User is not suggested to use avx512xxx,evex512 in function attributes for function multiversioning, because EVEX512 is a software concept and the dispatcher cannot distinguish between avx512xxx and avx512xxx,evex512.

If the dispatcher is updated to take into account AVX10.1 CPUID, it could distinguish the different hardware support.

That is:

  • to check for AVX512xxx with evex512 enabled, the dispatcher need only check for the AVX512xxx CPUID bit, since according to the doc, a CPU which implements AVX10.1 with 512-bit register size will also set the corresponding AVX512 CPUID bits. No change there.
  • to check for AVX512xxx with evex512 disabled, the dispatcher function should check that either CPUID reports the AVX512xxx bit OR that the CPUID reports AVX10.1 with support for at least 256-bit register size. (But only for the 'AVX512xxx' features which are actually included in AVX10.1, of course).

Let's not to mix evex512 with AVX10.1 512-bit register size enumeration bit. EVEX512 is intended for AVX512xxx only. It's not supposed to use for AVX10. And it conflicts with the functionality of the AVX10.1 bit in same way.
For example, to maintain backward compatibility, EVEX512 is designed to be a default by on feature. That says, if users don't disable EVEX512 in the command line explicitly, and use avx512xxx only in function attributes, compiler will attach a evex512 implicitly. If we map evex512 to the AVX10.1 bit, the function will never be dispatched on prior-AVX10 targets.

skan added inline comments.Sep 3 2023, 10:48 PM
llvm/lib/Target/X86/X86Subtarget.cpp
275

Missing f?

skan added inline comments.Sep 3 2023, 10:53 PM
llvm/lib/Target/X86/X86Subtarget.cpp
271

It seems the change in X86.cpp is redundant?

pengfei added inline comments.Sep 4 2023, 12:10 AM
llvm/lib/Target/X86/X86Subtarget.cpp
271

It's not. We need FeatureEVEX512 because it's independent of FeatureAVX512. We will have future AVX10-256 targets that have FeatureAVX512 only.
Here we handle old IR that don't set evex512 in function attributes.

275

No, it's intentional. Sometimes, feature attributes may not have a full set of AVX512 features. If user only use e.g., "avx512bw", we should make sure "evex512" attached too.

skan added inline comments.Sep 6 2023, 7:07 PM
llvm/lib/Target/X86/X86Subtarget.cpp
277

Well. It's a very tricky implementation, but I can find out a better way.

skan accepted this revision.Sep 6 2023, 7:07 PM

LGTM

This revision is now accepted and ready to land.Sep 6 2023, 7:07 PM
RKSimon accepted this revision.Sep 7 2023, 4:14 AM

LGTM with a couple of minors

clang/lib/CodeGen/Targets/X86.cpp
1493

Remove Feature argument and hardcode to "avx" now that it only has 1 (avx) caller?

llvm/lib/Target/X86/X86Subtarget.cpp
275

Please add a comment as it looks like a typo.

This revision was landed with ongoing or failed builds.Sep 7 2023, 6:38 AM
This revision was automatically updated to reflect the committed changes.
pengfei marked an inline comment as done.
pengfei added inline comments.Sep 7 2023, 6:38 AM
clang/lib/CodeGen/Targets/X86.cpp
1493

We still have call to "avx512" at line 1525.
We have strict rule for AVX512-256, but should not change the legacy check.

pengfei reopened this revision.Sep 7 2023, 6:59 AM
This revision is now accepted and ready to land.Sep 7 2023, 6:59 AM
pengfei added inline comments.Sep 10 2023, 7:18 PM
llvm/lib/Target/X86/X86Subtarget.cpp
271

It has affects to AVX10-256 targets too, we need to restrict it to default CPU, see https://github.com/llvm/llvm-project/pull/65920