This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required
ClosedPublic

Authored by craig.topper on Jan 30 2018, 5:46 PM.

Details

Summary

This patch is a replacement for D41341.

This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if its save to make 512-bit vectors illegal when the preference is for 256-bit vectors.

For code that has no vectors in it originally and only get vectors through the loop and slp vectorizers this allows us to generate code largely similar to our AVX2 only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harm our AVX2 codegen this patch tries to make the legalizer behavior similar.

This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.

I've qualified every place I could find in X86ISelLowering.cpp and added tests cases for many of them with 2 different values for the attribute to see the codegen differences.

We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.

Diff Detail

Event Timeline

craig.topper created this revision.Jan 30 2018, 5:46 PM

Add sad test cases as well.

What happens if 512-bit intrinsics are used? Do we have a decent error message?

lib/Target/X86/X86ISelLowering.cpp
1363

What happens with these? If we've disabled 512-bit vectors what is expected to happen?

For intrinsics, this patch relies on a function attribute "required-vector-width" that is not created anywhere today. We're trusting the accuracy of the attribute when it is is present and if its not present we'll assume 512-bit vectors should be allowed. I plan to add support to clang to set this attribute to 512 when any AVX512 intrinsics are used and to a smaller value if only narrower vectors are used. That will prevent the legalizer from disabling 512-bit support when it is needed. Similar will need to be done for any function arguments passed in ZMM registers. Probably for some of the pragmas as well.

lib/Target/X86/X86ISelLowering.cpp
1363

If AVX512F is enabled, we can only disable 512-bit vectors if VLX is also enabled. So KNL will always allow 512-bit vectors. It's not clear how much we could do on KNL without 512-bit vectors. We'd probably have to just fall all the way back to AVX2. The only AVX512 instructions that would work without 512-bit vectors would be things like KOR/KAND/KXOR on mask registers, but you wouldn't be able to use masks in any instructions.

lib/Target/X86/X86Subtarget.h
636

canExtendTo512DQ checks !VLX to always allow widening on KNL CPUs.

RKSimon accepted this revision.Feb 10 2018, 9:07 AM

LGTM as a first step - a few minors to explain the meaning of a lot of this.

lib/Target/X86/X86ISelLowering.cpp
1140

Comments explaining that we only legalize mask registers for hasAVX512, with the vector registers requiring useAVX512Regs

1156

Add tests for these since they touch 512-bit types.

1165

Add tests for these since they touch 512-bit types.

1447

Similar comment for 512-bit bw types and explain why v64i1 mask register isn't enabled with hasBWI

1533

This change looks independent? If so commit it separately.

This revision is now accepted and ready to land.Feb 10 2018, 9:07 AM
This revision was automatically updated to reflect the committed changes.

LGTM for now, but the set of function attributes being passed into the subtarget is getting a little awkward so we should come up with some better way to encompass them.

danilaml added inline comments.
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

@craig.topper Sorry for commenting on such an old review, but I was investigating some codegen differences for very similar IR and come across this code (the attribute later changed to the min-legal-vector-width but otherwise it's the same on main). Is RequiredVectorWidth intended to be initialized to UINT32_MAX? What is the rationale? It forces maximum vector width if the function is missing the attribute for some reason, ignoring the prefer-* attributes. To me it seems that the conservative approach would be to set it to 0 and increase according to the attribute, since zero length vectors are always legal/"required".

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 22 2023, 8:32 AM
craig.topper added inline comments.Aug 22 2023, 9:42 AM
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

It is intentionally set to UINT32_MAX if the attribute is missing. If the IR contains any 512 bit inline assembly, function arguments, returns, or X86 specific vector, the backend will crash or violate the ABI. The presence of the attribute indicates that those cases have been checked and nothing requires 512 bit vectors.

danilaml added inline comments.Aug 22 2023, 10:26 AM
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

Not sure I understand. Without the attribute the backend will ignore prefer-vector-width and generate avx512 asm regardless. How is setting default to 0 worse?

craig.topper added inline comments.Aug 22 2023, 10:37 AM
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

If there are 512-bit x86 intrinsics in the IR it will crash the compiler. I assume compiler crashes are worse than suboptimal code.

Prefer vector width is still checked in many other places to prevent aggressive use of 512 bit vectors. For example, X86TTIImpl::getRegisterBitWidth will still tell the vectorizer that the register width is 256 bits.

Are you finding the attribute missing in code compiled with clang or another frontend?

danilaml added inline comments.Aug 22 2023, 11:37 AM
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

Doesn't seem to crash (although I haven't tried inline assembly): https://llvm.godbolt.org/z/h6jEYrnaa

In my case it's another frontend (JIT compiler). I've noticed that compiler would generate suboptimal code using avx512, even though the target cpu has prefer256 tuning and found that the issue is missing attribute (I also noticed that target knows that expanding a certain intrinsic using av512 is more costly than using avx2, but still uses the highest ISA available, but that's another issue entirely). Now I'm wondering what to set it too.
Also, stuff like SLPVectorizer doesn't really care about getRegisterBitWidth since it usually just checks wether some operation/type is legal or not and about the cost returned by the target hooks.

craig.topper added inline comments.Aug 22 2023, 12:15 PM
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

It crashes if prefer-vector-width<=256 is also specified or a CPU that implies prefer vector width <=256 is used https://llvm.godbolt.org/z/G5458499K

danilaml added inline comments.Aug 22 2023, 12:30 PM
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

I see. This is counterintuitive. It appers that min-legal-vector-width is actually sort of max, but not really. So what is the intended usage by some non-C backend? What should it be set to to allow both avx512 intrinsics/inline asm when explicitely requested AND to keep the "prefer 256" semantics in most other palces? Should we mark every "regular" function (that doesn't use avx512 intrinsics or inline asm) with min-legal-vector-width=0 and that do - with =512 (we won't be passing/returning avx512 types, so ABI is not a concer AFAIU)?

craig.topper added inline comments.Aug 22 2023, 12:39 PM
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
275 ↗(On Diff #133785)

I unfortunately named it from the perspective of the X86 backend where it is the minimum vector width that the backend must make a legal type to prevent crashes.

The clang frontend calculates it by taking the maximum value from all intrinsics, inline assembly, and function arguments/returns.

Setting it to 0 for functions without avx512 intrinsics or inline assembly should be fine today.