Page MenuHomePhabricator

[X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required
ClosedPublic

Authored by craig.topper on Jan 30 2018, 5:46 PM.

Details

Summary

This patch is a replacement for D41341.

This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if its save to make 512-bit vectors illegal when the preference is for 256-bit vectors.

For code that has no vectors in it originally and only get vectors through the loop and slp vectorizers this allows us to generate code largely similar to our AVX2 only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harm our AVX2 codegen this patch tries to make the legalizer behavior similar.

This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.

I've qualified every place I could find in X86ISelLowering.cpp and added tests cases for many of them with 2 different values for the attribute to see the codegen differences.

We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Jan 30 2018, 5:46 PM

Add sad test cases as well.

What happens if 512-bit intrinsics are used? Do we have a decent error message?

lib/Target/X86/X86ISelLowering.cpp
1363 ↗(On Diff #132683)

What happens with these? If we've disabled 512-bit vectors what is expected to happen?

For intrinsics, this patch relies on a function attribute "required-vector-width" that is not created anywhere today. We're trusting the accuracy of the attribute when it is is present and if its not present we'll assume 512-bit vectors should be allowed. I plan to add support to clang to set this attribute to 512 when any AVX512 intrinsics are used and to a smaller value if only narrower vectors are used. That will prevent the legalizer from disabling 512-bit support when it is needed. Similar will need to be done for any function arguments passed in ZMM registers. Probably for some of the pragmas as well.

lib/Target/X86/X86ISelLowering.cpp
1363 ↗(On Diff #132683)

If AVX512F is enabled, we can only disable 512-bit vectors if VLX is also enabled. So KNL will always allow 512-bit vectors. It's not clear how much we could do on KNL without 512-bit vectors. We'd probably have to just fall all the way back to AVX2. The only AVX512 instructions that would work without 512-bit vectors would be things like KOR/KAND/KXOR on mask registers, but you wouldn't be able to use masks in any instructions.

lib/Target/X86/X86Subtarget.h
636 ↗(On Diff #132683)

canExtendTo512DQ checks !VLX to always allow widening on KNL CPUs.

RKSimon accepted this revision.Feb 10 2018, 9:07 AM

LGTM as a first step - a few minors to explain the meaning of a lot of this.

lib/Target/X86/X86ISelLowering.cpp
1140 ↗(On Diff #132683)

Comments explaining that we only legalize mask registers for hasAVX512, with the vector registers requiring useAVX512Regs

1156 ↗(On Diff #132683)

Add tests for these since they touch 512-bit types.

1165 ↗(On Diff #132683)

Add tests for these since they touch 512-bit types.

1447 ↗(On Diff #132683)

Similar comment for 512-bit bw types and explain why v64i1 mask register isn't enabled with hasBWI

1533 ↗(On Diff #132683)

This change looks independent? If so commit it separately.

This revision is now accepted and ready to land.Feb 10 2018, 9:07 AM
This revision was automatically updated to reflect the committed changes.

LGTM for now, but the set of function attributes being passed into the subtarget is getting a little awkward so we should come up with some better way to encompass them.