Page MenuHomePhabricator

[X86] Enable v16i8/v32i8/v64i8 rotation on AVX512 targets

Authored by RKSimon on Dec 6 2021, 1:12 PM.



We currently rely on generic promotion to vXi16/vXi32 types for rotation lowering on various AVX512 targets.

We can more efficiently perform this by making use of the shl(unpack(x,x),amt) style pattern that we already use for vXi8 rotation by splat amounts, either by widening to a larger vector type or unpacking lo/hi halves of the subvectors so we can access whatever vXi16/vXi32 per-element shifts are supported.

This uncovered an issue in the supportedVectorShiftWithImm/supportedVectorVarShift legality checkers which was using hasAVX512() instead of useAVX512Regs() to detect support for 512-bit vector shifts.

NOTE: I'm actually hoping to eventually reuse this code for shl(unpack(y,x),amt) funnel shift lowering (vXi8 and wider), but initially I just want to ensure we have efficient ISD::ROTL lowering for all targets.

Diff Detail

Event Timeline

RKSimon created this revision.Dec 6 2021, 1:12 PM
RKSimon requested review of this revision.Dec 6 2021, 1:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 6 2021, 1:12 PM
RKSimon updated this revision to Diff 392722.Dec 8 2021, 5:24 AM
RKSimon retitled this revision from [X86] Enable v16i8/v32i8 rotation on AVX512 targets to [X86] Enable v16i8/v32i8/v64i8 rotation on AVX512 targets.

Added v64i8 rotation handling as well

RKSimon updated this revision to Diff 392735.Dec 8 2021, 6:10 AM

Delay 512-bit vector splitting until after constant-splat early-out, which avoids the need for a lot of additional VPTERNLOG combines on non-AVX512VL targets.

LuoYuanke added inline comments.Dec 9 2021, 12:29 AM

I notice in line 29981, it is aext(x). Is it SIGN_EXTEND?

RKSimon added inline comments.Dec 9 2021, 10:39 AM

I did have separate ZERO/ANY_EXTEND nodes at one point but (for pure tidyness) to reuse the ZERO_EXTEND since for the rotation expansion its what the ANY_EXTEND will end up as. I can split it though if you prefer - we'll need to do this if/when this code gets reused for funnel shifts.

LuoYuanke added inline comments.Dec 9 2021, 4:07 PM

No, not necessary to split. I'm just curious on this node. :)

Any more comments?

pengfei added inline comments.Dec 13 2021, 6:29 AM

The NumElts may be 64 or 32 to the maximum, so the WideVT can be illegal one?


If the comments are usable for all if and else conditions, the second half comment in 29884 seems redundant. Or the comments should be moved to the if block?


There's extra space here.


Why do we still widen here rather that use avx instruction like other cases?


Didn't dig much, why the immediate value keep unchanged while zmm1 and zmm0 swapped?

RKSimon added inline comments.Dec 13 2021, 7:22 AM

I'm sorry I don't quite understand - do you mean you'd expect the (much longer) vpblendvb sequence?

pengfei added inline comments.Dec 13 2021, 4:58 PM

Sorry, my mistake. I thought the use of zmm register results from the lack of avx512vl. Now I understand we are using the full 512 bit register. Sorry for the noise.

RKSimon updated this revision to Diff 394194.Dec 14 2021, 2:28 AM

Add explicit 128/256/512-bit checks to the supportedVectorShift* helper checkers

RKSimon marked 3 inline comments as done.Dec 14 2021, 2:28 AM
pengfei accepted this revision.Dec 15 2021, 2:17 AM


This revision is now accepted and ready to land.Dec 15 2021, 2:17 AM
This revision was landed with ongoing or failed builds.Dec 15 2021, 3:31 AM
This revision was automatically updated to reflect the committed changes.