Since r246981 ("AVX-512: Lowering for 512-bit vector shuffles."), VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect N->getOperand(0) to be (one of) the vector operand(s). It isn't, as VPERMV has the mask as operand #0 (I can't think of another shuffle-like instruction that works the same).
In the added testcase, this leads to the funny-looking:
  vmovdqa .LCPI0_0(%rip), %ymm0 # ymm0 = [0,1,2,3,4,5,6,4]
  vpshufb .LCPI0_1(%rip), %ymm0, %ymm0 # ymm0 = ymm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,16,17,18,18]
In my original testcase (s/i32 4>)/i32 1>)/ should do the trick), the VPSHUFB lane restriction was another problem, but Simon fixed that in r260063.
I can think of two obvious solutions:
- swap the X86ISD::VPERMV operands, commenting in X86ISelLowering.h that it's different from the instructions. IMO, it's confusing either way.
- return the operands and fix the users. There are many users, some of which (e.g., setTargetShuffleZeroElements) only return a mask themselves. This doesn't seem perfect either.
This (very rough, WIP) patch implements the latter.
What do you think? We might improve this by having a struct wrap <Mask, IsUnary, Ops>, and hopefully avoid computing slightly different things in different places.
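To make the struct idea concrete, here is a minimal standalone sketch. It is not the patch and does not use the real LLVM types: Node, Opcode, TargetShuffleInfo and getTargetShuffleInfo are hypothetical names, operands are modeled as plain integer ids, and the mask is assumed to be decoded already. The only point is that the decoder, not each caller, knows which operand is the vector input.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SDNode/SDValue; these are NOT the LLVM types.
enum class Opcode { PSHUFD, VPERMV };

struct Node {
  Opcode Opc;
  std::vector<int> Operands;    // operand ids; VPERMV carries its mask as operand #0
  std::vector<int> DecodedMask; // assumption: the mask has already been decoded
};

// Hypothetical wrapper for <Mask, IsUnary, Ops>.
struct TargetShuffleInfo {
  std::vector<int> Mask;
  bool IsUnary = false;
  std::vector<int> Ops; // the vector inputs the mask indexes into
};

// Fills Info and returns true if a non-empty mask was found.
// Callers read Info.Ops instead of assuming operand 0 is a vector input.
bool getTargetShuffleInfo(const Node &N, TargetShuffleInfo &Info) {
  Info = TargetShuffleInfo();
  switch (N.Opc) {
  case Opcode::PSHUFD:
    Info.Ops = {N.Operands[0]}; // vector is operand #0, immediate is #1
    break;
  case Opcode::VPERMV:
    Info.Ops = {N.Operands[1]}; // mask is operand #0, vector is #1
    break;
  }
  Info.Mask = N.DecodedMask;
  Info.IsUnary = Info.Ops.size() == 1;
  return !Info.Mask.empty();
}
```

With something of this shape, a caller combining shuffles would look through Info.Ops[0] for both opcodes, and the VPERMV special case would live in exactly one place.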
If you're going to add this, maybe add an assertion for Mask.empty() as well?