Since:
r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect N->getOperand(0) to be (one of) the vector operand(s). It isn't, as VPERMV has the mask as operand #0 (I can't think of another shuffle-like instruction that works the same).
In the added testcase, this leads to the funny-looking:
  vmovdqa .LCPI0_0(%rip), %ymm0 # ymm0 = [0,1,2,3,4,5,6,4]
  vpshufb .LCPI0_1(%rip), %ymm0, %ymm0 # ymm0 = ymm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,16,17,18,18]
In my original testcase (s/i32 4>)/i32 1>)/ should do the trick), the VPSHUFB lane restriction was another problem, but Simon fixed that in r260063.
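To make the operand-order problem concrete, here is a toy model of it. It is only a sketch: ToyNode, Opcode and firstVectorOperandIndex are hypothetical names that do not exist in LLVM, and plain ints stand in for SDValues. It assumes the PSHUFB node takes (source, mask) while VPERMV takes (mask, source), as described above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a target shuffle node; this is not LLVM's SDNode.
enum class Opcode { PSHUFB, VPERMV };

struct ToyNode {
  Opcode Opc;
  std::vector<int> Operands; // operand ids, in node operand order
};

// Index of the first operand that is a shuffled vector input.
// Assumption: only VPERMV carries its mask as operand #0.
unsigned firstVectorOperandIndex(const ToyNode &N) {
  return N.Opc == Opcode::VPERMV ? 1u : 0u;
}

// What callers actually want when they write N->getOperand(0).
int sourceVector(const ToyNode &N) {
  unsigned Idx = firstVectorOperandIndex(N);
  assert(Idx < N.Operands.size() && "node has no vector input");
  return N.Operands[Idx];
}
```

A caller that hard-codes Operands[0] picks up the mask for the VPERMV case, which is the kind of mix-up that yields a shuffle of the constant mask vector as in the output above.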
I can think of two obvious solutions:
- swap the X86ISD::VPERMV operands, commenting in X86ISelLowering.h that it's different from the instructions. IMO, it's confusing either way.
- return the operands and fix the users. There are many users, some of which (e.g., setTargetShuffleZeroElements) only return a mask themselves. This doesn't seem perfect either.
This (very rough, WIP) patch implements the latter.
What do you think? We might improve this by having a struct wrap <Mask, IsUnary, Ops>, and hopefully avoid computing slightly different things in different places.
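A rough sketch of what such a wrapper could look like, again with hypothetical names (TargetShuffleInfo, ToyValue and makeShuffleInfo are not in the patch or in LLVM) and with ints standing in for SDValues. The IsUnary rule used here, a single input or two identical inputs, is an assumption for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for SDValue.
using ToyValue = int;

// Bundles everything a caller needs, so nobody has to guess
// which node operand is a vector input and which is the mask.
struct TargetShuffleInfo {
  std::vector<int> Mask;     // decoded element indices
  std::vector<ToyValue> Ops; // vector inputs only, never the mask operand
  bool IsUnary = false;      // true if only one distinct input is shuffled
};

// Assumption: a shuffle is unary when it has one input,
// or when both inputs are the same value.
TargetShuffleInfo makeShuffleInfo(std::vector<int> Mask,
                                  std::vector<ToyValue> Ops) {
  assert(!Ops.empty() && Ops.size() <= 2 && "expected one or two inputs");
  TargetShuffleInfo Info;
  Info.IsUnary = Ops.size() == 1 || Ops[0] == Ops[1];
  Info.Mask = std::move(Mask);
  Info.Ops = std::move(Ops);
  return Info;
}
```

With this shape, the decoder for a VPERMV-like node would put only the source vector in Ops, and callers would read Ops[0] instead of the node's operand #0.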
Just noticed that we're not attempting to detect unary shuffles (which probably explains why you've had so much trouble getting combines to fire):