Otherwise we fallback to pshufb+blend later on.
We could generalize lowerVectorShuffleAsPSHUFB to ymm, but when I tried that was less readable than doing it inline.
Differential D19661
[X86] Also try to zero elts when lowering v32i8 shuffle with PSHUFB. ab on Apr 28 2016, 7:29 AM. Authored by
Details Otherwise we fallback to pshufb+blend later on. We could generalize lowerVectorShuffleAsPSHUFB to ymm, but when I tried that was less readable than doing it inline.
Diff Detail Event Timeline
Comment Actions
How much did you have to do to alter lowerVectorShuffleAsPSHUFB? I'd have expected it to end up quite similar to the implementation in combineX86ShuffleChain which I thought was quite straightforward.
Comment Actions Sorry: it's not ymm that's the problem, it's that lowerVectorShuffleAsPSHUFB should really be lowerVectorShuffleAsBlendOfPSHUFBs; if you don't want the final blend, there's not that much you can reuse. I'll give it another think; maybe we can do better.
Comment Actions LGTM with a couple of nits: Please can you add a comment explaining why lowerVectorShuffleAsPSHUFB can't be used? I think we're getting to the point where we should probably just generate Zeroable as standard and pass it around the same as Mask - we're generating it a lot these days. That would also make the addition of a 'isSingleInputShuffleMaskWithZeroables' helper more straightforward. We can look into this for a future patch.
Comment Actions I did a final review before committing, but realized I missed the v16i16 case. Here's an alternative patch, with a generalized PSHUFB lowering, and "zeroable mask" computation. WDYT? I went ahead and renamed lowerVectorShuffleAsPSHUFB in r271024. |
const int NumEltBytes = VT.getScalarSizeInBits() / 8;