This is an archive of the discontinued LLVM Phabricator instance.

[X86] Also try to zero elts when lowering v32i8 shuffle with PSHUFB.
ClosedPublic

Authored by ab on Apr 28 2016, 7:29 AM.

Details

Summary

Otherwise we fallback to pshufb+blend later on.

We could generalize lowerVectorShuffleAsPSHUFB to ymm, but when I tried that was less readable than doing it inline.

Diff Detail

Repository
rL LLVM

Event Timeline

ab updated this revision to Diff 55413.Apr 28 2016, 7:29 AM
ab retitled this revision from to [X86] Also try to zero elts when lowering v32i8 shuffle with PSHUFB..
ab updated this object.
ab added reviewers: spatel, RKSimon.
ab added a subscriber: llvm-commits.
ab updated this revision to Diff 55414.Apr 28 2016, 7:32 AM
  • break early when the shuffle isn't single-input, even with zeroing.
spatel added inline comments.Apr 28 2016, 8:04 AM
lib/Target/X86/X86ISelLowering.cpp
11427–11436 ↗(On Diff #55413)

Would it make sense to enhance isSingleInputShuffleMask() like:
bool isSingleInputShuffleMask(ArrayRef<int> Mask, bool OrZeroes = false) {}
and then optionally check for zeroes there?

It seems like vperm2* and maybe some other instructions would then be able to use the same helper.

11445 ↗(On Diff #55413)

Please add a comment here to describe the pshufb encoding. Something like:
If the most-significant-bit of any byte of the PSHUFB mask is set, the destination byte is zeroed.
I hate having to look back at the manual. :)

11446 ↗(On Diff #55413)

Use SM_SentinelUndef?

RKSimon edited edge metadata.Apr 28 2016, 8:26 AM

We could generalize lowerVectorShuffleAsPSHUFB to ymm, but when I tried that was less readable than doing it inline.

How much did you have to do to alter lowerVectorShuffleAsPSHUFB? I'd have expected it to end up quite similar to the implementation in combineX86ShuffleChain which I thought was quite straightforward.

lib/Target/X86/X86ISelLowering.cpp
11448 ↗(On Diff #55414)

Convention for lowering is to accept any negative value as undefined - its only later on that we make use of sentinel constants.

ab added a comment.May 5 2016, 1:09 PM

We could generalize lowerVectorShuffleAsPSHUFB to ymm, but when I tried that was less readable than doing it inline.

How much did you have to do to alter lowerVectorShuffleAsPSHUFB? I'd have expected it to end up quite similar to the implementation in combineX86ShuffleChain which I thought was quite straightforward.

Sorry: it's not ymm that's the problem, it's that lowerVectorShuffleAsPSHUFB should really be lowerVectorShuffleAsBlendOfPSHUFBs; if you don't want the final blend, there's not that much you can reuse.
It's also tricky to extract the PSHUFB creation out of this hypothetical lowerVectorShuffleAsBlendOfPSHUFB. It wants a PSHUFB that ignores the lanes from the other operand. But we want a PSHUFB if there are no lanes from the other operand.

I'll give it another think; maybe we can do better.

lib/Target/X86/X86ISelLowering.cpp
11427–11445 ↗(On Diff #55414)

I wanted to do that at first, but you need to look at the operands, and at that point you basically duplicated computeZeroableShuffleElements.

An alternative I considered was to do some kind of computeZeroableShuffleMask(Mask, V1, V2, NewMask), which returns a mask with SM_SentinelZero and SM_SentinelUndef. Then, users can check that it isSingleInputShuffleMask and do their thing. WDYT?

11447–11455 ↗(On Diff #55414)

Heh, fair enough, will do!

spatel added inline comments.May 5 2016, 2:13 PM
lib/Target/X86/X86ISelLowering.cpp
11427–11445 ↗(On Diff #55414)

That would be good - slow down the code explosion for x86 vector lowering.
But I don't think we have to hold up this patch for that; a TODO comment should be ok.

LGTM, but I'll let Simon give the final word because he has a much better understanding of everything going on with shuffles.

RKSimon accepted this revision.May 7 2016, 7:56 AM
RKSimon edited edge metadata.

LGTM with a couple of nits:

Please can you add a comment explaining why lowerVectorShuffleAsPSHUFB can't be used?

I think we're getting to the point where we should probably just generate Zeroable as standard and pass it around the same as Mask - we're generating it a lot these days. That would also make the addition of a 'isSingleInputShuffleMaskWithZeroables' helper more straightforward. We can look into this for a future patch.

lib/Target/X86/X86ISelLowering.cpp
11448 ↗(On Diff #55414)

Sorry I was wrong here (I missed the Zeroable case afterward) - it should read Mask[i] == SM_SentinelUndef

This revision is now accepted and ready to land.May 7 2016, 7:56 AM
ab updated this revision to Diff 58824.May 27 2016, 12:26 PM
ab edited edge metadata.

I did a final review before committing, but realized I missed the v16i16 case.

Here's an alternative patch, with a generalized PSHUFB lowering, and "zeroable mask" computation. WDYT?

I went ahead and renamed lowerVectorShuffleAsPSHUFB in r271024.

Minor comments but my previous LGTM still stands!

lib/Target/X86/X86ISelLowering.cpp
7189 ↗(On Diff #58824)

const int NumEltBytes = VT.getScalarSizeInBits() / 8;

7196 ↗(On Diff #58824)

Can't we just pass Subtarget as an argument like we do in other cases?

7205 ↗(On Diff #58824)

Rephrase 'Sign bit set in i8 mask means zero element'?

This revision was automatically updated to reflect the committed changes.