This is an archive of the discontinued LLVM Phabricator instance.

[x86] PR24562: fix incorrect folding of X86ISD::PSHUFB nodes that have a mask of all indices with the most significant bit set.
ClosedPublic

Authored by andreadb on Oct 1 2015, 1:20 PM.

Details

Summary

This patch fixes a problem in function 'combineX86ShuffleChain' that causes a chain of shuffles to be wrongly folded away when the combined shuffle mask has only one element.

We may end up with a combined shuffle mask of one element as a result of multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if indices refer to adjacent elements.
If all pairs refer to "adjacent" elements then the shuffle mask is safely widened. As a consequence of widening, we end up with a new shuffle mask which is half the size of the original shuffle mask.

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero indices.
Function canWidenShuffleElements would combine each pair of SM_SentinelZero indices into a single SM_SentinelZero index. So, in a logarithmic number of steps (4 in this case), the pshufb mask is simplified to a mask with only one index which is equal to SM_SentinelZero.

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask of size one is always equivalent to an identity mask. So, the entire shuffle chain was just folded away as the combined shuffle mask was treated as a no-op mask.

With this patch we know check if the only element of a combined shuffle mask is SM_SentinelZero. In case, we propagate a zero vector.

Please let me know if ok to submit.

Thanks,
Andrea

Diff Detail

Event Timeline

andreadb updated this revision to Diff 36281.Oct 1 2015, 1:20 PM
andreadb retitled this revision from to [x86] PR24562: fix incorrect folding of X86ISD::PSHUFB nodes that have a mask of all indices with the most significant bit set..
andreadb updated this object.
andreadb added a subscriber: llvm-commits.
RKSimon added inline comments.Oct 3 2015, 5:30 AM
lib/Target/X86/X86ISelLowering.cpp
21966

Can we assert here that the single Mask index is either SentinelZero or a valid element index? Undef should never get here correct?

andreadb added inline comments.Oct 4 2015, 6:38 AM
lib/Target/X86/X86ISelLowering.cpp
21966

Hi Simon,
We can't assert that a Mask index is either SentinelZero or a valid element index.

Consider the following example:

define <2 x i64> @TestUndef() {
entry:
  %0 = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, <16 x i8> <i8 0, i8 1, i8 undef, i8 undef, i8 4, i8 5, i8 undef, i8 undef, i8 8, i8 9, i8 undef, i8 undef, i8 2, i8 3, i8 undef, i8 undef>)
  %1 = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %0, <16 x i8> <i8 undef, i8 undef, i8 2, i8 3, i8 undef, i8 undef, i8 6, i8 7, i8 undef, i8 undef, i8 2, i8 3, i8 undef, i8 undef, i8 14, i8 15>)
  %2 = bitcast <16 x i8> %1 to <2 x i64>
  ret <2 x i64> %2
}

declare <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)

Function PerformShuffleCombine is able to recursively look through chains of shuffles.
In this example, it would firstly decode the PSHUFB mask for %1 (the second pshufb in the sequence) using the functionalities in X86ShuffleDecode.cpp. For each decoded index in the mask, it would check if the index points to an element of another shuffle. If so, then it would look through that shuffle and propagate the "combined" shuffle mask.

In our example, mask Index 2 of %1 points to the element at position 2 of %0.
PerformShuffleCombine would decode %0' shuffle mask and eventually realize that element 2 of %0 is 'Undef'. As a result, it would replace the 2nd shuffle mask index of %1 with SM_SentinelUndef.

In this example, all the shuffle indices of %1 are eventually replaced with SM_SentinelUndef. Later on, function 'canWidenShuffleElements' shrinks the shuffle mask to a mask with a single SM_SentinelUndef. That mask would be invalid according to the assertion you suggests.

So, to answer to your question, Undef can reach this code.

RKSimon edited edge metadata.Oct 6 2015, 6:05 AM

What about asserting that the Index is Zero/Undef or a valid index? This might be going too far but the 'undef == (index < 0)' pattern is something that is very common in the shuffle code and I think we should be doing more to sanitize these values,

What about asserting that the Index is Zero/Undef or a valid index? This might be going too far but the 'undef == (index < 0)' pattern is something that is very common in the shuffle code and I think we should be doing more to sanitize these values,

Right, I will do it. I understand your concern (and I have similar concerns :-) ).
As you said, "this may be going to far" but it definitely won't hurt to have an extra assert in the code to validate a precondition.
I will update the patch.

Thanks Simon!

andreadb updated this revision to Diff 36624.Oct 6 2015, 7:41 AM
andreadb edited edge metadata.

Patch updated.
Added assert to make sure that the shuffle index is Zero/Undef or a valid index.

Please let me know if ok to submit.

Thanks,
Andrea

RKSimon accepted this revision.Oct 11 2015, 7:07 AM
RKSimon edited edge metadata.

LGTM - 1 final minor comment.

lib/Target/X86/X86ISelLowering.cpp
21981

The comments inside the if / else blocks can probably be removed - they can be described in the preamble above (the SentinelZero case already is) and then the fact that the blocks don't have braces isn't so jarring.

This revision is now accepted and ready to land.Oct 11 2015, 7:07 AM
This revision was automatically updated to reflect the committed changes.