Page MenuHomePhabricator

[X86][AVX] Fix wrong lowering of VPERM2X128 nodes.
ClosedPublic

Authored by andreadb on Mar 6 2015, 12:16 PM.

Details

Summary

There are cases where the backend computes a wrong permute mask for a VPERM2X128 node.

Example:

define <8 x float> @foo(<8 x float> %a, <8 x float> %b) {
  %shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
  ret <8 x float> %shuffle
}

If we build this example with -mattr=+avx, we get the following assembly:

vperm2f128 $0, %ymm0, %ymm0, %ymm0  # ymm0 = ymm0[0,1,0,1]

However, it should have been:

vperm2f128 $17, %ymm0, %ymm0, %ymm0  # ymm0 = ymm0[2,3,2,3]

It turns out that function 'lowerV2X128VectorShuffle' doesn't check if the shuffle mask contains 'undef' indices at position 0 and 2. So, there are (few) cases where the backend expands a shuffle into a VPERM2X128 with a wrong shuffle mask.

Back to the example:
The initial selection dag contains the following shuffle node:

v8f32 = vector_shuffle V0, V1<u,u,6,7,u,u,6,7>

During legalization, the suffle value type is converted from v8f32 to v4f64:

v4f64 = vector_shuffle V0, V1<u,3,u,3>

The shuffle lowering tries to expand this shuffle into a VPERM2X128. However, the permute mask is wrongly computed as 0.

This patch fixes the problem by checking if Mask[0] and Mask[2] contain value SM_SentinelUndef.

Please let me know if ok to submit.

Thanks!
Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb updated this revision to Diff 21380.Mar 6 2015, 12:16 PM
andreadb retitled this revision from to [X86][AVX] Fix wrong lowering of VPERM2X128 nodes..
andreadb updated this object.
andreadb edited the test plan for this revision. (Show Details)
andreadb added reviewers: chandlerc, qcolombet, mkuper.
andreadb added a subscriber: Unknown Object (MLST).

This patch fixes the problem by checking if Mask[0] and Mask[2] contain value SM_SentinelZero.

I should have written:
"This patch fixes the problem by checking if Mask[0] and Mask[2] contain value SM_SentinelUndef".

andreadb updated this revision to Diff 21422.Mar 7 2015, 5:08 AM
andreadb updated this object.

Updated patch. Added more test cases in file avx-vperm2x128.ll.

mkuper edited edge metadata.Mar 8 2015, 4:47 AM

LGTM, except one thing I'm not sure about (see comment).

lib/Target/X86/X86ISelLowering.cpp
9090 ↗(On Diff #21422)

Can we end up with both mask elements being 'u'? E.g. <u, u, 0, 1>?
Or will all these cases be caught by different code paths?

Not that it really matters in practice, since even if it ends up here we'll just end up with -1 / 2 == 0 as the VPERM mask, but I don't think we want to the mask to depend on the numerical value of SentinelUndef.

Hi Michael

lib/Target/X86/X86ISelLowering.cpp
9090 ↗(On Diff #21422)

Yes, we can end up with both mask elements being undef.
That would be legal according to function 'canWidenShuffleElements'.
In practice, as you said, it won't really matter as we would end up propagating index 0, which is still OK considering that it is undef.

I added explicit checks against SentinelUndef because of the FIXME message at line 9089. Basically, at some point, we may want to also check for SM_SentinelZero and use a different strategy for that.

mkuper added a comment.Mar 8 2015, 7:22 AM

I still think it'd be best to add an explicit check for both elements being SM_SentinelUndef.
I'd really hate to be the guy who has to hunt down bugs caused by changing the value of SM_SentinelUndef to be, say, INT_MIN instead of -1. :-)

andreadb updated this revision to Diff 21450.Mar 8 2015, 8:48 AM
andreadb updated this object.
andreadb edited edge metadata.

Hi Michael,

Not sure if this new patch would address your comments.
However, this should be more resilient to potential changes to the vaule of SM_SentinelUndef.

Please let me know what you think.

Thanks again for your time!
-Andrea

mkuper added a comment.EditedMar 8 2015, 8:52 AM

I may be missing something, but I don't see how this helps.
We'll still get a nonsense MaskLO if both Mask[0] and Mask[1] are SM_SentinelUndef, and SM_SentinelUndef happens to be INT_MIN.
Why not set MaskLO/MaskHI to 0 explicitly if both relevant Mask elements are undef?

(I know this is purely theoretical, if you feel it unnecessarily complicates the code, feel free to ignore.)

I may be missing something, but I don't see how this helps.
We'll still get a nonsense MaskLO if both Mask[0] and Mask[1] are SM_SentinelUndef, and SM_SentinelUndef happens to be INT_MIN.
Why not set MaskLO/MaskHI to 0 explicitly if both relevant Mask elements are undef?

(I know this is purely theoretical, if you feel it unnecessarily complicates the code, feel free to ignore.)

Ok right. Now I think I understand what you meant before :-).
Sorry for the confusion..

Right, I can change the code so that we propagate index 0 if the resulting permute index is SM_SentinelUndef.

andreadb updated this revision to Diff 21453.Mar 8 2015, 9:14 AM

Hi Michael,

Here is a new version of the patch.
This time, we explicitly propagate permute index 0 (instead of SM_SentinelUndef) if we see that the resulting MaskLO/MaskHI would be undef.

Thanks,
Andrea

mkuper accepted this revision.Mar 8 2015, 9:20 AM
mkuper edited edge metadata.

Yes, thanks.
LGTM.

This revision is now accepted and ready to land.Mar 8 2015, 9:20 AM
This revision was automatically updated to reflect the committed changes.

Thanks Michael!
Committed revision 231601.