There are cases where the backend computes a wrong permute mask for a VPERM2X128 node.
Example:
define <8 x float> @foo(<8 x float> %a, <8 x float> %b) {
  %shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
  ret <8 x float> %shuffle
}
If we build this example with -mattr=+avx, we get the following assembly:
vperm2f128 $0, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[0,1,0,1]
However, it should have been:
vperm2f128 $17, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[2,3,2,3]
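To make the two immediates easier to compare, here is a small standalone sketch (not code from the backend) that decodes a VPERM2X128 immediate into 64-bit element indices. The helper name `decodeVPerm2X128` is made up for this illustration. It assumes the usual encoding: bits [1:0] pick the 128-bit lane for the low half of the result, bits [5:4] pick it for the high half, and bits 3 and 7 zero the respective half.

```cpp
#include <array>

// Hypothetical helper, for illustration only.
// Returns, for each 64-bit element of the result, the index of the source
// element it is taken from. Indices 0..3 refer to the first source and 4..7
// to the second source. A value of -1 means the element is zeroed.
std::array<int, 4> decodeVPerm2X128(unsigned Imm) {
  std::array<int, 4> Result{};
  for (int Half = 0; Half < 2; ++Half) {
    unsigned Ctrl = (Imm >> (Half * 4)) & 0xF; // 4 control bits per half
    bool Zero = (Ctrl & 0x8) != 0;             // bit 3 / bit 7: zero this half
    int Lane = static_cast<int>(Ctrl & 0x3);   // which 128-bit lane to copy
    Result[Half * 2] = Zero ? -1 : Lane * 2;
    Result[Half * 2 + 1] = Zero ? -1 : Lane * 2 + 1;
  }
  return Result;
}
```

With both sources set to ymm0, as in the example, an immediate of 0 decodes to [0,1,0,1] and 17 decodes to [2,3,2,3], which agrees with the comments printed next to the two instructions.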
It turns out that the function 'lowerV2X128VectorShuffle' doesn't check whether the shuffle mask contains 'undef' indices at positions 0 and 2. So there are a few cases where the backend expands a shuffle into a VPERM2X128 with a wrong permute mask.
Back to the example:
The initial selection dag contains the following shuffle node:
v8f32 = vector_shuffle V0, V1<u,u,6,7,u,u,6,7>
During legalization, the shuffle value type is converted from v8f32 to v4f64:
v4f64 = vector_shuffle V0, V1<u,3,u,3>
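The step from the v8f32 mask to the v4f64 mask can be sketched as folding each pair of adjacent 32-bit indices into one 64-bit index. The code below is only an illustration of that idea, not the legalizer's actual code: `widenMask` and `kUndef` are hypothetical names, the undef sentinel is assumed to be -1, and the mask is assumed to have an even number of elements.

```cpp
#include <cstddef>
#include <vector>

// Assumed value of the undef sentinel in this sketch.
constexpr int kUndef = -1;

// Hypothetical helper: fold each pair of adjacent mask elements into one
// element of twice the width. Returns an empty vector if some pair cannot be
// expressed as a single wide element.
std::vector<int> widenMask(const std::vector<int> &Mask) {
  std::vector<int> Wide;
  for (std::size_t I = 0; I + 1 < Mask.size(); I += 2) {
    int Lo = Mask[I], Hi = Mask[I + 1];
    if (Lo == kUndef && Hi == kUndef) {
      Wide.push_back(kUndef);  // the whole wide element is undef
    } else if (Lo != kUndef && Lo % 2 == 0 && (Hi == kUndef || Hi == Lo + 1)) {
      Wide.push_back(Lo / 2);  // aligned pair, upper half may be undef
    } else if (Lo == kUndef && Hi % 2 == 1) {
      Wide.push_back(Hi / 2);  // lower half undef, upper half is an odd index
    } else {
      return {};               // pair is not a contiguous, aligned wide element
    }
  }
  return Wide;
}
```

Applied to <u,u,6,7,u,u,6,7>, this yields <u,3,u,3>: the undef pairs stay undef, and each (6,7) pair becomes 64-bit element 3.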
The shuffle lowering tries to expand this shuffle into a VPERM2X128. However, the permute mask is wrongly computed as 0.
This patch fixes the problem by checking if Mask[0] and Mask[2] contain the value SM_SentinelUndef.
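As a rough illustration of the difference (a standalone sketch, not the code in the patch), the following compares a computation that reads only Mask[0] and Mask[2] with one that checks for undef first. The names `naivePermMask`, `laneSelector`, `fixedPermMask` and `kUndef` are hypothetical. Falling back to the second element of each half, and using 0 when both are undef, are assumptions of this sketch.

```cpp
// Assumed value of the undef sentinel in this sketch.
constexpr int kUndef = -1;

// Computation that ignores undef: only Mask[0] and Mask[2] are consulted.
constexpr unsigned naivePermMask(const int (&Mask)[4]) {
  return static_cast<unsigned>(Mask[0] / 2) |
         (static_cast<unsigned>(Mask[2] / 2) << 4);
}

// Hypothetical helper: choose the 128-bit lane for one half of the result.
// Uses the first element if defined, otherwise the second, otherwise 0.
constexpr int laneSelector(int First, int Second) {
  return First != kUndef ? First / 2 : (Second != kUndef ? Second / 2 : 0);
}

// Undef-aware computation of the permute immediate.
constexpr unsigned fixedPermMask(const int (&Mask)[4]) {
  return static_cast<unsigned>(laneSelector(Mask[0], Mask[1])) |
         (static_cast<unsigned>(laneSelector(Mask[2], Mask[3])) << 4);
}
```

For the mask <u,3,u,3>, the first function produces 0 and the second produces 17, matching the wrong and the expected immediates shown earlier.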
Please let me know if ok to submit.
Thanks!
Andrea
Can we end up with both mask elements being 'u'? E.g. <u, u, 0, 1>?
Or will all these cases be caught by different code paths?
Not that it really matters in practice, since even if it ends up here we'll just end up with -1 / 2 == 0 as the VPERM mask, but I don't think we want the mask to depend on the numerical value of SentinelUndef.
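To spell out the concern, here is a tiny sketch with hypothetical names. C++ integer division truncates toward zero, so dividing a sentinel of -1 by 2 happens to give a valid lane selector of 0. With a different (made-up) sentinel value such as -4, the same division would give -2, which is not a valid selector. An explicit comparison avoids relying on that coincidence.

```cpp
// Hypothetical: derive the lane selector without checking for the sentinel.
constexpr int uncheckedSelector(int MaskElt) { return MaskElt / 2; }

// Hypothetical: treat the sentinel explicitly and pick lane 0 for it.
constexpr int checkedSelector(int MaskElt, int Sentinel) {
  return MaskElt == Sentinel ? 0 : MaskElt / 2;
}

// -1 / 2 truncates to 0, so the unchecked form works only by accident.
static_assert(uncheckedSelector(-1) == 0, "sentinel -1 maps to lane 0");
// With a made-up sentinel of -4 the unchecked result is not a valid lane.
static_assert(uncheckedSelector(-4) == -2, "sentinel -4 maps to -2");
// The checked form gives lane 0 regardless of the sentinel's value.
static_assert(checkedSelector(-4, -4) == 0, "explicit check is robust");
```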