Details

Reviewers

qcolombet
chandlerc
mkuper

Commits

rGdf6c515f381b: [release_36] Cherry-pick r231601.
rG6c7d70469cd0: [X86][AVX] Fix wrong lowering of VPERM2X128 nodes
rL232803: [release_36] Cherry-pick r231601.
rL231601: [X86][AVX] Fix wrong lowering of VPERM2X128 nodes

Summary

There are cases where the backend computes a wrong permute mask for a VPERM2X128 node.

Example:

define <8 x float> @foo(<8 x float> %a, <8 x float> %b) {
  %shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
  ret <8 x float> %shuffle
}

If we build this example with -mattr=+avx, we get the following assembly:

vperm2f128 $0, %ymm0, %ymm0, %ymm0  # ymm0 = ymm0[0,1,0,1]

However, it should have been:

vperm2f128 $17, %ymm0, %ymm0, %ymm0  # ymm0 = ymm0[2,3,2,3]
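To see how the two immediates map to the element lists in the assembly comments, here is a minimal decoding sketch. The function name `decodeVPerm2X128` is hypothetical and is not LLVM's decoder. It assumes the usual immediate layout: bits [1:0] select the 128-bit half written to the low lane, bits [5:4] the half written to the high lane, and bits 3 and 7 zero the corresponding lane. Indices are in units of 64-bit elements, with 0-3 naming the first source and 4-7 the second.

```cpp
#include <cassert>
#include <string>

// Decode a VPERM2F128/VPERM2I128 immediate into the 64-bit element indices
// that would appear in the assembly comment. Hypothetical helper for
// illustration only.
std::string decodeVPerm2X128(unsigned Imm) {
  std::string Out;
  for (unsigned Half = 0; Half < 2; ++Half) {
    // Low nibble controls the low result lane, high nibble the high lane.
    unsigned Field = (Imm >> (4 * Half)) & 0xF;
    if (!Out.empty())
      Out += ",";
    if (Field & 0x8) { // zeroing bit set for this lane
      Out += "zero,zero";
      continue;
    }
    // Selector 0..3 picks src1 low, src1 high, src2 low, src2 high,
    // i.e. 64-bit elements starting at 0, 2, 4, 6.
    unsigned First = 2 * (Field & 0x3);
    Out += std::to_string(First) + "," + std::to_string(First + 1);
  }
  return Out;
}
```

Under these assumptions, an immediate of 0 decodes to elements 0,1,0,1 and an immediate of 17 (0x11) decodes to 2,3,2,3, matching the two comments above.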

It turns out that the function 'lowerV2X128VectorShuffle' doesn't check whether the shuffle mask contains 'undef' indices at positions 0 and 2. As a result, there are a few cases where the backend expands a shuffle into a VPERM2X128 with a wrong shuffle mask.

Back to the example:
The initial selection dag contains the following shuffle node:

v8f32 = vector_shuffle V0, V1<u,u,6,7,u,u,6,7>

During legalization, the shuffle value type is converted from v8f32 to v4f64:

v4f64 = vector_shuffle V0, V1<u,3,u,3>
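The widening from eight 32-bit indices to four 64-bit indices can be sketched as follows. This is a simplified illustration, not LLVM's 'canWidenShuffleElements': the name `widenMask`, the constant `kUndef` (a stand-in for SM_SentinelUndef) and the exact pairing rules are assumptions made for the example, and the input is assumed to have an even number of elements.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

constexpr int kUndef = -1; // stand-in for SM_SentinelUndef (assumption)

// Try to merge each adjacent pair of mask indices into one index of an
// element twice as wide. Returns std::nullopt if some pair cannot be merged.
std::optional<std::vector<int>> widenMask(const std::vector<int> &Mask) {
  std::vector<int> Wide;
  for (std::size_t I = 0; I + 1 < Mask.size(); I += 2) {
    int Lo = Mask[I], Hi = Mask[I + 1];
    if (Lo == kUndef && Hi == kUndef) {
      Wide.push_back(kUndef); // both halves undef: the wide element is undef
    } else if (Lo != kUndef && Lo % 2 == 0 && (Hi == kUndef || Hi == Lo + 1)) {
      Wide.push_back(Lo / 2); // aligned pair, or aligned low half + undef
    } else if (Lo == kUndef && Hi % 2 == 1) {
      Wide.push_back(Hi / 2); // undef low half + odd high half
    } else {
      return std::nullopt;    // the pair straddles two wide elements
    }
  }
  return Wide;
}
```

With this sketch, the mask <u,u,6,7,u,u,6,7> widens to <u,3,u,3>, which is the v4f64 mask shown above. Note that the undef entries land exactly at positions 0 and 2.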

The shuffle lowering tries to expand this shuffle into a VPERM2X128. However, the permute mask is wrongly computed as 0.

This patch fixes the problem by checking if Mask[0] and Mask[2] contain value SM_SentinelUndef.
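The arithmetic can be checked in isolation. The sketch below mirrors the two PermMask computations from Diff 21380 outside of LLVM; `SentinelUndef`, `permMaskBefore` and `permMaskAfter` are names made up for this example, and -1 is assumed as the sentinel value.

```cpp
#include <array>
#include <cassert>

constexpr int SentinelUndef = -1; // stand-in for SM_SentinelUndef (assumption)

// Computation before the patch: undef entries at positions 0 and 2 are
// divided like ordinary indices, and -1 / 2 truncates to 0.
unsigned permMaskBefore(const std::array<int, 4> &Mask) {
  return static_cast<unsigned>(Mask[0] / 2 | (Mask[2] / 2) << 4);
}

// Computation in Diff 21380: fall back to the neighbouring index when the
// first index of a pair is undef.
unsigned permMaskAfter(const std::array<int, 4> &Mask) {
  int MaskLO = Mask[0] == SentinelUndef ? Mask[1] : Mask[0];
  int MaskHI = Mask[2] == SentinelUndef ? Mask[3] : Mask[2];
  return static_cast<unsigned>(MaskLO / 2 | (MaskHI / 2) << 4);
}
```

For the mask <u,3,u,3>, the first function yields 0 and the second yields 17, which are the two immediates shown in the assembly above.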

Please let me know if ok to submit.

Thanks!
Andrea

Diff Detail

Event Timeline

andreadb updated this revision to Diff 21380. Mar 6 2015, 12:16 PM

andreadb retitled this revision to [X86][AVX] Fix wrong lowering of VPERM2X128 nodes.

andreadb updated this object.

andreadb edited the test plan for this revision.

andreadb added reviewers: chandlerc, qcolombet, mkuper.

andreadb added a subscriber: Unknown Object (MLST).

This patch fixes the problem by checking if Mask[0] and Mask[2] contain value SM_SentinelZero.

I should have written:
"This patch fixes the problem by checking if Mask[0] and Mask[2] contain value SM_SentinelUndef".

Updated patch. Added more test cases in file avx-vperm2x128.ll.

LGTM, except one thing I'm not sure about (see comment).

lib/Target/X86/X86ISelLowering.cpp
9090–9092	Can we end up with both mask elements being 'u'? E.g. <u, u, 0, 1>? Or will all these cases be caught by different code paths? Not that it really matters in practice, since even if it ends up here we'll just end up with -1 / 2 == 0 as the VPERM mask, but I don't think we want the mask to depend on the numerical value of SentinelUndef.

Hi Michael

lib/Target/X86/X86ISelLowering.cpp
9090–9092	Yes, we can end up with both mask elements being undef. That would be legal according to function 'canWidenShuffleElements'. In practice, as you said, it won't really matter as we would end up propagating index 0, which is still OK considering that it is undef. I added explicit checks against SentinelUndef because of the FIXME message at line 9089. Basically, at some point, we may want to also check for SM_SentinelZero and use a different strategy for that.

I still think it'd be best to add an explicit check for both elements being SM_SentinelUndef.
I'd really hate to be the guy who has to hunt down bugs caused by changing the value of SM_SentinelUndef to be, say, INT_MIN instead of -1. :-)

Hi Michael,

Not sure if this new patch would address your comments.
However, this should be more resilient to potential changes to the value of SM_SentinelUndef.

Please let me know what you think.

Thanks again for your time!
-Andrea

I may be missing something, but I don't see how this helps.
We'll still get a nonsense MaskLO if both Mask[0] and Mask[1] are SM_SentinelUndef, and SM_SentinelUndef happens to be INT_MIN.
Why not set MaskLO/MaskHI to 0 explicitly if both relevant Mask elements are undef?

(I know this is purely theoretical, if you feel it unnecessarily complicates the code, feel free to ignore.)

In D8119#136324, @mkuper wrote:

I may be missing something, but I don't see how this helps.
We'll still get a nonsense MaskLO if both Mask[0] and Mask[1] are SM_SentinelUndef, and SM_SentinelUndef happens to be INT_MIN.
Why not set MaskLO/MaskHI to 0 explicitly if both relevant Mask elements are undef?

(I know this is purely theoretical, if you feel it unnecessarily complicates the code, feel free to ignore.)

Ok right. Now I think I understand what you meant before :-).
Sorry for the confusion.

Right, I can change the code so that we propagate index 0 if the resulting permute index is SM_SentinelUndef.

Hi Michael,

Here is a new version of the patch.
This time, we explicitly propagate permute index 0 (instead of SM_SentinelUndef) if we see that the resulting MaskLO/MaskHI would be undef.

Thanks,
Andrea
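A sketch of what "explicitly propagate permute index 0" could look like, written so that the result does not depend on the numeric value of the sentinel. This is an illustration under assumptions, not necessarily the committed code: `laneSelector` and `permMask` are hypothetical names, and the sentinel is a template parameter only so that the example can be tried with both -1 and INT_MIN.

```cpp
#include <array>
#include <cassert>
#include <climits>

// Pick the 128-bit lane selector for one half of the result. If both
// indices of the pair are undef, use index 0 explicitly instead of
// dividing the sentinel value.
template <int Sentinel>
unsigned laneSelector(int First, int Second) {
  int Index = First;
  if (Index == Sentinel)
    Index = Second == Sentinel ? 0 : Second;
  return static_cast<unsigned>(Index / 2);
}

// Combine the two selectors into the 8-bit permute immediate.
template <int Sentinel>
unsigned permMask(const std::array<int, 4> &Mask) {
  return laneSelector<Sentinel>(Mask[0], Mask[1]) |
         laneSelector<Sentinel>(Mask[2], Mask[3]) << 4;
}
```

With either sentinel value, the mask <u,u,0,1> gives 0 and <u,3,u,3> gives 17. By contrast, dividing a sentinel of INT_MIN by 2 would produce a value that does not fit in the 8-bit immediate, which is the scenario raised in the review.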

Yes, thanks.
LGTM.

This revision is now accepted and ready to land. Mar 8 2015, 9:20 AM

Closed by commit rL231601: [X86][AVX] Fix wrong lowering of VPERM2X128 nodes (authored by adibiagio). Mar 8 2015, 9:31 AM

This revision was automatically updated to reflect the committed changes.

Thanks Michael!
Committed revision 231601.

Diff 21380

lib/Target/X86/X86ISelLowering.cpp

...
SDValue LoV = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, V1,		SDValue LoV = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, V1,
DAG.getIntPtrConstant(0));		DAG.getIntPtrConstant(0));
SDValue HiV = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, V2,		SDValue HiV = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, V2,
DAG.getIntPtrConstant(2));		DAG.getIntPtrConstant(2));
return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, LoV, HiV);		return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, LoV, HiV);
}		}

// Otherwise form a 128-bit permutation.		// Otherwise form a 128-bit permutation.
// FIXME: Detect zero-vector inputs and use the VPERM2X128 to zero that half.		// FIXME: Detect zero-vector inputs and use the VPERM2X128 to zero that half.
unsigned PermMask = Mask[0] / 2 | (Mask[2] / 2) << 4;		int MaskLO = Mask[0] == SM_SentinelUndef ? Mask[1] : Mask[0];
		int MaskHI = Mask[2] == SM_SentinelUndef ? Mask[3] : Mask[2];
		unsigned PermMask = MaskLO / 2 | (MaskHI / 2) << 4;
return DAG.getNode(X86ISD::VPERM2X128, DL, VT, V1, V2,		return DAG.getNode(X86ISD::VPERM2X128, DL, VT, V1, V2,
DAG.getConstant(PermMask, MVT::i8));		DAG.getConstant(PermMask, MVT::i8));
}		}

/// \brief Lower a vector shuffle by first fixing the 128-bit lanes and then		/// \brief Lower a vector shuffle by first fixing the 128-bit lanes and then
/// shuffling each lane.		/// shuffling each lane.
///		///
/// This will only succeed when the result of fixing the 128-bit lanes results		/// This will only succeed when the result of fixing the 128-bit lanes results
...

test/CodeGen/X86/avx-vperm2x128.ll

...
entry:		entry:
ret <16 x i16> %shuffle		ret <16 x i16> %shuffle
}		}

;;;; Cases with undef indicies mixed in the mask		;;;; Cases with undef indicies mixed in the mask

define <8 x float> @F(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {		define <8 x float> @F(<8 x float> %a, <8 x float> %b) nounwind uwtable readnone ssp {
; ALL-LABEL: F:		; ALL-LABEL: F:
; ALL: ## BB#0: ## %entry		; ALL: ## BB#0: ## %entry
; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm1[0,1,0,1]		; ALL-NEXT: vperm2f128 {{.*#+}} ymm0 = ymm0[2,3],ymm1[0,1]
; ALL-NEXT: retq		; ALL-NEXT: retq
entry:		entry:
%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 9, i32 undef, i32 11>		%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 9, i32 undef, i32 11>
ret <8 x float> %shuffle		ret <8 x float> %shuffle
}		}

;;;; Cases we must not select vperm2f128		;;;; Cases we must not select vperm2f128

...

test/CodeGen/X86/vector-shuffle-512-v8.ll

...
; ALL-NEXT: retq		; ALL-NEXT: retq
%shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 6>		%shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 6>
ret <8 x double> %shuffle		ret <8 x double> %shuffle
}		}

define <8 x double> @shuffle_v8f64_c348cda0(<8 x double> %a, <8 x double> %b) {		define <8 x double> @shuffle_v8f64_c348cda0(<8 x double> %a, <8 x double> %b) {
; ALL-LABEL: shuffle_v8f64_c348cda0:		; ALL-LABEL: shuffle_v8f64_c348cda0:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: vextractf64x4 $1, %zmm0, %ymm2		; ALL-NEXT: vextractf64x4 $1, %zmm0, %ymm2
; ALL-NEXT: vperm2f128 {{.*#+}} ymm2 = ymm0[0,1],ymm2[0,1]		; ALL-NEXT: vperm2f128 {{.*#+}} ymm2 = ymm0[2,3],ymm2[0,1]
; ALL-NEXT: vextractf64x4 $1, %zmm1, %ymm3		; ALL-NEXT: vextractf64x4 $1, %zmm1, %ymm3
; ALL-NEXT: vbroadcastsd %xmm1, %ymm4		; ALL-NEXT: vbroadcastsd %xmm1, %ymm4
; ALL-NEXT: vblendpd {{.*#+}} ymm4 = ymm3[0,1,2],ymm4[3]		; ALL-NEXT: vblendpd {{.*#+}} ymm4 = ymm3[0,1,2],ymm4[3]
; ALL-NEXT: vblendpd {{.*#+}} ymm2 = ymm4[0],ymm2[1,2],ymm4[3]		; ALL-NEXT: vblendpd {{.*#+}} ymm2 = ymm4[0],ymm2[1,2],ymm4[3]
; ALL-NEXT: vblendpd {{.*#+}} ymm1 = ymm3[0,1],ymm1[2],ymm3[3]		; ALL-NEXT: vblendpd {{.*#+}} ymm1 = ymm3[0,1],ymm1[2],ymm3[3]
; ALL-NEXT: vbroadcastsd %xmm0, %ymm0		; ALL-NEXT: vbroadcastsd %xmm0, %ymm0
; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm1[0,1,2],ymm0[3]		; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm1[0,1,2],ymm0[3]
; ALL-NEXT: vinsertf64x4 $1, %ymm0, %zmm2, %zmm0		; ALL-NEXT: vinsertf64x4 $1, %ymm0, %zmm2, %zmm0
...
; ALL-NEXT: retq		; ALL-NEXT: retq
%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 6>		%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 6, i32 6>
ret <8 x i64> %shuffle		ret <8 x i64> %shuffle
}		}

define <8 x i64> @shuffle_v8i64_6caa87e5(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @shuffle_v8i64_6caa87e5(<8 x i64> %a, <8 x i64> %b) {
; ALL-LABEL: shuffle_v8i64_6caa87e5:		; ALL-LABEL: shuffle_v8i64_6caa87e5:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: vextracti64x4 $1, %zmm0, %ymm0		; ALL-NEXT: vextracti64x4 $1, %zmm0, %ymm0
; ALL-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[0,1,0,1]
; ALL-NEXT: vextracti64x4 $1, %zmm1, %ymm3
; ALL-NEXT: vpblendd {{.*#+}} ymm4 = ymm1[0,1,2,3],ymm3[4,5],ymm1[6,7]
; ALL-NEXT: vpblendd {{.*#+}} ymm2 = ymm4[0,1],ymm2[2,3],ymm4[4,5],ymm2[6,7]
; ALL-NEXT: vperm2i128 {{.*#+}} ymm0 = ymm0[2,3,0,1]		; ALL-NEXT: vperm2i128 {{.*#+}} ymm0 = ymm0[2,3,0,1]
; ALL-NEXT: vpblendd {{.*#+}} ymm1 = ymm3[0,1,2,3],ymm1[4,5,6,7]		; ALL-NEXT: vextracti64x4 $1, %zmm1, %ymm2
		; ALL-NEXT: vpblendd {{.*#+}} ymm3 = ymm1[0,1,2,3],ymm2[4,5],ymm1[6,7]
		; ALL-NEXT: vpblendd {{.*#+}} ymm3 = ymm3[0,1],ymm0[2,3],ymm3[4,5],ymm0[6,7]
		; ALL-NEXT: vpblendd {{.*#+}} ymm1 = ymm2[0,1,2,3],ymm1[4,5,6,7]
; ALL-NEXT: vpshufd {{.*#+}} ymm1 = ymm1[0,1,0,1,4,5,4,5]		; ALL-NEXT: vpshufd {{.*#+}} ymm1 = ymm1[0,1,0,1,4,5,4,5]
; ALL-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]		; ALL-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]
; ALL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0		; ALL-NEXT: vinserti64x4 $1, %ymm3, %zmm0, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 6, i32 12, i32 10, i32 10, i32 8, i32 7, i32 14, i32 5>		%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 6, i32 12, i32 10, i32 10, i32 8, i32 7, i32 14, i32 5>
ret <8 x i64> %shuffle		ret <8 x i64> %shuffle
}		}

define <8 x double> @shuffle_v8f64_082a4c6e(<8 x double> %a, <8 x double> %b) {		define <8 x double> @shuffle_v8f64_082a4c6e(<8 x double> %a, <8 x double> %b) {
; ALL-LABEL: shuffle_v8f64_082a4c6e:		; ALL-LABEL: shuffle_v8f64_082a4c6e:
; ALL: # BB#0:		; ALL: # BB#0:
...

This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX] Fix wrong lowering of VPERM2X128 nodes.
ClosedPublic
