This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
ClosedPublic

Authored by craig.topper on Nov 22 2018, 11:57 AM.

Details

Reviewers

RKSimon
spatel

Commits

rGe35b01f8ea75: [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that…
rL348158: [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that…

Summary

Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint nodes whose result is a vector type narrower than 128 bits are custom type legalized: the elements are promoted so the result becomes a 128-bit vector, an assertzext/assertsext is inserted, and the value is truncated back to the original type. The truncate is then further legalized to a pack shuffle. For a v8i8 result type, we end up with a v8i16 fp_to_sint, which must be legalized again during vector op legalization by promoting to v8i32 and truncating once more. Under avx2 this produces good code with two pack instructions, but under avx512 it results in a truncate instruction plus a packuswb instruction, where a single truncate instruction should be enough.
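To illustrate why a single truncate can stand in for the truncate plus packuswb pair, here is a minimal standalone sketch that models the low eight lanes with plain integers. It is not LLVM code: the type aliases and the helper names (truncToI16, packusLane, truncThenPackus, truncToI8, highByteKnownZero) are hypothetical and exist only for this illustration. The point it shows is that the unsigned-saturating pack behaves like a plain truncate exactly when the upper 8 bits of every 16-bit lane are zero.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane models; only the low eight lanes of the pack are modeled.
using V8i32 = std::array<uint32_t, 8>;
using V8i16 = std::array<uint16_t, 8>;
using V8i8 = std::array<uint8_t, 8>;

// Models the v8i32 -> v8i16 truncate: keep the low 16 bits of each lane.
V8i16 truncToI16(const V8i32 &v) {
  V8i16 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<uint16_t>(v[i]);
  return r;
}

// Models one lane of PACKUSWB: read the 16-bit lane as signed and clamp it
// to the unsigned 8-bit range [0, 255].
uint8_t packusLane(uint16_t x) {
  int s = x >= 0x8000 ? static_cast<int>(x) - 0x10000 : static_cast<int>(x);
  return static_cast<uint8_t>(std::min(255, std::max(0, s)));
}

// The two-instruction sequence: truncate to i16, then saturating pack to i8.
V8i8 truncThenPackus(const V8i32 &v) {
  V8i16 mid = truncToI16(v);
  V8i8 r{};
  for (std::size_t i = 0; i < mid.size(); ++i)
    r[i] = packusLane(mid[i]);
  return r;
}

// The single-instruction alternative: truncate i32 lanes straight to i8.
V8i8 truncToI8(const V8i32 &v) {
  V8i8 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<uint8_t>(v[i]);
  return r;
}

// Stand-in for the known-bits query: are bits 8..15 of every truncated
// 16-bit lane zero?
bool highByteKnownZero(const V8i32 &v) {
  for (uint16_t lane : truncToI16(v))
    if (lane & 0xFF00)
      return false;
  return true;
}
```

When highByteKnownZero holds, truncThenPackus and truncToI8 agree on every lane. When it does not hold (for example a lane whose low 16 bits are 0x1234), the pack saturates to 255 while the plain truncate yields 0x34, which is why the combine has to check the high bits first.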

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Diff Detail

Event Timeline

craig.topper created this revision. Nov 22 2018, 11:57 AM

Harbormaster completed remote builds in B25273: Diff 175065. Nov 22 2018, 11:57 AM

Add a DAG combine to avoid this failing after r347593, since we now have an AssertSExt and a stronger AssertZExt sandwiched around the truncate.

craig.topper marked an inline comment as done. Nov 26 2018, 2:19 PM

craig.topper added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9082	I wonder if we could just float all Asserts above truncates? And more aggressively merge adjacent asserts?

RKSimon added inline comments. Dec 2 2018, 2:32 AM

lib/Target/X86/X86ISelLowering.cpp
35383	Can the PACKSS equivalent occur as well?

craig.topper marked an inline comment as done. Dec 2 2018, 11:11 AM

craig.topper added inline comments.

lib/Target/X86/X86ISelLowering.cpp
35383	It doesn't occur in any of our lit tests so I'm not sure.

LGTM with one minor, cheers.

lib/Target/X86/X86ISelLowering.cpp
35383	OK - please can you add a TODO comment for now?

This revision is now accepted and ready to land. Dec 3 2018, 12:00 AM

Closed by commit rL348158: [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that… (authored by ctopper). Dec 3 2018, 10:29 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                        Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp    20 lines
lib/Target/X86/X86ISelLowering.cpp          19 lines
test/CodeGen/X86/avx512-cvt-widen.ll        6 lines

Diff 175340

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ ... @@ if (N0.getOpcode() == ISD::TRUNCATE && N0.hasOneUse() &&
     SDLoc DL(N);
     EVT MinAssertVT = AssertVT.bitsLT(BigA_AssertVT) ? AssertVT : BigA_AssertVT;
     SDValue MinAssertVTVal = DAG.getValueType(MinAssertVT);
     SDValue NewAssert = DAG.getNode(Opcode, DL, BigA.getValueType(),
                                     BigA.getOperand(0), MinAssertVTVal);
     return DAG.getNode(ISD::TRUNCATE, DL, N->getValueType(0), NewAssert);
   }
 
+  // If we have (AssertZext (truncate (AssertSext X, iX)), iY) and Y is smaller
+  // than X. Just move the AssertZext in front of the truncate and drop the
+  // AssertSExt.
+  if (N0.getOpcode() == ISD::TRUNCATE && N0.hasOneUse() &&
+      N0.getOperand(0).getOpcode() == ISD::AssertSext &&
+      Opcode == ISD::AssertZext) {
+    SDValue BigA = N0.getOperand(0);
+    EVT BigA_AssertVT = cast<VTSDNode>(BigA.getOperand(1))->getVT();
+    assert(BigA_AssertVT.bitsLE(N0.getValueType()) &&
+           "Asserting zero/sign-extended bits to a type larger than the "
+           "truncated destination does not provide information");
+
+    if (AssertVT.bitsLT(BigA_AssertVT)) {
+      SDLoc DL(N);
+      SDValue NewAssert = DAG.getNode(Opcode, DL, BigA.getValueType(),
+                                      BigA.getOperand(0), N1);
+      return DAG.getNode(ISD::TRUNCATE, DL, N->getValueType(0), NewAssert);
+    }
+  }
+
   return SDValue();
 }
 
 /// If the result of a wider load is shifted to right of N bits and then
 /// truncated to a narrower type and where N is a multiple of number of bits of
 /// the narrower type, transform it to a narrower load from address + N / num of
 /// bits of new type. Also narrow the load if the result is masked with an AND
 /// to effectively produce a smaller type. If the result is to be extended, also
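The DAGCombiner change in this revision moves an AssertZext above a truncate and drops an inner AssertSext when the zero-extension width Y is smaller than the sign-extension width X. A rough way to convince oneself that this is sound is to brute-force the claim on small bit widths. The sketch below does that with plain integers; it does not use any LLVM API, and the helper names (lowBits, isSignExtendedFrom, isZeroExtendedFrom, foldIsSound) are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `bits` bits of v.
uint32_t lowBits(uint32_t v, unsigned bits) {
  assert(bits >= 1 && bits <= 32);
  return bits == 32 ? v : (v & ((1u << bits) - 1u));
}

// AssertSext-style invariant: v, viewed as a `width`-bit value, has bits
// from-1 .. width-1 all equal (it is sign-extended from `from` bits).
bool isSignExtendedFrom(uint32_t v, unsigned width, unsigned from) {
  assert(from >= 1 && from <= width && width <= 32);
  uint32_t upper = lowBits(v, width) >> (from - 1);
  uint32_t allOnes = lowBits(~0u, width - from + 1);
  return upper == 0 || upper == allOnes;
}

// AssertZext-style invariant: bits from .. width-1 of v are all zero.
bool isZeroExtendedFrom(uint32_t v, unsigned width, unsigned from) {
  assert(from >= 1 && from <= width && width <= 32);
  if (from == width)
    return true;
  return (lowBits(v, width) >> from) == 0;
}

// Exhaustively checks, for every `wide`-bit value, that whenever the inner
// sign-extension assert (from x bits) and the outer zero-extension assert
// (from y bits, on the truncated value) both hold, the zero-extension
// assert also holds on the untruncated value.
bool foldIsSound(unsigned wide, unsigned narrow, unsigned x, unsigned y) {
  assert(y < x && x <= narrow && narrow < wide && wide <= 16);
  for (uint32_t v = 0; v < (1u << wide); ++v) {
    if (!isSignExtendedFrom(v, wide, x))
      continue; // inner AssertSext would not hold for this value
    uint32_t t = lowBits(v, narrow); // the truncate
    if (!isZeroExtendedFrom(t, narrow, y))
      continue; // outer AssertZext would not hold for this value
    if (!isZeroExtendedFrom(v, wide, y))
      return false; // hoisted AssertZext would be wrong
  }
  return true;
}
```

The reasoning behind the result: because Y is smaller than X, the outer assert forces bit X-1 to zero, and the inner sign-extension assert then forces every bit above it in the wide value to zero as well. Without the inner sign-extension assert the hoist is not justified in general, since the bits discarded by the truncate could be anything.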

lib/Target/X86/X86ISelLowering.cpp

@@ ... @@ for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
         }
         Bits[Lane * NumDstEltsPerLane + Elt] = Val;
       }
     }
 
     return getConstVector(Bits, Undefs, VT.getSimpleVT(), DAG, SDLoc(N));
   }
 
+  // Try to combine a PACKUSWB implemented truncate with a regular truncate to
+  // create a larger truncate.
+  if (Subtarget.hasAVX512() && Opcode == X86ISD::PACKUS &&
+      N0.getOpcode() == ISD::TRUNCATE && N1.isUndef() && VT == MVT::v16i8 &&
+      N0.getOperand(0).getValueType() == MVT::v8i32) {
+
+    APInt ZeroMask = APInt::getHighBitsSet(16, 8);
+    if (DAG.MaskedValueIsZero(N0, ZeroMask)) {
+      if (Subtarget.hasVLX())
+        return DAG.getNode(X86ISD::VTRUNC, SDLoc(N), VT, N0.getOperand(0));
+
+      // Widen input to v16i32 so we can truncate that.
+      SDLoc dl(N);
+      SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i32,
+                                   N0.getOperand(0), DAG.getUNDEF(MVT::v8i32));
+      return DAG.getNode(ISD::TRUNCATE, SDLoc(N), VT, Concat);
+    }
+  }
+
   // Attempt to combine as shuffle.
   SDValue Op(N, 0);
   if (SDValue Res =
           combineX86ShufflesRecursively({Op}, 0, Op, {0}, {}, /*Depth*/ 1,
                                         /*HasVarMask*/ false,
                                         /*AllowVarMask*/ true, DAG, Subtarget))
     return Res;
 
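For the path without VLX, the X86 combine in this revision widens the v8i32 input to v16i32 by concatenating it with an undef vector and then truncates the whole thing to v16i8. The following standalone sketch models that with plain arrays to show that the low eight result lanes do not depend on what ends up in the undef half. The aliases and helper names (concatWithFiller, truncToI8, lowHalf) are hypothetical, and the filler argument is only a stand-in for undef, which has no fixed value in the real DAG.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane models for this illustration only.
using V8i32 = std::array<uint32_t, 8>;
using V16i32 = std::array<uint32_t, 16>;
using V16i8 = std::array<uint8_t, 16>;

// Models CONCAT_VECTORS(lo, undef): the upper eight lanes get an arbitrary
// caller-chosen filler value as a stand-in for undef.
V16i32 concatWithFiller(const V8i32 &lo, uint32_t filler) {
  V16i32 r{};
  r.fill(filler);
  for (std::size_t i = 0; i < lo.size(); ++i)
    r[i] = lo[i];
  return r;
}

// Models the v16i32 -> v16i8 truncate: keep the low 8 bits of each lane.
V16i8 truncToI8(const V16i32 &v) {
  V16i8 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<uint8_t>(v[i]);
  return r;
}

// Extracts the low eight lanes, which are the only ones the original
// v8i8 result cares about.
std::array<uint8_t, 8> lowHalf(const V16i8 &v) {
  std::array<uint8_t, 8> r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = v[i];
  return r;
}
```

Truncation works lane by lane, so whatever sits in the upper half cannot influence the lower half. That is the property that makes widening with undef safe here.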

test/CodeGen/X86/avx512-cvt-widen.ll

@@ ... @@ ; VL-NEXT: retq
   %res = fptoui <8 x double> %f to <8 x i16>
   ret <8 x i16> %res
 }
 
 define <8 x i8> @f64to8uc(<8 x double> %f) {
 ; NOVL-LABEL: f64to8uc:
 ; NOVL: # %bb.0:
 ; NOVL-NEXT: vcvttpd2dq %zmm0, %ymm0
-; NOVL-NEXT: vpmovdw %zmm0, %ymm0
-; NOVL-NEXT: vpackuswb %xmm0, %xmm0, %xmm0
+; NOVL-NEXT: vpmovdb %zmm0, %xmm0
 ; NOVL-NEXT: vzeroupper
 ; NOVL-NEXT: retq
 ;
 ; VL-LABEL: f64to8uc:
 ; VL: # %bb.0:
 ; VL-NEXT: vcvttpd2dq %zmm0, %ymm0
-; VL-NEXT: vpmovdw %ymm0, %xmm0
-; VL-NEXT: vpackuswb %xmm0, %xmm0, %xmm0
+; VL-NEXT: vpmovdb %ymm0, %xmm0
 ; VL-NEXT: vzeroupper
 ; VL-NEXT: retq
   %res = fptoui <8 x double> %f to <8 x i8>
   ret <8 x i8> %res
 }
 
 define <4 x i32> @f64to4ui(<4 x double> %a) nounwind {
 ; NOVL-LABEL: f64to4ui: