This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86InstrAVX512.td
llvm/test/CodeGen/X86/avx512-select.ll
llvm/test/CodeGen/X86/combine-bitselect.ll
llvm/test/CodeGen/X86/ssub_sat_vec.ll
llvm/test/CodeGen/X86/usub_sat_vec.ll
llvm/test/CodeGen/X86/vec_ssubo.ll
llvm/test/CodeGen/X86/vec_usubo.ll

Differential D85499

[X86] Canonicalize andnp for bitmask arithmetic
AbandonedPublic

Authored by loladiro on Aug 6 2020, 9:46 PM.

Details

Reviewers

craig.topper
RKSimon
spatel

Summary

We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, X).
However, it does so by attempting to create an i64 vector whose element count
is the bit width divided by 64, with the division truncating. This is
bad for mask vectors like v8i1, since that division yields zero. Besides,
we don't want i64 vectors anyway. The easy change is just to avoid changing
the VT, but this is slightly problematic because the canonical pattern for
kandn is (and (vnot a) b) rather than (x86andnp a b), so this fails
to select. Rather than playing games here with having the mask vectors
use a different canonical representation, the bulk of this commit switches
the canonical ISD representation for kandn to (x86andnp a b) such
that all vector types may be handled equally here. To avoid regressing
other tests, we need to extend a few other folds to handle x86andnp in
addition to plain and. However, that should be generally a good
improvement, since x86andnp is already canonical for non-i1 vectors
prior to this commit, and said folds were just missing.

When all is said and done, this fixes the issue reported in
https://github.com/JuliaLang/julia/issues/36955.
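
To make the truncating division described in the summary concrete, here is a minimal standalone sketch. It is not LLVM code: `SimpleVT` and `i64LanesFor` are hypothetical stand-ins for `MVT` and for the old `CondVT.getSizeInBits() / 64` computation.

```cpp
#include <cassert>

// Hypothetical stand-in for an LLVM vector type: element width and lane count.
struct SimpleVT {
  unsigned ElemBits;
  unsigned NumElems;
  constexpr unsigned sizeInBits() const { return ElemBits * NumElems; }
};

// Models the old fold: number of i64 lanes = total bit width / 64 (truncating).
constexpr unsigned i64LanesFor(SimpleVT VT) { return VT.sizeInBits() / 64; }

constexpr SimpleVT V4i32{32, 4}; // 128-bit vector
constexpr SimpleVT V8i64{64, 8}; // 512-bit vector
constexpr SimpleVT V8i1{1, 8};   // mask vector, only 8 bits wide

static_assert(i64LanesFor(V4i32) == 2, "128 bits maps to v2i64");
static_assert(i64LanesFor(V8i64) == 8, "512 bits maps to v8i64");
static_assert(i64LanesFor(V8i1) == 0, "8 bits maps to a zero-element vector");
```

For the 128-bit and 512-bit types the lane count is sensible, while the v8i1 case asks for a vector with zero i64 elements, which is the situation the summary calls out.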

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

loladiro created this revision. Aug 6 2020, 9:46 PM

Herald added a project: Restricted Project. Aug 6 2020, 9:46 PM

Herald added subscribers: llvm-commits, hiraditya.

loladiro requested review of this revision. Aug 6 2020, 9:46 PM

loladiro added a subscriber: Restricted Project. Aug 6 2020, 9:47 PM

Harbormaster completed remote builds in B67414: Diff 283807. Aug 6 2020, 9:55 PM

I kind of think we shouldn't change the canonical form for masks and we should handle the mask case of vselect in generic DAG combine like we do in InstCombine. Maybe we could handle the non-mask case there instead too, but that depends on whether we can form ANDNP afterwards.

I think that would be fine, but in that case, we should probably get rid of x86andnp entirely and just make (and (not x) y) canonical everywhere. The supported patterns for these are much the same, so if we have both canonical forms, we're just signing up for double the work.

I'm all for removing ANDNP. I think it largely exists for load folding and/or shuffle lowering/combining. Neither of those applies to masks, which is why I'm hesitant to change masks to match it. There are also other mask patterns in tablegen rooted on AND that aren't covered by tryVPTESTM.

Are you saying you think we can get away with removing X86ISD::ANDNP entirely? We don't have it for the BMI scalar variant, and it /usually/ works.

llvm/lib/Target/X86/X86ISelLowering.cpp
39637	RKSimon: This looks like a separate NFC-ish change - a legacy from when the bitops were legal for just vXi64 types

chriselrod added a subscriber: chriselrod. Aug 7 2020, 7:11 AM

In D85499#2202181, @RKSimon wrote:

Are you saying you think we can get away with removing X86ISD::ANDNP entirely? We don't have it for the BMI scalar variant, and it /usually/ works.

I'm not saying that we can definitely get away with removing it. It has reasons for existing. I'm just saying I don't want to spread it further. We usually like to avoid target specific nodes when possible so increasing the scope of a target specific node seems the wrong direction. Masks and xmm/ymm/zmm vectors are already treated differently in many places. I don't see that having two canonical representations makes their differences that much larger than they already are.

llvm/lib/Target/X86/X86ISelLowering.cpp
39637	craig.topper: This is the buggy code that is creating ANDNP for vXi1. The bitcast part is NFCish.

loladiro mentioned this in D85553: [X86] Don't produce bad x86andp nodes for i1 vectors. Aug 7 2020, 1:42 PM

loladiro mentioned this in rGc58674df147a: [X86] Don't produce bad x86andp nodes for i1 vectors. Aug 7 2020, 5:19 PM

Abandon this now D85553 has landed?

loladiro abandoned this revision. Aug 10 2020, 1:47 PM

Revision Contents

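Path                                          Size
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp       40 lines
llvm/lib/Target/X86/X86ISelLowering.cpp       18 lines
llvm/lib/Target/X86/X86InstrAVX512.td         13 lines
llvm/test/CodeGen/X86/avx512-select.ll        71 lines
llvm/test/CodeGen/X86/combine-bitselect.ll    18 lines
llvm/test/CodeGen/X86/ssub_sat_vec.ll         8 lines
llvm/test/CodeGen/X86/usub_sat_vec.ll         8 lines
llvm/test/CodeGen/X86/vec_ssubo.ll            2 lines
llvm/test/CodeGen/X86/vec_usubo.ll            6 lines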

Diff 283807

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

@@ ... @@ private:
     bool foldLoadStoreIntoMemOperand(SDNode *Node);
     MachineSDNode *matchBEXTRFromAndImm(SDNode *Node);
     bool matchBitExtract(SDNode *Node);
     bool shrinkAndImmediate(SDNode *N);
     bool isMaskZeroExtended(SDNode *N) const;
     bool tryShiftAmountMod(SDNode *N);
     bool tryShrinkShlLogicImm(SDNode *N);
     bool tryVPTERNLOG(SDNode *N);
-    bool tryVPTESTM(SDNode *Root, SDValue Setcc, SDValue Mask);
+    bool tryVPTESTM(SDNode *Root, SDValue Setcc, SDValue Mask, bool Invert);
     bool tryMatchBitSelect(SDNode *N);

     MachineSDNode *emitPCMPISTR(unsigned ROpc, unsigned MOpc, bool MayFoldLoad,
                                 const SDLoc &dl, MVT VT, SDNode *Node);
     MachineSDNode *emitPCMPESTR(unsigned ROpc, unsigned MOpc, bool MayFoldLoad,
                                 const SDLoc &dl, MVT VT, SDNode *Node,
                                 SDValue &InFlag);

@@ ... @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
   if (!NVT.isVector() || !Subtarget->hasAVX512() ||
       NVT.getVectorElementType() == MVT::i1)
     return false;

   // We need VLX for 128/256-bit.
   if (!(Subtarget->hasVLX() || NVT.is512BitVector()))
     return false;

+  unsigned NOpc = N->getOpcode();
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);

   auto getFoldableLogicOp = [](SDValue Op) {
     // Peek through single use bitcast.
     if (Op.getOpcode() == ISD::BITCAST && Op.hasOneUse())
       Op = Op.getOperand(0);
@@ ... @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
   switch (FoldableOp.getOpcode()) {
   default: llvm_unreachable("Unexpected opcode!");
   case ISD::AND: Imm = TernlogMagicB & TernlogMagicC; break;
   case ISD::OR: Imm = TernlogMagicB | TernlogMagicC; break;
   case ISD::XOR: Imm = TernlogMagicB ^ TernlogMagicC; break;
   case X86ISD::ANDNP: Imm = ~(TernlogMagicB) & TernlogMagicC; break;
   }

-  switch (N->getOpcode()) {
+  switch (NOpc) {
   default: llvm_unreachable("Unexpected opcode!");
+  case X86ISD::ANDNP:
+    if (A == N0)
+      Imm &= ~TernlogMagicA;
+    else
+      Imm = ~(Imm)&TernlogMagicA;
+    break;
   case ISD::AND: Imm &= TernlogMagicA; break;
   case ISD::OR: Imm |= TernlogMagicA; break;
   case ISD::XOR: Imm ^= TernlogMagicA; break;
   }

   auto tryFoldLoadOrBCast =
       [this](SDNode *Root, SDNode *P, SDValue &L, SDValue &Base, SDValue &Scale,
              SDValue &Index, SDValue &Disp, SDValue &Segment) {
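
The new ANDNP case in tryVPTERNLOG has to distinguish which operand gets inverted. The following standalone sketch (not part of the patch) recomputes the two immediates and checks them against a truth table. The constants 0xf0/0xcc/0xaa are the usual ternary-logic column values; the helper names are hypothetical, and the example fixes the folded inner op to AND purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Truth-table columns for the three VPTERNLOG inputs A, B, C.
constexpr uint8_t MagicA = 0xf0, MagicB = 0xcc, MagicC = 0xaa;

// Immediate for (andnp A, (and B, C)): A is operand 0, so A is inverted.
// Mirrors "Imm &= ~TernlogMagicA".
constexpr uint8_t immANotFolded() {
  return static_cast<uint8_t>((MagicB & MagicC) & ~MagicA);
}

// Immediate for (andnp (and B, C), A): the folded op is operand 0 and is
// inverted. Mirrors "Imm = ~(Imm) & TernlogMagicA".
constexpr uint8_t immFoldedNot() {
  return static_cast<uint8_t>(~(MagicB & MagicC) & MagicA);
}

// VPTERNLOG selects bit (A<<2 | B<<1 | C) of the immediate per bit position.
constexpr bool ternlogBit(uint8_t Imm, bool A, bool B, bool C) {
  return (Imm >> ((A << 2) | (B << 1) | C)) & 1;
}

// Compares both immediates against the boolean functions they should encode.
inline bool immediatesMatchTruthTables() {
  for (int I = 0; I < 8; ++I) {
    bool A = I & 4, B = I & 2, C = I & 1;
    if (ternlogBit(immANotFolded(), A, B, C) != (!A && (B && C)))
      return false;
    if (ternlogBit(immFoldedNot(), A, B, C) != (!(B && C) && A))
      return false;
  }
  return true;
}
```

Because and-not is not commutative, the two operand orders yield different immediates (0x08 versus 0x70 in this sketch), which is why the patch branches on `A == N0`.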
@@ ... @@

 #undef VPTESTM_FULL_CASES
 #undef VPTESTM_BROADCAST_CASES
 #undef VPTESTM_CASE
 }

 // Try to create VPTESTM instruction. If InMask is not null, it will be used
 // to form a masked operation.
-bool X86DAGToDAGISel::tryVPTESTM(SDNode *Root, SDValue Setcc,
-                                 SDValue InMask) {
+bool X86DAGToDAGISel::tryVPTESTM(SDNode *Root, SDValue Setcc, SDValue InMask,
+                                 bool Invert) {
   assert(Subtarget->hasAVX512() && "Expected AVX512!");
   assert(Setcc.getSimpleValueType().getVectorElementType() == MVT::i1 &&
          "Unexpected VT!");

   // Look for equal and not equal compares.
   ISD::CondCode CC = cast<CondCodeSDNode>(Setcc.getOperand(2))->get();
   if (CC != ISD::SETEQ && CC != ISD::SETNE)
     return false;
@@ ... @@ if (IsMasked) {
       unsigned RegClass = TLI->getRegClassFor(MaskVT)->getID();
       SDValue RC = CurDAG->getTargetConstant(RegClass, dl, MVT::i32);
       InMask = SDValue(CurDAG->getMachineNode(TargetOpcode::COPY_TO_REGCLASS,
                                               dl, MaskVT, InMask, RC), 0);
     }
   }

   bool IsTestN = CC == ISD::SETEQ;
+  if (Invert)
+    IsTestN = !IsTestN;

   unsigned Opc = getVPTESTMOpc(CmpVT, IsTestN, FoldedLoad, FoldedBCast,
                                IsMasked);

   MachineSDNode *CNode;
   if (FoldedLoad) {
     SDVTList VTs = CurDAG->getVTList(MaskVT, MVT::Other);

     if (IsMasked) {
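
The `Invert` flag relies on `(andnp (setcc ...), Mask)` being the same as the opposite test executed under `Mask`. A rough standalone model of that idea follows. It assumes a simplified 4-lane view of VPTESTM/VPTESTNM with zeroing of masked-off lanes; `Vec4`, `vptest` and `andnpOfSetne` are hypothetical names for this sketch only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Simplified model: VPTESTM sets a result bit where (A & B) != 0, VPTESTNM
// where (A & B) == 0. Lanes whose WriteMask bit is clear produce 0.
inline uint8_t vptest(const Vec4 &A, const Vec4 &B, uint8_t WriteMask,
                      bool IsTestN) {
  uint8_t K = 0;
  for (unsigned I = 0; I < 4; ++I) {
    bool NonZero = (A[I] & B[I]) != 0;
    bool Bit = IsTestN ? !NonZero : NonZero;
    if (Bit && ((WriteMask >> I) & 1))
      K = static_cast<uint8_t>(K | (1u << I));
  }
  return K;
}

// (andnp (setcc X, 0, setne), Mask): compute the unmasked "X != 0" test,
// invert it, then AND with Mask.
inline uint8_t andnpOfSetne(const Vec4 &X, uint8_t Mask) {
  uint8_t Setne = vptest(X, X, 0x0f, /*IsTestN=*/false);
  return static_cast<uint8_t>(~Setne & Mask);
}
```

In this model `andnpOfSetne(X, Mask)` agrees with `vptest(X, X, Mask, /*IsTestN=*/true)`, which is the effect of flipping `IsTestN` when `Invert` is set.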
@@ ... @@ bool X86DAGToDAGISel::tryMatchBitSelect(SDNode *N) {
   assert(N->getOpcode() == ISD::OR && "Unexpected opcode!");

   MVT NVT = N->getSimpleValueType(0);

   // Make sure we support VPTERNLOG.
   if (!NVT.isVector() || !Subtarget->hasAVX512())
     return false;

+  if (!NVT.is128BitVector() && !NVT.is256BitVector() && !NVT.is512BitVector())
+    return false;
+
   // We need VLX for 128/256-bit.
   if (!(Subtarget->hasVLX() || NVT.is512BitVector()))
     return false;

   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);

   // Canonicalize AND to LHS.
@@ ... @@ if (matchBitExtract(Node))
       return;
     LLVM_FALLTHROUGH;
   case ISD::SRA:
   case ISD::SHL:
     if (tryShiftAmountMod(Node))
       return;
     break;

+  case X86ISD::ANDNP:
+    if (NVT.isVector() && NVT.getVectorElementType() == MVT::i1) {
+      SDValue N0 = Node->getOperand(0);
+      SDValue N1 = Node->getOperand(1);
+      // Try to form a masked VPTESTM
+      if (N0.getOpcode() == ISD::SETCC && N0.hasOneUse() &&
+          tryVPTESTM(Node, N0, N1, true))
+        return;
+    }
+    if (tryVPTERNLOG(Node))
+      return;
+    break;
+
   case ISD::AND:
     if (NVT.isVector() && NVT.getVectorElementType() == MVT::i1) {
       // Try to form a masked VPTESTM. Operands can be in either order.
       SDValue N0 = Node->getOperand(0);
       SDValue N1 = Node->getOperand(1);
       if (N0.getOpcode() == ISD::SETCC && N0.hasOneUse() &&
-          tryVPTESTM(Node, N0, N1))
+          tryVPTESTM(Node, N0, N1, false))
         return;
       if (N1.getOpcode() == ISD::SETCC && N1.hasOneUse() &&
-          tryVPTESTM(Node, N1, N0))
+          tryVPTESTM(Node, N1, N0, false))
         return;
     }

     if (MachineSDNode *NewNode = matchBEXTRFromAndImm(Node)) {
       ReplaceUses(SDValue(Node, 0), SDValue(NewNode, 0));
       CurDAG->RemoveDeadNode(Node);
       return;
     }
@@ ... @@ case X86ISD::PCMPESTR: {
     }
     // Connect the flag usage to the last instruction created.
     ReplaceUses(SDValue(Node, 2), SDValue(CNode, 1));
     CurDAG->RemoveDeadNode(Node);
     return;
   }

   case ISD::SETCC: {
-    if (NVT.isVector() && tryVPTESTM(Node, SDValue(Node, 0), SDValue()))
+    if (NVT.isVector() && tryVPTESTM(Node, SDValue(Node, 0), SDValue(), false))
       return;

     break;
   }

   case ISD::STORE:
     if (foldLoadStoreIntoMemOperand(Node))
       return;

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ ... @@
   if (FValIsAllZeros) {
     SDValue CastLHS = DAG.getBitcast(CondVT, LHS);
     SDValue And = DAG.getNode(ISD::AND, DL, CondVT, Cond, CastLHS);
     return DAG.getBitcast(VT, And);
   }

   // vselect Cond, 000..., X -> andn Cond, X
   if (TValIsAllZeros) {
-    MVT AndNVT = MVT::getVectorVT(MVT::i64, CondVT.getSizeInBits() / 64);
-    SDValue CastCond = DAG.getBitcast(AndNVT, Cond);
-    SDValue CastRHS = DAG.getBitcast(AndNVT, RHS);
-    SDValue AndN = DAG.getNode(X86ISD::ANDNP, DL, AndNVT, CastCond, CastRHS);
+    SDValue CastRHS = DAG.getBitcast(CondVT, RHS);
+    SDValue AndN = DAG.getNode(X86ISD::ANDNP, DL, CondVT, Cond, CastRHS);
     return DAG.getBitcast(VT, AndN);
   }

   return SDValue();
 }

 /// If both arms of a vector select are concatenated vectors, split the select,
 /// and concatenate the result to eliminate a wide (256-bit) vector instruction:
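
The two vselect folds in the hunk above are plain bitwise identities once every lane is a single bit. A small standalone check, assuming an 8-lane i1 mask modelled as a `uint8_t` (the helper names `vselect`, `andnp` and `foldsHold` are hypothetical and not LLVM APIs):

```cpp
#include <cassert>
#include <cstdint>

// Lane-wise select on an 8-lane i1 mask: result bit i is T's bit where
// Cond's bit is 1, otherwise F's bit.
constexpr uint8_t vselect(uint8_t Cond, uint8_t T, uint8_t F) {
  return static_cast<uint8_t>((Cond & T) | (~Cond & F));
}

// x86 and-not semantics: the FIRST operand is the one that is inverted.
constexpr uint8_t andnp(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>(~A & B);
}

// Exhaustively checks both folds over all pairs of 8-bit masks.
inline bool foldsHold() {
  for (unsigned C = 0; C < 256; ++C)
    for (unsigned X = 0; X < 256; ++X) {
      uint8_t Cond = static_cast<uint8_t>(C), V = static_cast<uint8_t>(X);
      if (vselect(Cond, V, 0) != (Cond & V)) // false arm zero -> and
        return false;
      if (vselect(Cond, 0, V) != andnp(Cond, V)) // true arm zero -> andn
        return false;
    }
  return true;
}
```

Nothing in the identity depends on the element width, which is consistent with the patch keeping `CondVT` instead of bitcasting to an i64 vector.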
Lint: Pre-merge checks: clang-format: please reformat the code (the combineANDXORWithAllOnesIntoANDNP signature).

@@ ... @@
           return OneBitOfTruth;
         }
       }
     }
   }
   return SDValue();
 }

-/// Try to fold: (and (xor X, -1), Y) -> (andnp X, Y).
-static SDValue combineANDXORWithAllOnesIntoANDNP(SDNode *N, SelectionDAG &DAG) {
+/// Try to fold:
+///   (and (not X), Y) -> (andnp X, Y)
+///   (and (xor X, -1), Y) -> (andnp X, Y).
+static SDValue combineANDXORWithAllOnesIntoANDNP(SDNode *N, SelectionDAG &DAG,
+                                                 const X86Subtarget &Subtarget) {
   assert(N->getOpcode() == ISD::AND);

   MVT VT = N->getSimpleValueType(0);
-  if (!VT.is128BitVector() && !VT.is256BitVector() && !VT.is512BitVector())
+  if (!VT.is128BitVector() && !VT.is256BitVector() && !VT.is512BitVector() &&
+      !(VT.getVectorElementType() == MVT::i1 && Subtarget.hasAVX512()))
     return SDValue();

   SDValue X, Y;
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);

   auto GetNot = [&VT, &DAG](SDValue V) {
     // Basic X = NOT(Y) detection.
@@ ... @@
     return SDValue();

   if (SDValue R = combineCompareEqual(N, DAG, DCI, Subtarget))
     return R;

   if (SDValue FPLogic = convertIntLogicToFPLogic(N, DAG, Subtarget))
     return FPLogic;

-  if (SDValue R = combineANDXORWithAllOnesIntoANDNP(N, DAG))
+  if (SDValue R = combineANDXORWithAllOnesIntoANDNP(N, DAG, Subtarget))
     return R;

   if (SDValue ShiftRight = combineAndMaskToShift(N, DAG, Subtarget))
     return ShiftRight;

   if (SDValue R = combineAndLoadToBZHI(N, DAG, Subtarget))
     return R;

llvm/lib/Target/X86/X86InstrAVX512.td

@@ ... @@ defm W : avx512_mask_binop<opc, !strconcat(OpcodeStr, "w"), VK16, OpNode,
                              sched, prdW, IsCommutable>, VEX_4V, VEX_L, PS;
   defm D : avx512_mask_binop<opc, !strconcat(OpcodeStr, "d"), VK32, OpNode,
                              sched, HasBWI, IsCommutable>, VEX_4V, VEX_L, VEX_W, PD;
   defm Q : avx512_mask_binop<opc, !strconcat(OpcodeStr, "q"), VK64, OpNode,
                              sched, HasBWI, IsCommutable>, VEX_4V, VEX_L, VEX_W, PS;
 }

 // These nodes use 'vnot' instead of 'not' to support vectors.
-def vandn : PatFrag<(ops node:$i0, node:$i1), (and (vnot node:$i0), node:$i1)>;
 def vxnor : PatFrag<(ops node:$i0, node:$i1), (vnot (xor node:$i0, node:$i1))>;

 // TODO - do we need a X86SchedWriteWidths::KMASK type?
 defm KAND : avx512_mask_binop_all<0x41, "kand", and, SchedWriteVecLogic.XMM, 1>;
 defm KOR : avx512_mask_binop_all<0x45, "kor", or, SchedWriteVecLogic.XMM, 1>;
 defm KXNOR : avx512_mask_binop_all<0x46, "kxnor", vxnor, SchedWriteVecLogic.XMM, 1>;
 defm KXOR : avx512_mask_binop_all<0x47, "kxor", xor, SchedWriteVecLogic.XMM, 1>;
-defm KANDN : avx512_mask_binop_all<0x42, "kandn", vandn, SchedWriteVecLogic.XMM, 0>;
+defm KANDN : avx512_mask_binop_all<0x42, "kandn", X86andnp, SchedWriteVecLogic.XMM, 0>;
 defm KADD : avx512_mask_binop_all<0x4A, "kadd", X86kadd, SchedWriteVecLogic.XMM, 1, HasDQI>;

 multiclass avx512_binop_pat<SDPatternOperator VOpNode,
                             Instruction Inst> {
   // With AVX512F, 8-bit mask is promoted to 16-bit mask,
   // for the DQI set, this type is legal and KxxxB instruction is used
   let Predicates = [NoDQI] in
   def : Pat<(VOpNode VK8:$src1, VK8:$src2),
@@ ... @@ def : Pat<(VOpNode VK2:$src1, VK2:$src2),
                       (COPY_TO_REGCLASS VK2:$src1, VK16),
                       (COPY_TO_REGCLASS VK2:$src2, VK16)), VK2)>;
   def : Pat<(VOpNode VK4:$src1, VK4:$src2),
             (COPY_TO_REGCLASS (Inst
                       (COPY_TO_REGCLASS VK4:$src1, VK16),
                       (COPY_TO_REGCLASS VK4:$src2, VK16)), VK4)>;
 }

 defm : avx512_binop_pat<and, KANDWrr>;
-defm : avx512_binop_pat<vandn, KANDNWrr>;
+defm : avx512_binop_pat<X86andnp, KANDNWrr>;
 defm : avx512_binop_pat<or, KORWrr>;
 defm : avx512_binop_pat<vxnor, KXNORWrr>;
 defm : avx512_binop_pat<xor, KXORWrr>;

 // Mask unpacking
 multiclass avx512_mask_unpck<string Suffix, X86KVectorVTInfo Dst,
                              X86KVectorVTInfo Src, X86FoldableSchedWrite sched,
                              Predicate prd> {
   let Predicates = [prd] in {
     let hasSideEffects = 0 in
     def rr : I<0x4b, MRMSrcReg, (outs Dst.KRC:$dst),
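
The avx512_binop_pat patterns above run narrow mask operations through the 16-bit instruction (KANDNWrr for and-not). A minimal sketch of why that is safe for a bitwise op is shown below. It assumes the upper bits of the widened register are arbitrary; `kandnw`, `kandnViaW` and the `Hi1`/`Hi2` parameters are hypothetical modelling devices, not LLVM or ISA names.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 16-bit mask and-not: the first source is inverted.
constexpr uint16_t kandnw(uint16_t Src1, uint16_t Src2) {
  return static_cast<uint16_t>(~Src1 & Src2);
}

// Model of the widening pattern for an 8-lane mask: place the mask in a
// 16-bit register whose upper byte is arbitrary (Hi1/Hi2), run the 16-bit
// op, and keep only the low 8 bits.
inline uint8_t kandnViaW(uint8_t Src1, uint8_t Src2, uint8_t Hi1 = 0,
                         uint8_t Hi2 = 0) {
  uint16_t W1 = static_cast<uint16_t>((Hi1 << 8) | Src1);
  uint16_t W2 = static_cast<uint16_t>((Hi2 << 8) | Src2);
  return static_cast<uint8_t>(kandnw(W1, W2) & 0xff);
}

// Checks that the low 8 bits never depend on the upper byte contents.
inline bool narrowResultIgnoresUpperBits() {
  const uint8_t Garbage[] = {0x00, 0xff, 0xa5, 0x3c};
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (uint8_t H1 : Garbage)
        for (uint8_t H2 : Garbage)
          if (kandnViaW(static_cast<uint8_t>(A), static_cast<uint8_t>(B), H1,
                        H2) != static_cast<uint8_t>(~A & B))
            return false;
  return true;
}
```

The sketch also shows why KANDN is declared with `IsCommutable` set to 0: swapping the operands of `kandnw` changes the result.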

llvm/test/CodeGen/X86/avx512-select.ll

@@ ... @@ ; X64-AVX512BW-NEXT: retq
   %a = load <1 x i1>, <1 x i1>* %x
   %b = load <1 x i1>, <1 x i1>* %y
   %b2 = load <1 x i1>, <1 x i1>* %w
   %b3 = xor <1 x i1> %b, %b2
   %c = select i1 %z, <1 x i1> %a, <1 x i1> %b3
   store <1 x i1> %c, <1 x i1>* %x
   ret void
 }

+; Regression test from https://github.com/JuliaLang/julia/issues/36955
+define i8 @julia_issue36955(<8 x i1> %mask, <8 x double> %a) {
+; X86-AVX512F-LABEL: julia_issue36955:
+; X86-AVX512F: # %bb.0:
+; X86-AVX512F-NEXT: vpmovsxwq %xmm0, %zmm0
+; X86-AVX512F-NEXT: vpsllq $63, %zmm0, %zmm0
+; X86-AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k0
+; X86-AVX512F-NEXT: vpxor %xmm0, %xmm0, %xmm0
+; X86-AVX512F-NEXT: vcmpnlepd %zmm0, %zmm1, %k1
+; X86-AVX512F-NEXT: kandnw %k0, %k1, %k0
+; X86-AVX512F-NEXT: kandw %k1, %k0, %k0
+; X86-AVX512F-NEXT: knotw %k1, %k1
+; X86-AVX512F-NEXT: korw %k1, %k0, %k0
+; X86-AVX512F-NEXT: kmovw %k0, %eax
+; X86-AVX512F-NEXT: # kill: def $al killed $al killed $eax
+; X86-AVX512F-NEXT: vzeroupper
+; X86-AVX512F-NEXT: retl
+;
+; X64-AVX512F-LABEL: julia_issue36955:
+; X64-AVX512F: # %bb.0:
+; X64-AVX512F-NEXT: vpmovsxwq %xmm0, %zmm0
+; X64-AVX512F-NEXT: vpsllq $63, %zmm0, %zmm0
+; X64-AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k0
+; X64-AVX512F-NEXT: vpxor %xmm0, %xmm0, %xmm0
+; X64-AVX512F-NEXT: vcmpnlepd %zmm0, %zmm1, %k1
+; X64-AVX512F-NEXT: kandnw %k0, %k1, %k0
+; X64-AVX512F-NEXT: kandw %k1, %k0, %k0
+; X64-AVX512F-NEXT: knotw %k1, %k1
+; X64-AVX512F-NEXT: korw %k1, %k0, %k0
+; X64-AVX512F-NEXT: kmovw %k0, %eax
+; X64-AVX512F-NEXT: # kill: def $al killed $al killed $eax
+; X64-AVX512F-NEXT: vzeroupper
+; X64-AVX512F-NEXT: retq
+;
+; X86-AVX512BW-LABEL: julia_issue36955:
+; X86-AVX512BW: # %bb.0:
+; X86-AVX512BW-NEXT: vpsllw $15, %xmm0, %xmm0
+; X86-AVX512BW-NEXT: vpmovw2m %zmm0, %k0
+; X86-AVX512BW-NEXT: vpxor %xmm0, %xmm0, %xmm0
+; X86-AVX512BW-NEXT: vcmpnlepd %zmm0, %zmm1, %k1
+; X86-AVX512BW-NEXT: kandnw %k0, %k1, %k0
+; X86-AVX512BW-NEXT: kandw %k1, %k0, %k0
+; X86-AVX512BW-NEXT: knotw %k1, %k1
+; X86-AVX512BW-NEXT: korw %k1, %k0, %k0
+; X86-AVX512BW-NEXT: kmovd %k0, %eax
+; X86-AVX512BW-NEXT: # kill: def $al killed $al killed $eax
+; X86-AVX512BW-NEXT: vzeroupper
+; X86-AVX512BW-NEXT: retl
+;
+; X64-AVX512BW-LABEL: julia_issue36955:
+; X64-AVX512BW: # %bb.0:
+; X64-AVX512BW-NEXT: vpsllw $15, %xmm0, %xmm0
+; X64-AVX512BW-NEXT: vpmovw2m %zmm0, %k0
+; X64-AVX512BW-NEXT: vpxor %xmm0, %xmm0, %xmm0
+; X64-AVX512BW-NEXT: vcmpnlepd %zmm0, %zmm1, %k1
+; X64-AVX512BW-NEXT: kandnw %k0, %k1, %k0
+; X64-AVX512BW-NEXT: kandw %k1, %k0, %k0
+; X64-AVX512BW-NEXT: knotw %k1, %k1
+; X64-AVX512BW-NEXT: korw %k1, %k0, %k0
+; X64-AVX512BW-NEXT: kmovd %k0, %eax
+; X64-AVX512BW-NEXT: # kill: def $al killed $al killed $eax
+; X64-AVX512BW-NEXT: vzeroupper
+; X64-AVX512BW-NEXT: retq
+  %fcmp = fcmp ugt <8 x double> %a, zeroinitializer
+  %xor = xor <8 x i1> %fcmp, <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>
+  %select1 = select <8 x i1> %fcmp, <8 x i1> zeroinitializer, <8 x i1> %mask
+  %select2 = select <8 x i1> %xor, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i1> %select1
+  %ret = bitcast <8 x i1> %select2 to i8
+  ret i8 %ret
+}
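
For readers who want to trace the regression test by hand, the IR body of @julia_issue36955 can be modelled on 8-bit masks as below. This is only a sketch: `Fcmp` stands in for the result of the `fcmp ugt`, and `select8`, `juliaIssue36955` and `resultIsNotFcmp` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Lane-wise select on 8 one-bit lanes.
constexpr uint8_t select8(uint8_t Cond, uint8_t T, uint8_t F) {
  return static_cast<uint8_t>((Cond & T) | (~Cond & F));
}

// Mirrors the IR: %xor, %select1, %select2, with Fcmp standing in for %fcmp.
inline uint8_t juliaIssue36955(uint8_t Fcmp, uint8_t Mask) {
  uint8_t Xor = static_cast<uint8_t>(Fcmp ^ 0xff);
  uint8_t Select1 = select8(Fcmp, 0x00, Mask); // the (vselect cond, 0, X) case
  uint8_t Select2 = select8(Xor, 0xff, Select1);
  return Select2;
}

// Checks, for every input pair, that the result equals the inverted compare.
inline bool resultIsNotFcmp() {
  for (unsigned F = 0; F < 256; ++F)
    for (unsigned M = 0; M < 256; ++M)
      if (juliaIssue36955(static_cast<uint8_t>(F), static_cast<uint8_t>(M)) !=
          static_cast<uint8_t>(~F))
        return false;
  return true;
}
```

In this model the result works out to the inverted compare regardless of `Mask`. The CHECK lines above still spell out the kandnw/kandw/knotw/korw chain, so the test appears to be about selecting the `(vselect cond, 0, X)` pattern without crashing rather than about minimal code.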

llvm/test/CodeGen/X86/combine-bitselect.ll

@@ ... @@
 ; AVX2-NEXT: vpcmpeqd %xmm3, %xmm1, %xmm1
 ; AVX2-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512F-LABEL: bitselect_v4i1_loop:
 ; AVX512F: # %bb.0: # %bb
 ; AVX512F-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
 ; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
-; AVX512F-NEXT: vpcmpeqd {{.*}}(%rip){1to16}, %zmm1, %k1
-; AVX512F-NEXT: vpcmpeqd {{.*}}(%rip){1to16}, %zmm1, %k2
-; AVX512F-NEXT: vptestnmd %zmm0, %zmm0, %k0 {%k2}
-; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1 {%k1}
+; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1
+; AVX512F-NEXT: vpcmpeqd {{.*}}(%rip){1to16}, %zmm1, %k0
+; AVX512F-NEXT: kandnw %k0, %k1, %k0
+; AVX512F-NEXT: vpcmpeqd {{.*}}(%rip){1to16}, %zmm1, %k1 {%k1}
 ; AVX512F-NEXT: korw %k0, %k1, %k1
 ; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
 ; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT: vzeroupper
 ; AVX512F-NEXT: retq
 ;
 ; AVX512VL-LABEL: bitselect_v4i1_loop:
 ; AVX512VL: # %bb.0: # %bb
-; AVX512VL-NEXT: vpcmpeqd {{.*}}(%rip){1to4}, %xmm1, %k1
-; AVX512VL-NEXT: vpcmpeqd {{.*}}(%rip){1to4}, %xmm1, %k2
-; AVX512VL-NEXT: vptestnmd %xmm0, %xmm0, %k0 {%k2}
-; AVX512VL-NEXT: vptestmd %xmm0, %xmm0, %k1 {%k1}
-; AVX512VL-NEXT: korw %k0, %k1, %k1
+; AVX512VL-NEXT: vptestmd %xmm0, %xmm0, %k1
+; AVX512VL-NEXT: vpcmpeqd {{.*}}(%rip){1to4}, %xmm1, %k0
+; AVX512VL-NEXT: vpcmpeqd {{.*}}(%rip){1to4}, %xmm1, %k2 {%k1}
+; AVX512VL-NEXT: kandnw %k0, %k1, %k0
+; AVX512VL-NEXT: korw %k0, %k2, %k1
 ; AVX512VL-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
 ; AVX512VL-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
 ; AVX512VL-NEXT: retq
 bb:
   %tmp = icmp ne <4 x i32> %a0, zeroinitializer
   %tmp2 = icmp eq <4 x i32> %a1, <i32 12, i32 12, i32 12, i32 12>
   %tmp3 = icmp eq <4 x i32> %a1, <i32 15, i32 15, i32 15, i32 15>
   %tmp4 = select <4 x i1> %tmp, <4 x i1> %tmp2, <4 x i1> %tmp3
   ret <4 x i1> %tmp4
 }

llvm/test/CodeGen/X86/ssub_sat_vec.ll

@@ ... @@
 ; AVX2-NEXT: vpand %xmm2, %xmm0, %xmm0
 ; AVX2-NEXT: vpsubsb %xmm1, %xmm0, %xmm0
 ; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
 ; AVX2-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512F-LABEL: v16i1:
 ; AVX512F: # %bb.0:
-; AVX512F-NEXT: vpmovsxbd %xmm0, %zmm0
-; AVX512F-NEXT: vpslld $31, %zmm0, %zmm0
 ; AVX512F-NEXT: vpmovsxbd %xmm1, %zmm1
 ; AVX512F-NEXT: vpslld $31, %zmm1, %zmm1
-; AVX512F-NEXT: vptestnmd %zmm1, %zmm1, %k1
-; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1 {%k1}
+; AVX512F-NEXT: vpmovsxbd %xmm0, %zmm0
+; AVX512F-NEXT: vpslld $31, %zmm0, %zmm0
+; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1
+; AVX512F-NEXT: vptestnmd %zmm1, %zmm1, %k1 {%k1}
 ; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
 ; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
 ; AVX512F-NEXT: vzeroupper
 ; AVX512F-NEXT: retq
 ;
 ; AVX512BW-LABEL: v16i1:
 ; AVX512BW: # %bb.0:
 ; AVX512BW-NEXT: vpsllw $7, %xmm0, %xmm0

llvm/test/CodeGen/X86/usub_sat_vec.ll

	Show First 20 Lines • Show All 539 Lines • ▼ Show 20 Lines
	; AVX2-NEXT: vpand %xmm2, %xmm0, %xmm0			; AVX2-NEXT: vpand %xmm2, %xmm0, %xmm0
	; AVX2-NEXT: vpsubusb %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpsubusb %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpsrlw $7, %xmm0, %xmm0			; AVX2-NEXT: vpsrlw $7, %xmm0, %xmm0
	; AVX2-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: v16i1:			; AVX512F-LABEL: v16i1:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpmovsxbd %xmm0, %zmm0
	; AVX512F-NEXT: vpslld $31, %zmm0, %zmm0
	; AVX512F-NEXT: vpmovsxbd %xmm1, %zmm1			; AVX512F-NEXT: vpmovsxbd %xmm1, %zmm1
	; AVX512F-NEXT: vpslld $31, %zmm1, %zmm1			; AVX512F-NEXT: vpslld $31, %zmm1, %zmm1
	; AVX512F-NEXT: vptestnmd %zmm1, %zmm1, %k1			; AVX512F-NEXT: vpmovsxbd %xmm0, %zmm0
	; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1 {%k1}			; AVX512F-NEXT: vpslld $31, %zmm0, %zmm0
				; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1
				; AVX512F-NEXT: vptestnmd %zmm1, %zmm1, %k1 {%k1}
	; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
	; AVX512F-NEXT: vpmovdb %zmm0, %xmm0			; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: v16i1:			; AVX512BW-LABEL: v16i1:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsllw $7, %xmm0, %xmm0			; AVX512BW-NEXT: vpsllw $7, %xmm0, %xmm0

llvm/test/CodeGen/X86/vec_ssubo.ll

	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: ssubo_v4i1:			; AVX512-LABEL: ssubo_v4i1:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpslld $31, %xmm1, %xmm1			; AVX512-NEXT: vpslld $31, %xmm1, %xmm1
	; AVX512-NEXT: vptestmd %xmm1, %xmm1, %k0			; AVX512-NEXT: vptestmd %xmm1, %xmm1, %k0
	; AVX512-NEXT: vpslld $31, %xmm0, %xmm0			; AVX512-NEXT: vpslld $31, %xmm0, %xmm0
	; AVX512-NEXT: vptestmd %xmm0, %xmm0, %k1			; AVX512-NEXT: vptestmd %xmm0, %xmm0, %k1
	; AVX512-NEXT: vptestnmd %xmm1, %xmm1, %k2 {%k1}			; AVX512-NEXT: kandnw %k1, %k0, %k2
	; AVX512-NEXT: kxorw %k0, %k1, %k0			; AVX512-NEXT: kxorw %k0, %k1, %k0
	; AVX512-NEXT: kxorw %k2, %k0, %k1			; AVX512-NEXT: kxorw %k2, %k0, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
	; AVX512-NEXT: kmovd %k0, %eax			; AVX512-NEXT: kmovd %k0, %eax
	; AVX512-NEXT: movb %al, (%rdi)			; AVX512-NEXT: movb %al, (%rdi)
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%t = call {<4 x i1>, <4 x i1>} @llvm.ssub.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)			%t = call {<4 x i1>, <4 x i1>} @llvm.ssub.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)

llvm/test/CodeGen/X86/vec_usubo.ll

	; AVX2-NEXT: vmovmskps %xmm1, %eax			; AVX2-NEXT: vmovmskps %xmm1, %eax
	; AVX2-NEXT: movb %al, (%rdi)			; AVX2-NEXT: movb %al, (%rdi)
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: usubo_v4i1:			; AVX512-LABEL: usubo_v4i1:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpslld $31, %xmm0, %xmm0			; AVX512-NEXT: vpslld $31, %xmm0, %xmm0
	; AVX512-NEXT: vptestmd %xmm0, %xmm0, %k0			; AVX512-NEXT: vptestmd %xmm0, %xmm0, %k0
	; AVX512-NEXT: vpslld $31, %xmm1, %xmm1			; AVX512-NEXT: vpslld $31, %xmm1, %xmm0
	; AVX512-NEXT: vptestmd %xmm1, %xmm1, %k1			; AVX512-NEXT: vptestmd %xmm0, %xmm0, %k1
	; AVX512-NEXT: kxorw %k1, %k0, %k1			; AVX512-NEXT: kxorw %k1, %k0, %k1
	; AVX512-NEXT: vptestnmd %xmm0, %xmm0, %k2 {%k1}			; AVX512-NEXT: kandnw %k1, %k0, %k2
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k2} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k2} {z}
	; AVX512-NEXT: kmovd %k1, %eax			; AVX512-NEXT: kmovd %k1, %eax
	; AVX512-NEXT: movb %al, (%rdi)			; AVX512-NEXT: movb %al, (%rdi)
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%t = call {<4 x i1>, <4 x i1>} @llvm.usub.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)			%t = call {<4 x i1>, <4 x i1>} @llvm.usub.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)
	%val = extractvalue {<4 x i1>, <4 x i1>} %t, 0			%val = extractvalue {<4 x i1>, <4 x i1>} %t, 0
	%obit = extractvalue {<4 x i1>, <4 x i1>} %t, 1			%obit = extractvalue {<4 x i1>, <4 x i1>} %t, 1