This is an archive of the discontinued LLVM Phabricator instance.

[X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops.
ClosedPublic

Authored by craig.topper on Sep 5 2021, 10:12 AM.

Details

Reviewers

pengfei
RKSimon
spatel
LuoYuanke

Commits

rGda3ef8b75612: [X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops.

Summary

This is a more general version of D109273, though it doesn't
peek through bitcasts or rearrange broadcasts.
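
As a rough illustration of the idea (a sketch, not the patch's code): the VPTERNLOG immediate for an expression with an inverted input can be derived by evaluating the expression on the truth-table constants 0xf0, 0xcc and 0xaa used for A, B and C, after complementing the constant of any inverted operand. The helper names `invert`, `immAndOrNot` and `checkD109273` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Truth-table constants for the three VPTERNLOG inputs (same values as
// TernlogMagicA/B/C in X86ISelDAGToDAG.cpp).
constexpr uint8_t MagicA = 0xf0;
constexpr uint8_t MagicB = 0xcc;
constexpr uint8_t MagicC = 0xaa;

// Hypothetical helper: complement a constant to model an inverted input.
constexpr uint8_t invert(uint8_t Magic) { return static_cast<uint8_t>(~Magic); }

// Immediate for A & (B | ~C), i.e. the D109273 pattern
// (and (or (xor X, -1), Y), Z) with A = Z, B = Y, C = X.
constexpr uint8_t immAndOrNot() {
  return static_cast<uint8_t>(MagicA & (MagicB | invert(MagicC)));
}

// Runtime check against the immediate named in the D109273 title.
inline void checkD109273() { assert(immAndOrNot() == 0xD0); }
```

Calling `checkD109273()` from a test driver confirms that complementing the constant for C reproduces the 0xD0 immediate from D109273.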

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Sep 5 2021, 10:12 AM

Herald added a subscriber: hiraditya. Sep 5 2021, 10:12 AM

craig.topper requested review of this revision. Sep 5 2021, 10:12 AM

Herald added a project: Restricted Project. Sep 5 2021, 10:12 AM

craig.topper mentioned this in D109273: [X86] Fold (and (or (xor X, -1), Y), Z) -> PTERNLOG Z, Y, X, 0xD0. Sep 5 2021, 10:14 AM

craig.topper added a reviewer: LuoYuanke.

LGTM but @pengfei might have seen some other cases that D109273 would address.

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4048	Missing assert messages
llvm/test/CodeGen/X86/avx512vl-logic.ll
989	Please can you precommit this?

Harbormaster completed remote builds in B122695: Diff 370816. Sep 5 2021, 10:57 AM

hafixo added a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:44 AM

hafixo added a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:47 AM

LuoYuanke added inline comments. Sep 6 2021, 5:23 AM

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4231	It seems that for the VPTERNLOG instruction we can accept a 4th operand whose value is all-zeros or all-ones, no matter what the logic operation is.
4233	Is the constant operand canonicalized as operand(1)?
4245	C.getOperand(0)?

LuoYuanke added inline comments. Sep 6 2021, 5:25 AM

llvm/test/CodeGen/X86/avx512vl-logic.ll
985	Are test cases for ~B and ~C missing?

In D109295#2984246, @RKSimon wrote:

LGTM but @pengfei might have seen some other cases that D109273 would address.

The general approach looks great. I don't have other cases. Thanks Craig.

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4231	Do you mean the FALSE and TRUE in table 5-10 and 5-11? I think we don't need a VPTERNLOG to generate allZero and allOne.
4233	We checked it in line 4225.

LuoYuanke added inline comments. Sep 6 2021, 6:03 AM

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4231	VPTERNLOG can select any possible result of 3 input bits. I mean it can be extended to 4 bits as long as the 4th bit is fixed to 0 or 1 at compile time. In this case the node is xor(X, -1); the same approach can be applied to xor(X, 0), and(X, -1), andnp(X, 0) and so on.

I found another example:

define dso_local <4 x i64> @foo2(<4 x i64> %0, <4 x i64> %1, <4 x i64> %2) {
  %4 = xor <4 x i64> %2, <i64 -1, i64 -1, i64 -1, i64 -1>
  %5 = or <4 x i64> %4, %1
  %6 = or <4 x i64> %0, %1
  %7 = and <4 x i64> %5, %6
  ret <4 x i64> %7
}

Can we simplify it to the following with this approach?

vpor    %ymm1, %ymm0, %ymm0
vpternlogq      $208, %ymm2, %ymm1, %ymm0
retq
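
The proposed two-instruction sequence can be sanity-checked with a small bitwise model of VPTERNLOG. This is only a sketch: `ternlog`, `foo2Reference`, `foo2Proposed` and `proposedSequenceMatches` are hypothetical names, and the model assumes the usual operand roles for the AT&T form `vpternlogq $imm, src3, src2, dst`, where the destination doubles as the first source.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise model of VPTERNLOG on one 64-bit lane: at each bit position the
// bits of A, B and C form a 3-bit index (A is the most significant bit)
// that selects one bit of the 8-bit immediate.
inline uint64_t ternlog(uint64_t A, uint64_t B, uint64_t C, uint8_t Imm) {
  uint64_t Result = 0;
  for (unsigned Bit = 0; Bit < 64; ++Bit) {
    unsigned Index = static_cast<unsigned>((((A >> Bit) & 1) << 2) |
                                           (((B >> Bit) & 1) << 1) |
                                           ((C >> Bit) & 1));
    Result |= static_cast<uint64_t>((Imm >> Index) & 1) << Bit;
  }
  return Result;
}

// Reference semantics of @foo2: (~%2 | %1) & (%0 | %1).
inline uint64_t foo2Reference(uint64_t X0, uint64_t X1, uint64_t X2) {
  return (~X2 | X1) & (X0 | X1);
}

// Proposed sequence: vpor, then vpternlogq $208 (0xD0) with the OR result
// as the first source, %1 as the second and %2 as the third.
inline uint64_t foo2Proposed(uint64_t X0, uint64_t X1, uint64_t X2) {
  uint64_t Or = X0 | X1;
  return ternlog(Or, X1, X2, 0xD0);
}

// The operation is purely bitwise, so checking the 8 all-zeros/all-ones
// input combinations covers every case.
inline bool proposedSequenceMatches() {
  for (unsigned Combo = 0; Combo < 8; ++Combo) {
    uint64_t X0 = (Combo & 4) ? ~0ULL : 0;
    uint64_t X1 = (Combo & 2) ? ~0ULL : 0;
    uint64_t X2 = (Combo & 1) ? ~0ULL : 0;
    if (foo2Reference(X0, X1, X2) != foo2Proposed(X0, X1, X2))
      return false;
  }
  return true;
}
```

Under these assumptions `proposedSequenceMatches()` returns true, so the `vpor` plus `vpternlogq $208` pair computes the same value as the IR.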

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4231	But the other cases can be simplified directly, e.g. xor(X, 0) -> X, and(X, -1) -> X, andnp(X, 0) -> 0 etc.

In D109295#2985246, @pengfei wrote:

I found another example:

define dso_local <4 x i64> @foo2(<4 x i64> %0, <4 x i64> %1, <4 x i64> %2) {
  %4 = xor <4 x i64> %2, <i64 -1, i64 -1, i64 -1, i64 -1>
  %5 = or <4 x i64> %4, %1
  %6 = or <4 x i64> %0, %1
  %7 = and <4 x i64> %5, %6
  ret <4 x i64> %7
}

Can we simplify it to the following with this approach?

vpor    %ymm1, %ymm0, %ymm0
vpternlogq      $208, %ymm2, %ymm1, %ymm0
retq

It seems not, with the current vpternlog framework. We currently only support A op1 (B op2 C). I have not figured out how to extend the framework to accept more operators as long as there are only 3 source inputs.

In D109295#2985347, @LuoYuanke wrote:

In D109295#2985246, @pengfei wrote:

I found another example:

define dso_local <4 x i64> @foo2(<4 x i64> %0, <4 x i64> %1, <4 x i64> %2) {
  %4 = xor <4 x i64> %2, <i64 -1, i64 -1, i64 -1, i64 -1>
  %5 = or <4 x i64> %4, %1
  %6 = or <4 x i64> %0, %1
  %7 = and <4 x i64> %5, %6
  ret <4 x i64> %7
}

Can we simplify it to the following with this approach?

vpor    %ymm1, %ymm0, %ymm0
vpternlogq      $208, %ymm2, %ymm1, %ymm0
retq

How about gcc? Can gcc generate one vpternlogq instruction?

In D109295#2985347, @LuoYuanke wrote:

In D109295#2985246, @pengfei wrote:

I found another example:

define dso_local <4 x i64> @foo2(<4 x i64> %0, <4 x i64> %1, <4 x i64> %2) {
  %4 = xor <4 x i64> %2, <i64 -1, i64 -1, i64 -1, i64 -1>
  %5 = or <4 x i64> %4, %1
  %6 = or <4 x i64> %0, %1
  %7 = and <4 x i64> %5, %6
  ret <4 x i64> %7
}

Can we simplify it to the following with this approach?

vpor    %ymm1, %ymm0, %ymm0
vpternlogq      $208, %ymm2, %ymm1, %ymm0
retq

It seems not, with the current vpternlog framework. We currently only support A op1 (B op2 C). I have not figured out how to extend the framework to accept more operators as long as there are only 3 source inputs.

I meant simplified relative to the current codegen:

vpcmpeqd        %ymm3, %ymm3, %ymm3
vpternlogq      $222, %ymm2, %ymm1, %ymm3
vpternlogq      $200, %ymm1, %ymm3, %ymm0
retq

We can save one vpternlogq.

I think we could use another algorithm that iterates over the 8 possible combinations of the 3 input bits, evaluates the result through the whole chain of operations, and derives the immediate operand of VPTERNLOGD from that.

VPTERNLOGD reg1, reg2, src3
Bit(reg1) Bit(reg2) Bit(src3)
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
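
The enumeration described above can be sketched as follows. This is an illustration under stated assumptions, not code from the patch: `ternlogImmediate` and `foo2Immediate` are hypothetical names, and row i of the table is assumed to map to bit i of the immediate with reg1 as the most significant index bit.

```cpp
#include <cassert>
#include <cstdint>

// Build a VPTERNLOG immediate by evaluating a three-input boolean function
// on all 8 input combinations. Row index = (A << 2) | (B << 1) | C, with
// reg1 = A, reg2 = B, src3 = C as in the table above.
template <typename Fn> uint8_t ternlogImmediate(Fn F) {
  uint8_t Imm = 0;
  for (unsigned Row = 0; Row < 8; ++Row) {
    bool A = (Row >> 2) & 1;
    bool B = (Row >> 1) & 1;
    bool C = Row & 1;
    if (F(A, B, C))
      Imm = static_cast<uint8_t>(Imm | (1u << Row));
  }
  return Imm;
}

// The @foo2 example with A = %0, B = %1, C = %2:
// (~C | B) & (A | B), evaluated one bit at a time.
inline uint8_t foo2Immediate() {
  return ternlogImmediate(
      [](bool A, bool B, bool C) { return (!C || B) && (A || B); });
}

// Runtime check of the value this model produces for @foo2.
inline void checkFoo2Immediate() { assert(foo2Immediate() == 0xDC); }
```

In this model the whole of @foo2 evaluates to the single immediate 0xDC, since any function of three inputs has an 8-entry truth table. Whether the matcher can be taught to find such multi-use patterns is the open question in this thread.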

craig.topper added inline comments. Sep 6 2021, 10:14 AM

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4231	Right those all should have been simplified by DAGCombine.
4233	DAGCombine should canonicalize XOR with constant to have it on the RHS.

craig.topper mentioned this in rGbf5a31bb9a90: [X86] Pre-commit test cases for D109295. NFC. Sep 6 2021, 10:30 AM

Address review comments. Rebase after pre-committing tests.

Rename IsNot lambda to PeekThroughNot and sync more code into it.

Also some more cases for ternlog

__m512i notBorC(__m512i B, __m512i C) {
    return ~(B|C); // 0x11 
}

__m512i notBandC(__m512i B, __m512i C) {
    return ~(B&C); // 0x77
}

__m512i notBxorC(__m512i B, __m512i C) {
    return ~(B^C); // 0x99
}
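
The immediates in the comments above can be cross-checked with the same truth-table constants the matcher uses for B and C (0xcc and 0xaa). A minimal sketch; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Truth-table constants for the second and third VPTERNLOG inputs.
constexpr uint8_t MagicB = 0xcc;
constexpr uint8_t MagicC = 0xaa;

// Immediates for the three negated two-input operations. The first input
// (A) does not influence any of them.
constexpr uint8_t immNotOr() { return static_cast<uint8_t>(~(MagicB | MagicC)); }
constexpr uint8_t immNotAnd() { return static_cast<uint8_t>(~(MagicB & MagicC)); }
constexpr uint8_t immNotXor() { return static_cast<uint8_t>(~(MagicB ^ MagicC)); }

// Runtime check against the values quoted in the comments.
inline void checkNegatedOps() {
  assert(immNotOr() == 0x11);
  assert(immNotAnd() == 0x77);
  assert(immNotXor() == 0x99);
}
```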

In D109295#2985534, @xbolva00 wrote:

Also some more cases for ternlog

__m512i notBorC(__m512i B, __m512i C) {
    return ~(B|C); // 0x11 
}

__m512i notBandC(__m512i B, __m512i C) {
    return ~(B&C); // 0x77
}

__m512i notBxorC(__m512i B, __m512i C) {
    return ~(B^C); // 0x99
}

We should be a little careful there. As far as I know, vpternlog doesn't break dependencies on inputs that aren't used by the immediate. So we should try to use one of the other registers twice to prevent false dependencies. If we can fold a load, we need to make sure we don't duplicate that register and prevent the folding.

Harbormaster completed remote builds in B122783: Diff 370943. Sep 6 2021, 12:13 PM

LGTM, thanks.

This revision is now accepted and ready to land. Sep 6 2021, 5:21 PM

pengfei accepted this revision. Sep 6 2021, 5:46 PM

Closed by commit rGda3ef8b75612: [X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops. (authored by craig.topper). Sep 6 2021, 5:48 PM

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGda3ef8b75612: [X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops..

thopre removed a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:47 AM

thopre removed a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:51 AM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp	51 lines
llvm/test/CodeGen/X86/avx512vl-logic.ll	11 lines

Diff 370975

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

[498 lines not shown; context: private:]
   bool foldLoadStoreIntoMemOperand(SDNode *Node);
   MachineSDNode *matchBEXTRFromAndImm(SDNode *Node);
   bool matchBitExtract(SDNode *Node);
   bool shrinkAndImmediate(SDNode *N);
   bool isMaskZeroExtended(SDNode *N) const;
   bool tryShiftAmountMod(SDNode *N);
   bool tryShrinkShlLogicImm(SDNode *N);
   bool tryVPTERNLOG(SDNode *N);
-  bool matchVPTERNLOG(SDNode *Root, SDNode *ParentA, SDNode *ParentBC,
-                      SDValue A, SDValue B, SDValue C, uint8_t Imm);
+  bool matchVPTERNLOG(SDNode *Root, SDNode *ParentA, SDNode *ParentB,
+                      SDNode *ParentC, SDValue A, SDValue B, SDValue C,
+                      uint8_t Imm);
   bool tryVPTESTM(SDNode *Root, SDValue Setcc, SDValue Mask);
   bool tryMatchBitSelect(SDNode *N);

   MachineSDNode *emitPCMPISTR(unsigned ROpc, unsigned MOpc, bool MayFoldLoad,
                               const SDLoc &dl, MVT VT, SDNode *Node);
   MachineSDNode *emitPCMPESTR(unsigned ROpc, unsigned MOpc, bool MayFoldLoad,
                               const SDLoc &dl, MVT VT, SDNode *Node,
                               SDValue &InFlag);
[3,521 lines not shown; context: bool X86DAGToDAGISel::tryShrinkShlLogicImm(SDNode *N) {]
   SDValue NewSHL = CurDAG->getNode(ISD::SHL, dl, NVT, NewBinOp,
                                    Shift.getOperand(1));
   ReplaceNode(N, NewSHL.getNode());
   SelectCode(NewSHL.getNode());
   return true;
 }

 bool X86DAGToDAGISel::matchVPTERNLOG(SDNode *Root, SDNode *ParentA,
-                                     SDNode *ParentBC, SDValue A, SDValue B,
-                                     SDValue C, uint8_t Imm) {
-  assert(A.isOperandOf(ParentA));
-  assert(B.isOperandOf(ParentBC));
-  assert(C.isOperandOf(ParentBC));
+                                     SDNode *ParentB, SDNode *ParentC,
+                                     SDValue A, SDValue B, SDValue C,
+                                     uint8_t Imm) {
+  assert(A.isOperandOf(ParentA) && B.isOperandOf(ParentB) &&
+         C.isOperandOf(ParentC) && "Incorrect parent node");

   auto tryFoldLoadOrBCast =
       [this](SDNode *Root, SDNode *P, SDValue &L, SDValue &Base, SDValue &Scale,
              SDValue &Index, SDValue &Disp, SDValue &Segment) {
         if (tryFoldLoad(Root, P, L, Base, Scale, Index, Disp, Segment))
           return true;

         // Not a load, check for broadcast which may be behind a bitcast.
[11 lines not shown; context: auto tryFoldLoadOrBCast =]
         if (Size != 32 && Size != 64)
           return false;

         return tryFoldBroadcast(Root, P, L, Base, Scale, Index, Disp, Segment);
       };

   bool FoldedLoad = false;
   SDValue Tmp0, Tmp1, Tmp2, Tmp3, Tmp4;
-  if (tryFoldLoadOrBCast(Root, ParentBC, C, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4)) {
+  if (tryFoldLoadOrBCast(Root, ParentC, C, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4)) {
     FoldedLoad = true;
   } else if (tryFoldLoadOrBCast(Root, ParentA, A, Tmp0, Tmp1, Tmp2, Tmp3,
                                 Tmp4)) {
     FoldedLoad = true;
     std::swap(A, C);
     // Swap bits 1/4 and 3/6.
     uint8_t OldImm = Imm;
     Imm = OldImm & 0xa5;
     if (OldImm & 0x02) Imm |= 0x10;
     if (OldImm & 0x10) Imm |= 0x02;
     if (OldImm & 0x08) Imm |= 0x40;
     if (OldImm & 0x40) Imm |= 0x08;
-  } else if (tryFoldLoadOrBCast(Root, ParentBC, B, Tmp0, Tmp1, Tmp2, Tmp3,
+  } else if (tryFoldLoadOrBCast(Root, ParentB, B, Tmp0, Tmp1, Tmp2, Tmp3,
                                 Tmp4)) {
     FoldedLoad = true;
     std::swap(B, C);
     // Swap bits 1/2 and 5/6.
     uint8_t OldImm = Imm;
     Imm = OldImm & 0x99;
     if (OldImm & 0x02) Imm |= 0x04;
     if (OldImm & 0x04) Imm |= 0x02;
[61 lines not shown; context: bool X86DAGToDAGISel::matchVPTERNLOG(SDNode *Root, SDNode *ParentA,]
   }

   ReplaceUses(SDValue(Root, 0), SDValue(MNode, 0));
   CurDAG->RemoveDeadNode(Root);
   return true;
 }

 // Try to match two logic ops to a VPTERNLOG.
-// FIXME: Handle inverted inputs?
 // FIXME: Handle more complex patterns that use an operand more than once?
 bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
   MVT NVT = N->getSimpleValueType(0);

   // Make sure we support VPTERNLOG.
   if (!NVT.isVector() || !Subtarget->hasAVX512() ||
       NVT.getVectorElementType() == MVT::i1)
     return false;
[26 lines not shown; context: if ((FoldableOp = getFoldableLogicOp(N1))) {]
     A = N0;
   } else if ((FoldableOp = getFoldableLogicOp(N0))) {
     A = N1;
   } else
     return false;

   SDValue B = FoldableOp.getOperand(0);
   SDValue C = FoldableOp.getOperand(1);
+  SDNode *ParentA = N;
+  SDNode *ParentB = FoldableOp.getNode();
+  SDNode *ParentC = FoldableOp.getNode();

   // We can build the appropriate control immediate by performing the logic
   // operation we're matching using these constants for A, B, and C.
-  const uint8_t TernlogMagicA = 0xf0;
-  const uint8_t TernlogMagicB = 0xcc;
-  const uint8_t TernlogMagicC = 0xaa;
+  uint8_t TernlogMagicA = 0xf0;
+  uint8_t TernlogMagicB = 0xcc;
+  uint8_t TernlogMagicC = 0xaa;
+
+  // Some of the inputs may be inverted, peek through them and invert the
+  // magic values accordingly.
+  // TODO: There may be a bitcast before the xor that we should peek through.
+  auto PeekThroughNot = [](SDValue &Op, SDNode *&Parent, uint8_t &Magic) {
+    if (Op.getOpcode() == ISD::XOR && Op.hasOneUse() &&
+        ISD::isBuildVectorAllOnes(Op.getOperand(1).getNode())) {
+      Magic = ~Magic;
+      Parent = Op.getNode();
+      Op = Op.getOperand(0);
+    }
+  };
+
+  PeekThroughNot(A, ParentA, TernlogMagicA);
+  PeekThroughNot(B, ParentB, TernlogMagicB);
+  PeekThroughNot(C, ParentC, TernlogMagicC);

   uint8_t Imm;
   switch (FoldableOp.getOpcode()) {
   default: llvm_unreachable("Unexpected opcode!");
   case ISD::AND: Imm = TernlogMagicB & TernlogMagicC; break;
   case ISD::OR: Imm = TernlogMagicB | TernlogMagicC; break;
   case ISD::XOR: Imm = TernlogMagicB ^ TernlogMagicC; break;
   case X86ISD::ANDNP: Imm = ~(TernlogMagicB) & TernlogMagicC; break;
   }

   switch (N->getOpcode()) {
   default: llvm_unreachable("Unexpected opcode!");
   case X86ISD::ANDNP:
     if (A == N0)
       Imm &= ~TernlogMagicA;
     else
       Imm = ~(Imm) & TernlogMagicA;
     break;
   case ISD::AND: Imm &= TernlogMagicA; break;
   case ISD::OR: Imm |= TernlogMagicA; break;
   case ISD::XOR: Imm ^= TernlogMagicA; break;
   }

-  return matchVPTERNLOG(N, N, FoldableOp.getNode(), A, B, C, Imm);
+  return matchVPTERNLOG(N, ParentA, ParentB, ParentC, A, B, C, Imm);
 }

 /// If the high bits of an 'and' operand are known zero, try setting the
 /// high bits of an 'and' constant operand to produce a smaller encoding by
 /// creating a small, sign-extended negative immediate rather than a large
 /// positive one. This reverses a transform in SimplifyDemandedBits that
 /// shrinks mask constants by clearing bits. There is also a possibility that
 /// the 'and' mask can be made -1, so the 'and' itself is unnecessary. In that
[320 lines not shown; context: else]
     return false;

   SDLoc dl(N);
   SDValue Imm = CurDAG->getTargetConstant(0xCA, dl, MVT::i8);
   SDValue Ternlog = CurDAG->getNode(X86ISD::VPTERNLOG, dl, NVT, A, B, C, Imm);
   ReplaceNode(N, Ternlog.getNode());

   return matchVPTERNLOG(Ternlog.getNode(), Ternlog.getNode(), Ternlog.getNode(),
-                        A, B, C, 0xCA);
+                        Ternlog.getNode(), A, B, C, 0xCA);
 }

 void X86DAGToDAGISel::Select(SDNode *Node) {
   MVT NVT = Node->getSimpleValueType(0);
   unsigned Opcode = Node->getOpcode();
   SDLoc dl(Node);

   if (Node->isMachineOpcode()) {
[218 lines not shown; context: void X86DAGToDAGISel::Select(SDNode *Node) {]
   case ISD::SRA:
   case ISD::SHL:
     if (tryShiftAmountMod(Node))
       return;
     break;

   case X86ISD::VPTERNLOG: {
     uint8_t Imm = cast<ConstantSDNode>(Node->getOperand(3))->getZExtValue();
-    if (matchVPTERNLOG(Node, Node, Node, Node->getOperand(0),
+    if (matchVPTERNLOG(Node, Node, Node, Node, Node->getOperand(0),
                        Node->getOperand(1), Node->getOperand(2), Imm))
       return;
     break;
   }

   case X86ISD::ANDNP:
     if (tryVPTERNLOG(Node))
       return;
[1,188 lines not shown]
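
The immediate fix-ups in matchVPTERNLOG ("Swap bits 1/4 and 3/6", "Swap bits 1/2 and 5/6") follow from how operands index the truth table: bit i of the immediate is the result for row i = (A << 2) | (B << 1) | C. A minimal sketch of the same remapping, written with a generic bit-swap helper rather than the masks used in the patch; `swapBits`, `swapAC`, `swapBC` and `checkOperandSwaps` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Exchange bits I and J of an 8-bit immediate (toggle both if they differ).
constexpr uint8_t swapBits(uint8_t Imm, unsigned I, unsigned J) {
  return (((Imm >> I) ^ (Imm >> J)) & 1)
             ? static_cast<uint8_t>(Imm ^ ((1u << I) | (1u << J)))
             : Imm;
}

// Swapping operands A and C exchanges truth-table rows 1/4 and 3/6.
constexpr uint8_t swapAC(uint8_t Imm) {
  return swapBits(swapBits(Imm, 1, 4), 3, 6);
}

// Swapping operands B and C exchanges truth-table rows 1/2 and 5/6.
constexpr uint8_t swapBC(uint8_t Imm) {
  return swapBits(swapBits(Imm, 1, 2), 5, 6);
}

// Runtime checks: swapping two operands maps each one's truth-table constant
// onto the other's and leaves the third operand's constant alone.
inline void checkOperandSwaps() {
  assert(swapAC(0xf0) == 0xaa && swapAC(0xaa) == 0xf0);
  assert(swapAC(0xcc) == 0xcc);
  assert(swapBC(0xcc) == 0xaa && swapBC(0xaa) == 0xcc);
  assert(swapBC(0xf0) == 0xf0);
  // A & (B | ~C) becomes C & (B | ~A) after swapping A and C.
  assert(swapAC(0xD0) == 0x8A);
}
```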

llvm/test/CodeGen/X86/avx512vl-logic.ll

[974 lines not shown; context: ; CHECK-NEXT: retq]
   %b = and <4 x i32> %y, %a
   %c = or <4 x i32> %b, %z
   ret <4 x i32> %c
 }

 define <4 x i32> @ternlog_and_orn(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {
 ; CHECK-LABEL: ternlog_and_orn:
 ; CHECK: ## %bb.0:
-; CHECK-NEXT: vpternlogq $15, %xmm2, %xmm2, %xmm2
-; CHECK-NEXT: vpternlogd $224, %xmm1, %xmm2, %xmm0
+; CHECK-NEXT: vpternlogd $176, %xmm1, %xmm2, %xmm0
 ; CHECK-NEXT: retq
   %a = xor <4 x i32> %z, <i32 -1, i32 -1, i32 -1, i32 -1>
   %b = or <4 x i32> %a, %y
   %c = and <4 x i32> %b, %x
   ret <4 x i32> %c
 }

 define <4 x i32> @ternlog_and_orn_2(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {
 ; CHECK-LABEL: ternlog_and_orn_2:
 ; CHECK: ## %bb.0:
-; CHECK-NEXT: vpternlogq $15, %xmm2, %xmm2, %xmm2
-; CHECK-NEXT: vpternlogd $224, %xmm2, %xmm1, %xmm0
+; CHECK-NEXT: vpternlogd $208, %xmm2, %xmm1, %xmm0
 ; CHECK-NEXT: retq
   %a = xor <4 x i32> %z, <i32 -1, i32 -1, i32 -1, i32 -1>
   %b = or <4 x i32> %y, %a
   %c = and <4 x i32> %b, %x
   ret <4 x i32> %c
 }

+; FIXME: This should be a single vpternlog, but we accidentally match the xor -1
+; as the second binary op instead of the and.
 define <4 x i32> @ternlog_orn_and(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {
 ; CHECK-LABEL: ternlog_orn_and:
 ; CHECK: ## %bb.0:
 ; CHECK-NEXT: vpcmpeqd %xmm3, %xmm3, %xmm3
 ; CHECK-NEXT: vpand %xmm2, %xmm1, %xmm1
 ; CHECK-NEXT: vpternlogd $222, %xmm3, %xmm1, %xmm0
 ; CHECK-NEXT: retq
   %a = xor <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
   %b = and <4 x i32> %y, %z
   %c = or <4 x i32> %b, %a
   ret <4 x i32> %c
 }

 define <4 x i32> @ternlog_orn_and_2(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {
 ; CHECK-LABEL: ternlog_orn_and_2:
 ; CHECK: ## %bb.0:
-; CHECK-NEXT: vpternlogq $15, %xmm0, %xmm0, %xmm0
-; CHECK-NEXT: vpternlogd $248, %xmm2, %xmm1, %xmm0
+; CHECK-NEXT: vpternlogd $143, %xmm2, %xmm1, %xmm0
 ; CHECK-NEXT: retq
   %a = xor <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
   %b = and <4 x i32> %y, %z
   %c = or <4 x i32> %a, %b
   ret <4 x i32> %c
 }

 define <4 x i32> @ternlog_xor_andn(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {
[341 lines not shown]
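
The new immediates in the updated CHECK lines can be cross-checked by evaluating each test's expression on the truth-table constants. A sketch only: the operand-to-register mapping is read off the CHECK lines (AT&T order, destination last and acting as the first source), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Truth-table constants for the first, second and third VPTERNLOG sources.
constexpr uint8_t A = 0xf0, B = 0xcc, C = 0xaa;

// ternlog_and_orn: (~z | y) & x with A = x (%xmm0), B = z (%xmm2), C = y (%xmm1).
constexpr uint8_t immAndOrn() { return static_cast<uint8_t>(A & (~B | C)); }

// ternlog_and_orn_2: (y | ~z) & x with A = x, B = y (%xmm1), C = z (%xmm2).
constexpr uint8_t immAndOrn2() { return static_cast<uint8_t>(A & (B | ~C)); }

// ternlog_orn_and_2: ~x | (y & z) with A = x, B = y, C = z.
constexpr uint8_t immOrnAnd2() { return static_cast<uint8_t>(~A | (B & C)); }

// Runtime check against the decimal immediates in the new CHECK lines.
inline void checkTestImmediates() {
  assert(immAndOrn() == 176);
  assert(immAndOrn2() == 208);
  assert(immOrnAnd2() == 143);
}
```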
