This is an archive of the discontinued LLVM Phabricator instance.

[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
ClosedPublic

Authored by craig.topper on Sep 7 2017, 1:52 PM.

Details

Reviewers

RKSimon
spatel
zvi
igorb
aymanmus

Commits

rG958106d0f16f: [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI…
rL313054: [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI…

Summary

Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so it's as late as possible. But it can't be done with table-based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp.
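
To illustrate why both immediates have to be inspected together, here is a minimal sketch of how the shift amount and the mask width combine into one BEXTR control value, along with a software model of the extract. The names `bextrControl` and `bextrModel` are hypothetical and not part of the patch; only the encoding `Shift | (MaskSize << 8)` is taken from it.

```cpp
#include <cassert>
#include <cstdint>

// Control word layout used by BEXTR/BEXTRI: bits 7:0 hold the start bit,
// bits 15:8 hold the field length.
inline uint32_t bextrControl(unsigned Start, unsigned Length) {
  assert(Start < 256 && Length < 256 && "each field is 8 bits wide");
  return Start | (Length << 8);
}

// Hypothetical software model of a 64-bit BEXTR, for illustration only.
inline uint64_t bextrModel(uint64_t Src, uint32_t Control) {
  unsigned Start = Control & 0xFF;
  unsigned Length = (Control >> 8) & 0xFF;
  // A start bit beyond the operand, or an empty field, yields zero.
  if (Start >= 64 || Length == 0)
    return 0;
  uint64_t Shifted = Src >> Start;
  // A field that covers all remaining bits needs no masking.
  if (Length >= 64)
    return Shifted;
  return Shifted & ((uint64_t{1} << Length) - 1);
}
```

For a shift of 4 and a 12-bit mask (4095), `bextrControl(4, 12)` is 0xC04 (3076), which is the immediate that appears in the bmi.ll and tbm_patterns.ll expectations.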

This does break a couple of tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we don't recognize yet. We already had this problem for several other TBM patterns, so I think this is fine and we can address all of them together.

I've also fixed a bug where the combine to BEXTR was preventing us from using the trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before we got to isel.
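
The immediate-related checks, including the AH exclusion, can be sketched as a standalone predicate. This is a simplified illustration under stated assumptions, not the patch itself: `isLowBitMask` and `wouldUseBEXTR` are hypothetical names, and the subtarget (BMI/TBM) and one-use checks that the real matcher performs are left out.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// True if Mask is a non-empty run of ones starting at bit 0 (e.g. 0xFFF).
inline bool isLowBitMask(uint64_t Mask) {
  return Mask != 0 && ((Mask + 1) & Mask) == 0;
}

// Hypothetical standalone version of the checks; BitWidth is 32 or 64.
inline bool wouldUseBEXTR(uint64_t Shift, uint64_t Mask, unsigned BitWidth) {
  assert((BitWidth == 32 || BitWidth == 64) && "only i32 and i64 are handled");
  if (!isLowBitMask(Mask))
    return false;
  uint64_t MaskSize = std::bitset<64>(Mask).count();
  // Bits 15:8 are left to the AH zero-extension trick.
  if (Shift == 8 && MaskSize == 8)
    return false;
  // Reject fields that would read bits shifted in from beyond the value.
  return Shift + MaskSize <= BitWidth;
}
```

With these rules, `(x >> 4) & 4095` qualifies for BEXTR, while `(x >> 8) & 255` is deliberately rejected so that the `movzbl %ah, %eax` form can still be selected.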

I think we should probably also support matching BEXTR from (srl/sra (and mask << C), C). But that should be a different patch.
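
A small sketch of why the second form extracts the same field, assuming a logical right shift; the arithmetic-shift variant is not modeled here. The helper names `shiftThenMask` and `maskThenShift` are hypothetical and only serve to compare the two expressions.

```cpp
#include <cassert>
#include <cstdint>

// Form matched by this patch: shift first, then mask.
inline uint64_t shiftThenMask(uint64_t X, unsigned C, uint64_t Mask) {
  assert(C < 64 && "shift amount must be smaller than the bit width");
  return (X >> C) & Mask;
}

// Form suggested for a follow-up patch: mask in place, then shift.
inline uint64_t maskThenShift(uint64_t X, unsigned C, uint64_t Mask) {
  assert(C < 64 && "shift amount must be smaller than the bit width");
  return (X & (Mask << C)) >> C;
}
```

Both helpers return the same value for any input, because the bits of `Mask` that fall off the top in `Mask << C` correspond to bits that are already zero in `X >> C`.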

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Sep 7 2017, 1:52 PM

aymanmus added a subscriber: aymanmus. Sep 11 2017, 12:04 AM

aymanmus added inline comments.

lib/Target/X86/X86ISelDAGToDAG.cpp
2129 ↗	(On Diff #114248)	Maybe add an assert to guarantee an 'AND' Node.
2190 ↗	(On Diff #114248)	Why do you manually handle the memory folding here instead of letting the regular mechanism take care of that?
2272 ↗	(On Diff #114248)	remove extra line

craig.topper added inline comments. Sep 11 2017, 9:15 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2190 ↗	(On Diff #114248)	The regular mechanism is tablegen patterns and the isel table. But we're bypassing all of that here. We could be lazy here and let the peephole pass, which can also fold memory using the load folding tables, take care of it. But we normally try to fold as much as possible at isel.

aymanmus added inline comments. Sep 12 2017, 2:19 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2190 ↗	(On Diff #114248)	Sorry if I'm being picky about that, but I can't understand the reason behind this. In this case there is no special handling of memory folding (e.g. no special condition for applying/not applying the transformation), so why would we prefer to handle some of the cases here, some via tablegen patterns and the rest with the folding tables? Wouldn't it be easier and better designed if the transformation were completely controlled by one "owner" (let's say the folding tables) and only exceptions were handled here or through tablegen patterns?

Normal loads are usually folded by isel patterns.
Stack reloads created by the register allocator are folded using the load folding tables. Since these loads don't exist during isel we can't fold them there.
The peephole pass can also fold loads using the load folding tables. At one point in time this was definitely weaker than the isel mechanism even ignoring the missing instructions in the isel table. I'm not sure what the status is now.

In this case I'd really like to be able to write an isel pattern for this instruction but I can't because there's no way to check the relationship of the two immediates. So I'm supplying custom code at the time of isel. If we were able to write the isel pattern we'd fold the load at that time. So I'm folding the load manually so that it happens during isel like it would if I could write the pattern. This is similar to what we do for integer MUL and DIV instructions that we also can't handle with isel patterns and do manually.

Thanks for the answer.
LGTM

This revision is now accepted and ready to land. Sep 12 2017, 9:00 AM

Closed by commit rL313054: [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI… (authored by ctopper). Sep 12 2017, 10:41 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

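Path	Size
llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp	92 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.h	3 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	27 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.td	22 lines
llvm/trunk/test/CodeGen/X86/bmi.ll	24 lines
llvm/trunk/test/CodeGen/X86/tbm_patterns.ll	30 lines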

Diff 114864

llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp

[416 lines not shown]	bool useNonTemporalLoad(LoadSDNode *N) const {
case 32:		case 32:
return Subtarget->hasAVX2();		return Subtarget->hasAVX2();
case 64:		case 64:
return Subtarget->hasAVX512();		return Subtarget->hasAVX512();
}		}
}		}

bool foldLoadStoreIntoMemOperand(SDNode *Node);		bool foldLoadStoreIntoMemOperand(SDNode *Node);

		bool matchBEXTRFromAnd(SDNode *Node);
};		};
}		}


bool		bool
X86DAGToDAGISel::IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const {		X86DAGToDAGISel::IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const {
if (OptLevel == CodeGenOpt::None) return false;		if (OptLevel == CodeGenOpt::None) return false;

[1,842 lines not shown]	bool X86DAGToDAGISel::foldLoadStoreIntoMemOperand(SDNode *Node) {
Result->setMemRefs(MemOp, MemOp + 2);		Result->setMemRefs(MemOp, MemOp + 2);

ReplaceUses(SDValue(StoreNode, 0), SDValue(Result, 1));		ReplaceUses(SDValue(StoreNode, 0), SDValue(Result, 1));
ReplaceUses(SDValue(StoredVal.getNode(), 1), SDValue(Result, 0));		ReplaceUses(SDValue(StoredVal.getNode(), 1), SDValue(Result, 0));
CurDAG->RemoveDeadNode(Node);		CurDAG->RemoveDeadNode(Node);
return true;		return true;
}		}

		// See if this is an (X >> C1) & C2 that we can match to BEXTR/BEXTRI.
		bool X86DAGToDAGISel::matchBEXTRFromAnd(SDNode *Node) {
		MVT NVT = Node->getSimpleValueType(0);
		SDLoc dl(Node);

		SDValue N0 = Node->getOperand(0);
		SDValue N1 = Node->getOperand(1);

		if (!Subtarget->hasBMI() && !Subtarget->hasTBM())
		return false;

		// Must have a shift right.
		if (N0->getOpcode() != ISD::SRL && N0->getOpcode() != ISD::SRA)
		return false;

		// Shift can't have additional users.
		if (!N0->hasOneUse())
		return false;

		// Only supported for 32 and 64 bits.
		if (NVT != MVT::i32 && NVT != MVT::i64)
		return false;

		// Shift amount and RHS of and must be constant.
		ConstantSDNode *MaskCst = dyn_cast<ConstantSDNode>(N1);
		ConstantSDNode *ShiftCst = dyn_cast<ConstantSDNode>(N0->getOperand(1));
		if (!MaskCst || !ShiftCst)
		return false;

		// And RHS must be a mask.
		uint64_t Mask = MaskCst->getZExtValue();
		if (!isMask_64(Mask))
		return false;

		uint64_t Shift = ShiftCst->getZExtValue();
		uint64_t MaskSize = countPopulation(Mask);

		// Don't interfere with something that can be handled by extracting AH.
		// TODO: If we are able to fold a load, BEXTR might still be better than AH.
		if (Shift == 8 && MaskSize == 8)
		return false;

		// Make sure we are only using bits that were in the original value, not
		// shifted in.
		if (Shift + MaskSize > NVT.getSizeInBits())
		return false;

		SDValue New = CurDAG->getTargetConstant(Shift | (MaskSize << 8), dl, NVT);
		unsigned ROpc = NVT == MVT::i64 ? X86::BEXTRI64ri : X86::BEXTRI32ri;
		unsigned MOpc = NVT == MVT::i64 ? X86::BEXTRI64mi : X86::BEXTRI32mi;

		// BMI requires the immediate to be placed in a register.
		if (!Subtarget->hasTBM()) {
		ROpc = NVT == MVT::i64 ? X86::BEXTR64rr : X86::BEXTR32rr;
		MOpc = NVT == MVT::i64 ? X86::BEXTR64rm : X86::BEXTR32rm;
		SDNode *Move = CurDAG->getMachineNode(X86::MOV32ri, dl, NVT, New);
		New = SDValue(Move, 0);
		}

		MachineSDNode *NewNode;
		SDValue Input = N0->getOperand(0);
		SDValue Tmp0, Tmp1, Tmp2, Tmp3, Tmp4;
		if (tryFoldLoad(Node, Input, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4)) {
		SDValue Ops[] = { Tmp0, Tmp1, Tmp2, Tmp3, Tmp4, New, Input.getOperand(0) };
		SDVTList VTs = CurDAG->getVTList(NVT, MVT::Other);
		NewNode = CurDAG->getMachineNode(MOpc, dl, VTs, Ops);
		// Update the chain.
		ReplaceUses(Input.getValue(1), SDValue(NewNode, 1));
		// Record the mem-refs
		LoadSDNode *LoadNode = cast<LoadSDNode>(Input);
		if (LoadNode) {
		MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
		MemOp[0] = LoadNode->getMemOperand();
		NewNode->setMemRefs(MemOp, MemOp + 1);
		}
		} else {
		NewNode = CurDAG->getMachineNode(ROpc, dl, NVT, Input, New);
		}

		ReplaceUses(SDValue(Node, 0), SDValue(NewNode, 0));
		CurDAG->RemoveDeadNode(Node);
		return true;
		}

void X86DAGToDAGISel::Select(SDNode *Node) {		void X86DAGToDAGISel::Select(SDNode *Node) {
MVT NVT = Node->getSimpleValueType(0);		MVT NVT = Node->getSimpleValueType(0);
unsigned Opc, MOpc;		unsigned Opc, MOpc;
unsigned Opcode = Node->getOpcode();		unsigned Opcode = Node->getOpcode();
SDLoc dl(Node);		SDLoc dl(Node);

DEBUG(dbgs() << "Selecting: "; Node->dump(CurDAG); dbgs() << '\n');		DEBUG(dbgs() << "Selecting: "; Node->dump(CurDAG); dbgs() << '\n');

[37 lines not shown]	SDValue VSelect = CurDAG->getNode(
Node->getOperand(1), Node->getOperand(2));		Node->getOperand(1), Node->getOperand(2));
ReplaceNode(Node, VSelect.getNode());		ReplaceNode(Node, VSelect.getNode());
SelectCode(VSelect.getNode());		SelectCode(VSelect.getNode());
// We already called ReplaceUses.		// We already called ReplaceUses.
return;		return;
}		}

case ISD::AND:		case ISD::AND:
		// Try to match BEXTR/BEXTRI instruction.
		if (matchBEXTRFromAnd(Node))
		return;

		LLVM_FALLTHROUGH;
case ISD::OR:		case ISD::OR:
case ISD::XOR: {		case ISD::XOR: {

// For operations of the form (x << C1) op C2, check if we can use a smaller		// For operations of the form (x << C1) op C2, check if we can use a smaller
// encoding for C2 by transforming it into (x op (C2>>C1)) << C1.		// encoding for C2 by transforming it into (x op (C2>>C1)) << C1.
SDValue N0 = Node->getOperand(0);		SDValue N0 = Node->getOperand(0);
SDValue N1 = Node->getOperand(1);		SDValue N1 = Node->getOperand(1);

if (N0->getOpcode() != ISD::SHL || !N0->hasOneUse())		if (N0->getOpcode() != ISD::SHL || !N0->hasOneUse())
break;		break;

[640 lines not shown]

llvm/trunk/lib/Target/X86/X86ISelLowering.h

[340 lines not shown]	enum NodeType : unsigned {
CMPMU,		CMPMU,
// Vector comparison with rounding mode for FP values		// Vector comparison with rounding mode for FP values
CMPM_RND,		CMPM_RND,

// Arithmetic operations with FLAGS results.		// Arithmetic operations with FLAGS results.
ADD, SUB, ADC, SBB, SMUL,		ADD, SUB, ADC, SBB, SMUL,
INC, DEC, OR, XOR, AND,		INC, DEC, OR, XOR, AND,

// Bit field extract.
BEXTR,

// LOW, HI, FLAGS = umul LHS, RHS.		// LOW, HI, FLAGS = umul LHS, RHS.
UMUL,		UMUL,

// 8-bit SMUL/UMUL - AX, FLAGS = smul8/umul8 AL, RHS.		// 8-bit SMUL/UMUL - AX, FLAGS = smul8/umul8 AL, RHS.
SMUL8, UMUL8,		SMUL8, UMUL8,

// 8-bit divrem that zero-extend the high result (AH).		// 8-bit divrem that zero-extend the high result (AH).
UDIVREM8_ZEXT_HREG,		UDIVREM8_ZEXT_HREG,
[1,143 lines not shown]

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

[24,580 lines not shown]	const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const {
case X86ISD::UMUL8: return "X86ISD::UMUL8";		case X86ISD::UMUL8: return "X86ISD::UMUL8";
case X86ISD::SDIVREM8_SEXT_HREG: return "X86ISD::SDIVREM8_SEXT_HREG";		case X86ISD::SDIVREM8_SEXT_HREG: return "X86ISD::SDIVREM8_SEXT_HREG";
case X86ISD::UDIVREM8_ZEXT_HREG: return "X86ISD::UDIVREM8_ZEXT_HREG";		case X86ISD::UDIVREM8_ZEXT_HREG: return "X86ISD::UDIVREM8_ZEXT_HREG";
case X86ISD::INC: return "X86ISD::INC";		case X86ISD::INC: return "X86ISD::INC";
case X86ISD::DEC: return "X86ISD::DEC";		case X86ISD::DEC: return "X86ISD::DEC";
case X86ISD::OR: return "X86ISD::OR";		case X86ISD::OR: return "X86ISD::OR";
case X86ISD::XOR: return "X86ISD::XOR";		case X86ISD::XOR: return "X86ISD::XOR";
case X86ISD::AND: return "X86ISD::AND";		case X86ISD::AND: return "X86ISD::AND";
case X86ISD::BEXTR: return "X86ISD::BEXTR";
case X86ISD::MUL_IMM: return "X86ISD::MUL_IMM";		case X86ISD::MUL_IMM: return "X86ISD::MUL_IMM";
case X86ISD::MOVMSK: return "X86ISD::MOVMSK";		case X86ISD::MOVMSK: return "X86ISD::MOVMSK";
case X86ISD::PTEST: return "X86ISD::PTEST";		case X86ISD::PTEST: return "X86ISD::PTEST";
case X86ISD::TESTP: return "X86ISD::TESTP";		case X86ISD::TESTP: return "X86ISD::TESTP";
case X86ISD::TESTM: return "X86ISD::TESTM";		case X86ISD::TESTM: return "X86ISD::TESTM";
case X86ISD::TESTNM: return "X86ISD::TESTNM";		case X86ISD::TESTNM: return "X86ISD::TESTNM";
case X86ISD::KORTEST: return "X86ISD::KORTEST";		case X86ISD::KORTEST: return "X86ISD::KORTEST";
case X86ISD::KTEST: return "X86ISD::KTEST";		case X86ISD::KTEST: return "X86ISD::KTEST";
[7,559 lines not shown]	static SDValue combineAnd(SDNode *N, SelectionDAG &DAG,

if (SDValue R = combineANDXORWithAllOnesIntoANDNP(N, DAG))		if (SDValue R = combineANDXORWithAllOnesIntoANDNP(N, DAG))
return R;		return R;

if (SDValue ShiftRight = combineAndMaskToShift(N, DAG, Subtarget))		if (SDValue ShiftRight = combineAndMaskToShift(N, DAG, Subtarget))
return ShiftRight;		return ShiftRight;

EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
SDLoc DL(N);

// Attempt to recursively combine a bitmask AND with shuffles.		// Attempt to recursively combine a bitmask AND with shuffles.
if (VT.isVector() && (VT.getScalarSizeInBits() % 8) == 0) {		if (VT.isVector() && (VT.getScalarSizeInBits() % 8) == 0) {
SDValue Op(N, 0);		SDValue Op(N, 0);
SmallVector<int, 1> NonceMask; // Just a placeholder.		SmallVector<int, 1> NonceMask; // Just a placeholder.
NonceMask.push_back(0);		NonceMask.push_back(0);
if (combineX86ShufflesRecursively({Op}, 0, Op, NonceMask, {},		if (combineX86ShufflesRecursively({Op}, 0, Op, NonceMask, {},
/*Depth*/ 1, /*HasVarMask*/ false, DAG,		/*Depth*/ 1, /*HasVarMask*/ false, DAG,
DCI, Subtarget))		DCI, Subtarget))
return SDValue(); // This routine will use CombineTo to replace N.		return SDValue(); // This routine will use CombineTo to replace N.
}		}

// Create BEXTR instructions
// BEXTR is ((X >> imm) & (2**size-1))
if (VT != MVT::i32 && VT != MVT::i64)
return SDValue();

if (!Subtarget.hasBMI() && !Subtarget.hasTBM())
return SDValue();
if (N0.getOpcode() != ISD::SRA && N0.getOpcode() != ISD::SRL)
return SDValue();

ConstantSDNode *MaskNode = dyn_cast<ConstantSDNode>(N1);
ConstantSDNode *ShiftNode = dyn_cast<ConstantSDNode>(N0.getOperand(1));
if (MaskNode && ShiftNode) {
uint64_t Mask = MaskNode->getZExtValue();
uint64_t Shift = ShiftNode->getZExtValue();
if (isMask_64(Mask)) {
uint64_t MaskSize = countPopulation(Mask);
if (Shift + MaskSize <= VT.getSizeInBits())
return DAG.getNode(X86ISD::BEXTR, DL, VT, N0.getOperand(0),
DAG.getConstant(Shift | (MaskSize << 8), DL,
VT));
}
}
return SDValue();		return SDValue();
}		}

// Try to fold:		// Try to fold:
// (or (and (m, y), (pandn m, x)))		// (or (and (m, y), (pandn m, x)))
// into:		// into:
// (vselect m, x, y)		// (vselect m, x, y)
// As a special case, try to fold:		// As a special case, try to fold:
[4,743 lines not shown]

llvm/trunk/lib/Target/X86/X86InstrInfo.td

[265 lines not shown]	def X86lock_or : SDNode<"X86ISD::LOR", SDTLockBinaryArithWithFlags,
SDNPMemOperand]>;		SDNPMemOperand]>;
def X86lock_xor : SDNode<"X86ISD::LXOR", SDTLockBinaryArithWithFlags,		def X86lock_xor : SDNode<"X86ISD::LXOR", SDTLockBinaryArithWithFlags,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad,		[SDNPHasChain, SDNPMayStore, SDNPMayLoad,
SDNPMemOperand]>;		SDNPMemOperand]>;
def X86lock_and : SDNode<"X86ISD::LAND", SDTLockBinaryArithWithFlags,		def X86lock_and : SDNode<"X86ISD::LAND", SDTLockBinaryArithWithFlags,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad,		[SDNPHasChain, SDNPMayStore, SDNPMayLoad,
SDNPMemOperand]>;		SDNPMemOperand]>;

def X86bextr : SDNode<"X86ISD::BEXTR", SDTIntBinOp>;

def X86mul_imm : SDNode<"X86ISD::MUL_IMM", SDTIntBinOp>;		def X86mul_imm : SDNode<"X86ISD::MUL_IMM", SDTIntBinOp>;

def X86WinAlloca : SDNode<"X86ISD::WIN_ALLOCA", SDT_X86WIN_ALLOCA,		def X86WinAlloca : SDNode<"X86ISD::WIN_ALLOCA", SDT_X86WIN_ALLOCA,
[SDNPHasChain, SDNPOutGlue]>;		[SDNPHasChain, SDNPOutGlue]>;

def X86SegAlloca : SDNode<"X86ISD::SEG_ALLOCA", SDT_X86SEG_ALLOCA,		def X86SegAlloca : SDNode<"X86ISD::SEG_ALLOCA", SDT_X86SEG_ALLOCA,
[SDNPHasChain]>;		[SDNPHasChain]>;

[2,150 lines not shown]	def : Pat<(srl (shl GR64:$src, (i8 (trunc (sub 64, GR32:$lz)))),
(BZHI64rr GR64:$src,		(BZHI64rr GR64:$src,
(INSERT_SUBREG (i64 (IMPLICIT_DEF)), GR32:$lz, sub_32bit))>;		(INSERT_SUBREG (i64 (IMPLICIT_DEF)), GR32:$lz, sub_32bit))>;
def : Pat<(srl (shl (loadi64 addr:$src), (i8 (trunc (sub 64, GR32:$lz)))),		def : Pat<(srl (shl (loadi64 addr:$src), (i8 (trunc (sub 64, GR32:$lz)))),
(i8 (trunc (sub 64, GR32:$lz)))),		(i8 (trunc (sub 64, GR32:$lz)))),
(BZHI64rm addr:$src,		(BZHI64rm addr:$src,
(INSERT_SUBREG (i64 (IMPLICIT_DEF)), GR32:$lz, sub_32bit))>;		(INSERT_SUBREG (i64 (IMPLICIT_DEF)), GR32:$lz, sub_32bit))>;
} // HasBMI2		} // HasBMI2

let Predicates = [HasBMI] in {
def : Pat<(X86bextr GR32:$src1, GR32:$src2),
(BEXTR32rr GR32:$src1, GR32:$src2)>;
def : Pat<(X86bextr (loadi32 addr:$src1), GR32:$src2),
(BEXTR32rm addr:$src1, GR32:$src2)>;
def : Pat<(X86bextr GR64:$src1, GR64:$src2),
(BEXTR64rr GR64:$src1, GR64:$src2)>;
def : Pat<(X86bextr (loadi64 addr:$src1), GR64:$src2),
(BEXTR64rm addr:$src1, GR64:$src2)>;
} // HasBMI

multiclass bmi_pdep_pext<string mnemonic, RegisterClass RC,		multiclass bmi_pdep_pext<string mnemonic, RegisterClass RC,
X86MemOperand x86memop, Intrinsic Int,		X86MemOperand x86memop, Intrinsic Int,
PatFrag ld_frag> {		PatFrag ld_frag> {
def rr : I<0xF5, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),		def rr : I<0xF5, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),
!strconcat(mnemonic, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"),		!strconcat(mnemonic, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
[(set RC:$dst, (Int RC:$src1, RC:$src2))]>,		[(set RC:$dst, (Int RC:$src1, RC:$src2))]>,
VEX_4V;		VEX_4V;
def rm : I<0xF5, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),		def rm : I<0xF5, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),
[185 lines not shown]
def : InstAlias<"clzero\t{%eax|eax}", (CLZEROr)>, Requires<[Not64BitMode]>;		def : InstAlias<"clzero\t{%eax|eax}", (CLZEROr)>, Requires<[Not64BitMode]>;
def : InstAlias<"clzero\t{%rax|rax}", (CLZEROr)>, Requires<[In64BitMode]>;		def : InstAlias<"clzero\t{%rax|rax}", (CLZEROr)>, Requires<[In64BitMode]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Pattern fragments to auto generate TBM instructions.		// Pattern fragments to auto generate TBM instructions.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let Predicates = [HasTBM] in {		let Predicates = [HasTBM] in {
def : Pat<(X86bextr GR32:$src1, (i32 imm:$src2)),
(BEXTRI32ri GR32:$src1, imm:$src2)>;
def : Pat<(X86bextr (loadi32 addr:$src1), (i32 imm:$src2)),
(BEXTRI32mi addr:$src1, imm:$src2)>;
def : Pat<(X86bextr GR64:$src1, i64immSExt32:$src2),
(BEXTRI64ri GR64:$src1, i64immSExt32:$src2)>;
def : Pat<(X86bextr (loadi64 addr:$src1), i64immSExt32:$src2),
(BEXTRI64mi addr:$src1, i64immSExt32:$src2)>;

// FIXME: patterns for the load versions are not implemented		// FIXME: patterns for the load versions are not implemented
def : Pat<(and GR32:$src, (add GR32:$src, 1)),		def : Pat<(and GR32:$src, (add GR32:$src, 1)),
(BLCFILL32rr GR32:$src)>;		(BLCFILL32rr GR32:$src)>;
def : Pat<(and GR64:$src, (add GR64:$src, 1)),		def : Pat<(and GR64:$src, (add GR64:$src, 1)),
(BLCFILL64rr GR64:$src)>;		(BLCFILL64rr GR64:$src)>;

def : Pat<(or GR32:$src, (not (add GR32:$src, 1))),		def : Pat<(or GR32:$src, (not (add GR32:$src, 1))),
(BLCI32rr GR32:$src)>;		(BLCI32rr GR32:$src)>;
[652 lines not shown]

llvm/trunk/test/CodeGen/X86/bmi.ll

	[305 lines not shown]
	; CHECK-NEXT: movl $3076, %eax # imm = 0xC04			; CHECK-NEXT: movl $3076, %eax # imm = 0xC04
	; CHECK-NEXT: bextrl %eax, %edi, %eax			; CHECK-NEXT: bextrl %eax, %edi, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%1 = lshr i32 %x, 4			%1 = lshr i32 %x, 4
	%2 = and i32 %1, 4095			%2 = and i32 %1, 4095
	ret i32 %2			ret i32 %2
	}			}

				; Make sure we still use AH subreg trick to extract 15:8
				define i32 @bextr32_subreg(i32 %x) uwtable ssp {
				; CHECK-LABEL: bextr32_subreg:
				; CHECK: # BB#0:
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: movzbl %ah, %eax # NOREX
				; CHECK-NEXT: retq
				%1 = lshr i32 %x, 8
				%2 = and i32 %1, 255
				ret i32 %2
				}

	define i32 @bextr32b_load(i32* %x) uwtable ssp {			define i32 @bextr32b_load(i32* %x) uwtable ssp {
	; CHECK-LABEL: bextr32b_load:			; CHECK-LABEL: bextr32b_load:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movl $3076, %eax # imm = 0xC04			; CHECK-NEXT: movl $3076, %eax # imm = 0xC04
	; CHECK-NEXT: bextrl %eax, (%rdi), %eax			; CHECK-NEXT: bextrl %eax, (%rdi), %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%1 = load i32, i32* %x			%1 = load i32, i32* %x
	%2 = lshr i32 %1, 4			%2 = lshr i32 %1, 4
	[30 lines not shown]
	; CHECK-NEXT: movl $3076, %eax # imm = 0xC04			; CHECK-NEXT: movl $3076, %eax # imm = 0xC04
	; CHECK-NEXT: bextrl %eax, %edi, %eax			; CHECK-NEXT: bextrl %eax, %edi, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%1 = lshr i64 %x, 4			%1 = lshr i64 %x, 4
	%2 = and i64 %1, 4095			%2 = and i64 %1, 4095
	ret i64 %2			ret i64 %2
	}			}

				; Make sure we still use the AH subreg trick to extract 15:8
				define i64 @bextr64_subreg(i64 %x) uwtable ssp {
				; CHECK-LABEL: bextr64_subreg:
				; CHECK: # BB#0:
				; CHECK-NEXT: movq %rdi, %rax
				; CHECK-NEXT: movzbl %ah, %eax # NOREX
				; CHECK-NEXT: retq
				%1 = lshr i64 %x, 8
				%2 = and i64 %1, 255
				ret i64 %2
				}

	define i64 @bextr64b_load(i64* %x) {			define i64 @bextr64b_load(i64* %x) {
	; CHECK-LABEL: bextr64b_load:			; CHECK-LABEL: bextr64b_load:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movl $3076, %eax # imm = 0xC04			; CHECK-NEXT: movl $3076, %eax # imm = 0xC04
	; CHECK-NEXT: bextrl %eax, (%rdi), %eax			; CHECK-NEXT: bextrl %eax, (%rdi), %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%1 = load i64, i64* %x, align 8			%1 = load i64, i64* %x, align 8
	%2 = lshr i64 %1, 4			%2 = lshr i64 %1, 4
	[407 lines not shown]

llvm/trunk/test/CodeGen/X86/tbm_patterns.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+tbm < %s | FileCheck %s		; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+tbm < %s | FileCheck %s

; TODO - Patterns fail to fold with ZF flags and prevents TBM instruction selection.		; TODO - Patterns fail to fold with ZF flags and prevents TBM instruction selection.

define i32 @test_x86_tbm_bextri_u32(i32 %a) nounwind {		define i32 @test_x86_tbm_bextri_u32(i32 %a) nounwind {
; CHECK-LABEL: test_x86_tbm_bextri_u32:		; CHECK-LABEL: test_x86_tbm_bextri_u32:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04		; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%t0 = lshr i32 %a, 4		%t0 = lshr i32 %a, 4
%t1 = and i32 %t0, 4095		%t1 = and i32 %t0, 4095
ret i32 %t1		ret i32 %t1
}		}

		; Make sure we still use AH subreg trick for extracting bits 15:8
		define i32 @test_x86_tbm_bextri_u32_subreg(i32 %a) nounwind {
		; CHECK-LABEL: test_x86_tbm_bextri_u32_subreg:
		; CHECK: # BB#0:
		; CHECK-NEXT: movl %edi, %eax
		; CHECK-NEXT: movzbl %ah, %eax # NOREX
		; CHECK-NEXT: retq
		%t0 = lshr i32 %a, 8
		%t1 = and i32 %t0, 255
		ret i32 %t1
		}

define i32 @test_x86_tbm_bextri_u32_m(i32* nocapture %a) nounwind {		define i32 @test_x86_tbm_bextri_u32_m(i32* nocapture %a) nounwind {
; CHECK-LABEL: test_x86_tbm_bextri_u32_m:		; CHECK-LABEL: test_x86_tbm_bextri_u32_m:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: bextr $3076, (%rdi), %eax # imm = 0xC04		; CHECK-NEXT: bextr $3076, (%rdi), %eax # imm = 0xC04
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%t0 = load i32, i32* %a		%t0 = load i32, i32* %a
%t1 = lshr i32 %t0, 4		%t1 = lshr i32 %t0, 4
%t2 = and i32 %t1, 4095		%t2 = and i32 %t1, 4095
[11 lines not shown]	; CHECK-NEXT: retq
%t2 = icmp eq i32 %t1, 0		%t2 = icmp eq i32 %t1, 0
%t3 = select i1 %t2, i32 %b, i32 %t1		%t3 = select i1 %t2, i32 %b, i32 %t1
ret i32 %t3		ret i32 %t3
}		}

define i32 @test_x86_tbm_bextri_u32_z2(i32 %a, i32 %b, i32 %c) nounwind {		define i32 @test_x86_tbm_bextri_u32_z2(i32 %a, i32 %b, i32 %c) nounwind {
; CHECK-LABEL: test_x86_tbm_bextri_u32_z2:		; CHECK-LABEL: test_x86_tbm_bextri_u32_z2:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04		; CHECK-NEXT: shrl $4, %edi
		; CHECK-NEXT: testw $4095, %di # imm = 0xFFF
; CHECK-NEXT: cmovnel %edx, %esi		; CHECK-NEXT: cmovnel %edx, %esi
; CHECK-NEXT: movl %esi, %eax		; CHECK-NEXT: movl %esi, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%t0 = lshr i32 %a, 4		%t0 = lshr i32 %a, 4
%t1 = and i32 %t0, 4095		%t1 = and i32 %t0, 4095
%t2 = icmp eq i32 %t1, 0		%t2 = icmp eq i32 %t1, 0
%t3 = select i1 %t2, i32 %b, i32 %c		%t3 = select i1 %t2, i32 %b, i32 %c
ret i32 %t3		ret i32 %t3
}		}

define i64 @test_x86_tbm_bextri_u64(i64 %a) nounwind {		define i64 @test_x86_tbm_bextri_u64(i64 %a) nounwind {
; CHECK-LABEL: test_x86_tbm_bextri_u64:		; CHECK-LABEL: test_x86_tbm_bextri_u64:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04		; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%t0 = lshr i64 %a, 4		%t0 = lshr i64 %a, 4
%t1 = and i64 %t0, 4095		%t1 = and i64 %t0, 4095
ret i64 %t1		ret i64 %t1
}		}

		; Make sure we still use AH subreg trick for extracting bits 15:8
		define i64 @test_x86_tbm_bextri_u64_subreg(i64 %a) nounwind {
		; CHECK-LABEL: test_x86_tbm_bextri_u64_subreg:
		; CHECK: # BB#0:
		; CHECK-NEXT: movq %rdi, %rax
		; CHECK-NEXT: movzbl %ah, %eax # NOREX
		; CHECK-NEXT: retq
		%t0 = lshr i64 %a, 8
		%t1 = and i64 %t0, 255
		ret i64 %t1
		}

define i64 @test_x86_tbm_bextri_u64_m(i64* nocapture %a) nounwind {		define i64 @test_x86_tbm_bextri_u64_m(i64* nocapture %a) nounwind {
; CHECK-LABEL: test_x86_tbm_bextri_u64_m:		; CHECK-LABEL: test_x86_tbm_bextri_u64_m:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: bextr $3076, (%rdi), %eax # imm = 0xC04		; CHECK-NEXT: bextr $3076, (%rdi), %eax # imm = 0xC04
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%t0 = load i64, i64* %a		%t0 = load i64, i64* %a
%t1 = lshr i64 %t0, 4		%t1 = lshr i64 %t0, 4
%t2 = and i64 %t1, 4095		%t2 = and i64 %t1, 4095
[11 lines not shown]	; CHECK-NEXT: retq
%t2 = icmp eq i64 %t1, 0		%t2 = icmp eq i64 %t1, 0
%t3 = select i1 %t2, i64 %b, i64 %t1		%t3 = select i1 %t2, i64 %b, i64 %t1
ret i64 %t3		ret i64 %t3
}		}

define i64 @test_x86_tbm_bextri_u64_z2(i64 %a, i64 %b, i64 %c) nounwind {		define i64 @test_x86_tbm_bextri_u64_z2(i64 %a, i64 %b, i64 %c) nounwind {
; CHECK-LABEL: test_x86_tbm_bextri_u64_z2:		; CHECK-LABEL: test_x86_tbm_bextri_u64_z2:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04		; CHECK-NEXT: shrl $4, %edi
		; CHECK-NEXT: testw $4095, %di # imm = 0xFFF
; CHECK-NEXT: cmovneq %rdx, %rsi		; CHECK-NEXT: cmovneq %rdx, %rsi
; CHECK-NEXT: movq %rsi, %rax		; CHECK-NEXT: movq %rsi, %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%t0 = lshr i64 %a, 4		%t0 = lshr i64 %a, 4
%t1 = and i64 %t0, 4095		%t1 = and i64 %t0, 4095
%t2 = icmp eq i64 %t1, 0		%t2 = icmp eq i64 %t1, 0
%t3 = select i1 %t2, i64 %b, i64 %c		%t3 = select i1 %t2, i64 %b, i64 %c
ret i64 %t3		ret i64 %t3
[783 lines not shown]