This is an archive of the discontinued LLVM Phabricator instance.

Differential D19087

[x86] prefer comparisons against zero for and+cmp sequences
Closed, Public

Authored by spatel on Apr 13 2016, 5:25 PM.


Details

Reviewers

andreadb
kbsmith1
hfinkel

Commits

rGc2751e7050b2: [x86, BMI] add TLI hook for 'andn' and use it to simplify comparisons
rL268858: [x86, BMI] add TLI hook for 'andn' and use it to simplify comparisons

Summary

For the sake of minimalism, this patch is x86 only, but I think that at least PPC, ARM, AArch64, and Sparc probably want to do this too.

We might want to generalize the hook and pattern recognition for a target like PPC that has a full assortment of negated logic ops (orc, nand).

Note that http://reviews.llvm.org/D18842 would cause this transform to trigger more often.

For reference, this relates to:
https://llvm.org/bugs/show_bug.cgi?id=27105
https://llvm.org/bugs/show_bug.cgi?id=27202
https://llvm.org/bugs/show_bug.cgi?id=27203
https://llvm.org/bugs/show_bug.cgi?id=27328

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 53644. Apr 13 2016, 5:25 PM

spatel retitled this revision to [x86, ppc] prefer comparisons against zero for and+cmp sequences.

spatel updated this object.

spatel added reviewers: hfinkel, andreadb, kbsmith1.

spatel added a subscriber: llvm-commits.

Herald added subscribers: mcrosier, jyknight, aemerson. Apr 13 2016, 5:25 PM

See inline comments.

include/llvm/Target/TargetLowering.h
308 (On Diff #53644): I see \brief used in other comments preceding functions. Should that be here as well?
313 (On Diff #53644): This specific statement is a little hard to follow. I think it might be clearer to say that X & Y == Y implies that the bits in Y are a subset of the bits in X. Therefore, the set of bits not in X (~X) intersected (&) with the bits in Y must be the empty set (0). So, X & Y == Y is equivalent to ~X & Y == 0.
lib/CodeGen/CodeGenPrepare.cpp
5174 (On Diff #53644): Maybe I am missing something, or don't understand, but I don't see anything in this routine that checks to make sure this is an == comparison.
test/CodeGen/X86/bmi.ll
174 (On Diff #53644): Is this valid for ne? The comment in TargetLowering.h should discuss that. It seems like it is, but it would be worth showing that in the comment. This test file should also have tests that match the pattern but use le, ge, lt, or gt, since those predicates must not trigger the optimization.
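The subset argument above can also be checked mechanically. Below is a small C++ sketch, not part of the patch, that compares (X & Y) == Y with (~X & Y) == 0, and the corresponding != forms, for every pair of 8-bit values. The helper name andCmpIdentityHoldsFor8Bits and the 8-bit width are illustrative choices; because & and ~ act on each bit position independently, the same reasoning carries over to 32 and 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Checks, for every pair of 8-bit values, that
//   (X & Y) == Y  agrees with  (~X & Y) == 0, and
//   (X & Y) != Y  agrees with  (~X & Y) != 0.
bool andCmpIdentityHoldsFor8Bits() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      const uint8_t X = static_cast<uint8_t>(x);
      const uint8_t Y = static_cast<uint8_t>(y);
      const uint8_t andXY = static_cast<uint8_t>(X & Y);
      // ~X is computed after promotion to int; the cast keeps the low 8 bits.
      const uint8_t andNotXY = static_cast<uint8_t>(~X & Y);
      if ((andXY == Y) != (andNotXY == 0))
        return false;
      if ((andXY != Y) != (andNotXY != 0))
        return false;
    }
  }
  return true;
}
```

Calling andCmpIdentityHoldsFor8Bits() should return true, which is consistent with the answer that the transform is valid for both eq and ne.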

This is certainly a good pattern to catch, but I'm not sure that CGP is the right place for this. We generally have things in CGP to work around the fact that SDAG/ISel is basic-block local. This kind of thing seems much more natural as something that belongs in DAGCombine.

RKSimon added a subscriber: RKSimon. Apr 14 2016, 1:47 AM

In D19087#400599, @hfinkel wrote:

...

Yes, I initially thought I'd do this as a DAGCombine, but then I realized that we'd need "isKnownToBeAPowerOfTwo()", and I don't see an SDNode equivalent for the IR version. I'm not sure where we draw the line between CGP and DAGCombine, but duplicating isKnownToBeAPowerOfTwo() scared me away.

In D19087#401119, @spatel wrote:

...

Why? How do we select the instructions anyway? For x86 or ppc, would we not want to limit the applicability to constants? It needs to be a constant for rlwinm I'd imagine, and for bt too? For constants, we can check this with existing functionality.

In D19087#401155, @hfinkel wrote:

...

isKnownToBeAPowerOfTwo() is ~100 lines and recursive. We can approximate it using computeKnownBits(), but I think getting the full power of the IR version would mean duplicating the code.

One of the x86 regression tests shows the case where a simple constant check won't do:

define i1 @and_cmp_const_power_of_two(i32 %x, i32 %y) {
  %shl = shl i32 1, %y
  %and = and i32 %x, %shl
  %cmp = icmp ne i32 %and, %shl
  ret i1 %cmp
}

X86TargetLowering::LowerToBT() uses computeKnownBits() to know that's a power-of-2 mask.
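To see why a variable power-of-2 mask is enough for 'bt', here is a hedged C++ model (the function names andCmpNeShl and btSetae are hypothetical) that compares the IR in and_cmp_const_power_of_two with what a 'btl' + 'setae' pair computes. It assumes y is in [0, 31], since a larger shift amount makes the IR 'shl' produce poison and is undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>

// The IR from and_cmp_const_power_of_two: (x & (1 << y)) != (1 << y).
// Assumes y < 32; larger shift amounts are poison in IR and UB in C++.
bool andCmpNeShl(uint32_t x, unsigned y) {
  const uint32_t mask = uint32_t{1} << y;
  return (x & mask) != mask;
}

// Model of 'btl %esi, %edi' followed by 'setae %al': bt copies bit
// (y mod 32) of x into CF, and setae produces 1 when CF is clear.
bool btSetae(uint32_t x, unsigned y) {
  const bool cf = ((x >> (y % 32)) & 1u) != 0;
  return !cf;
}
```

For every x and every in-range y the two functions agree, so the mask only has to be known to be some power of 2 (here, 1 << y) for the 'bt' lowering to apply.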

In D19087#401302, @spatel wrote:

...

Okay, but in the end, the check used has to match the capabilities of the check used in the backend for lowering. In this case, it needs to use computeKnownBits(). Regardless, in this case, as I understand it, it is not enough to know that the number is some power of 2; we need to know which power of 2.

In D19087#401346, @hfinkel wrote:

...

I don't understand the last statement. For x86, the test case demonstrates that "some power-of-2" is the requirement to avoid impeding lowering to 'bt'. For PPC, I think the requirement is the same: we could use a variable shift instruction (e.g., 'rlwnm' rather than 'rlwinm').

If the PPC lowering is not doing that, wouldn't it be considered an optimization bug? I suppose this leads us to the conclusion that we really do need to duplicate isKnownToBeAPowerOfTwo() for SDNodes so everyone can use it.

Alternatively, we could wait even longer to attempt this transform (in the MachineCombiner?) so we avoid stepping on the other optimization. That looks simple enough for the x86 tests, but PPC would take a lot of work.

In D19087#401505, @spatel wrote:

...

I don't understand the last statement. For x86, the test case demonstrates that "some power-of-2" is the requirement to avoid impeding lowering to 'bt'. For PPC, I think the requirement is the same: we could use a variable shift instruction (e.g., 'rlwnm' rather than 'rlwinm').

If the PPC lowering is not doing that, wouldn't it be considered an optimization bug? I suppose this leads us to the conclusion that we really do need to duplicate isKnownToBeAPowerOfTwo() for SDNodes so everyone can use it.

You're right, it does not need to be a constant. However, we still need to know how to determine which power of two it will be (i.e. the shift amount), even if not a constant. That is stronger than just knowing it is some power of two. In any case, we should move this to DAGCombine and sync the associated logic with that used for instruction selection.

Alternatively, we could wait even longer to attempt this transform (in the MachineCombiner?) so we avoid stepping on the other optimization. That looks simple enough for the x86 tests, but PPC would take a lot of work.

Not sure why PPC would take more work than x86 (both backends currently use the MachineCombiner for other purposes). Regardless, I'd not jump here unless necessary.

In D19087#401557, @hfinkel wrote:

...

In the test cases, PPC is branching, so I figure there'd be more complexity undoing that than the simple instruction replacement/deletion that x86 needs. But I agree, I don't want to sink that far down if I don't have to.

Before abandoning ship, here are inline answers to Kevin's questions.

include/llvm/Target/TargetLowering.h
308 (On Diff #53644): Doxygen's autobrief was enabled some time in the last few months, so \brief is just noise as I understand it. We can remove \brief from documentation comments for readability.
313 (On Diff #53644): This may be a case where we should just let the equation speak for itself. :)
lib/CodeGen/CodeGenPrepare.cpp
5174 (On Diff #53644): The equality check is (poorly) placed on the first line.
test/CodeGen/X86/bmi.ll
174 (On Diff #53644): Yes, the transform is valid for 'ne'. This patch is just undoing the opposite direction transform performed by InstCombine in visitICmpInstWithInstAndIntCst(). But I agree we should have tests to confirm that the transform doesn't fire for lt/gt.
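One way to see why only 'eq' and 'ne' qualify: the rewrite replaces both compare operands, so it is only sound when the predicate depends on nothing but whether (X & Y) and Y are equal. The C++ sketch below, with hypothetical names (Pred, evalPred, rewritePreservesPred), applies the same operand rewrite to the orientation used in the not_an_andn tests ('Y pred (X & Y)' becomes '0 pred (~X & Y)') and checks every 8-bit input. It reports EQ and NE as preserved and SGT and ULE as not; x = 0, y = 1 is a counterexample for both of the latter.

```cpp
#include <cassert>
#include <cstdint>

enum class Pred { EQ, NE, SGT, ULE };

// Evaluates 'lhs pred rhs' on 8-bit values. SGT reinterprets the bits as
// signed (modular conversion in C++20 and on mainstream compilers before it).
bool evalPred(Pred p, uint8_t lhs, uint8_t rhs) {
  switch (p) {
  case Pred::EQ:
    return lhs == rhs;
  case Pred::NE:
    return lhs != rhs;
  case Pred::SGT:
    return static_cast<int8_t>(lhs) > static_cast<int8_t>(rhs);
  case Pred::ULE:
    return lhs <= rhs;
  }
  return false;
}

// Applies the hypothetical rewrite 'Y pred (X & Y)' -> '0 pred (~X & Y)'
// and reports whether it agrees with the original for all 8-bit x and y.
bool rewritePreservesPred(Pred p) {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      const uint8_t X = static_cast<uint8_t>(x);
      const uint8_t Y = static_cast<uint8_t>(y);
      const uint8_t andXY = static_cast<uint8_t>(X & Y);
      const uint8_t andNotXY = static_cast<uint8_t>(~X & Y);
      if (evalPred(p, Y, andXY) != evalPred(p, 0, andNotXY))
        return false;
    }
  }
  return true;
}
```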

Replied to inline comments.

include/llvm/Target/TargetLowering.h
308 (On Diff #53644): I didn't realize that. Not too long ago a reviewer had me add \brief in some code I was writing.
313 (On Diff #53644): That sounds fine.
lib/CodeGen/CodeGenPrepare.cpp
5174 (On Diff #53644): I see it now. I skipped right by it.

Patch updated:

Moved the transform to the DAG instead of CGP, where the earlier revision was based. It turns out that the 'power-of-2' scenario is already handled for us here: TargetLowering has a function called valueHasExactlyOneBitSet() that is used in SimplifySetCC().

I removed the PPC diffs for simplicity, but my local testing shows the same benefits as the earlier rev of the patch.

I'm not sure if this transform should live in TargetLowering, but I've made it a helper of SimplifySetCC() in this draft because that's where all of the related transforms are. DAGCombiner::visitSETCC() calls DAGCombiner::SimplifySetCC() calls TargetLowering::SimplifySetCC(). It's not clear to me what the trade-offs would be if we added transforms directly to DAGCombiner::visitSETCC().
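For reference, the single-bit check can be thought of in known-bits terms. The C++ model below is a simplification with made-up types (KnownBits32, Value), not the real valueHasExactlyOneBitSet(): a value has exactly one bit set if 31 bits are known zero and the remaining bit is known one. A known-bits query alone learns nothing about 'shl 1, %y' with an unknown %y, so the model special-cases that shape. The and_cmp_const_power_of_two test suggests the real check also recognizes it, but how it does so is an assumption here.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical known-bits summary of a 32-bit value: a bit set in 'zero'
// is known to be 0 and a bit set in 'one' is known to be 1.
struct KnownBits32 {
  uint32_t zero;
  uint32_t one;
};

// A constant's bits are all known.
KnownBits32 knownBitsOfConstant(uint32_t c) { return {~c, c}; }

// Exactly one bit is set if 31 bits are known zero and one bit is known one.
bool knownExactlyOneBitSet(KnownBits32 k) {
  return std::bitset<32>(k.zero).count() == 31 &&
         std::bitset<32>(k.one).count() == 1;
}

// Hypothetical value model: a constant, or 'shl 1, y' with unknown y.
struct Value {
  enum Kind { Constant, ShlOfOne } kind;
  uint32_t constant; // only meaningful when kind == Constant
};

Value constantValue(uint32_t c) { return {Value::Constant, c}; }
Value shlOfOne() { return {Value::ShlOfOne, 0}; }

bool hasExactlyOneBitSet(const Value &v) {
  // Known bits say nothing about '1 << y' when y is unknown, but for any
  // in-range y it sets exactly one bit, so handle that shape directly.
  if (v.kind == Value::ShlOfOne)
    return true;
  return knownExactlyOneBitSet(knownBitsOfConstant(v.constant));
}
```

In this model a mask of 8 or 'shl 1, y' is rejected for 'andn' (leaving it to 'bt'), while a mask like 43, as in and_cmp_const, is not a single bit and remains a candidate.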

Did you add tests to check that lt/gt conditions don't get transformed?

lib/CodeGen/SelectionDAG/TargetLowering.cpp
1336 (On Diff #56314): Although this might be your expectation, I don't think it should be an assertion. I think it should be: if (!valueHasExactlyOneBitSet(Y, DAG)) return SDValue(); Using an assertion could cause an internal compiler error during compilation. There is a safe and correct return by using SDValue(), and this code shouldn't really need to know whether the transform for bit test should happen before or after it.

In D19087#422972, @kbsmith1 wrote:

Did you add tests to check that lt/gt conditions don't get transformed?

Oops - let me add those and update the patch. The EQ/NE check is hopefully more obvious in the code now.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
1336 (On Diff #56314): I made this an assert rather than an actual condition because I'm assuming this helper function should be tightly coupled with "SimplifySetCC()" below, and I thought duplicating a call to computeKnownBits() (by way of valueHasExactlyOneBitSet()) would be considered wasteful. The reason I've broken this into a separate function is because it felt wrong to add more code directly to SimplifySetCC(), which is already approaching 1000 lines.

Patch updated:

Added tests for predicates other than EQ/NE to show that we don't transform those.
Changed assert of valueHasExactlyOneBitSet() to a real check (if my earlier reasoning holds, I can change this back).

In D19087#423496, @spatel wrote:

...

Oops - let me add those and update the patch. The EQ/NE check is hopefully more obvious in the code now.

Thank you for adding those tests now, and yes the EQ/NE check was much more obvious when I read it this time. Thank you.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
1336 (On Diff #56421): I agree with your reasoning on breaking it out into this separate function. I think that greatly enhanced the readability. Thank you also for changing this to not be an assertion. I think that makes the code less fragile for the future.

This revision is now accepted and ready to land. May 6 2016, 9:09 AM

Closed by commit rL268858: [x86, BMI] add TLI hook for 'andn' and use it to simplify comparisons (authored by spatel). May 7 2016, 8:09 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D20050: [TargetLowering] make helper function for SetCC + and optimizations (NFC). May 7 2016, 10:16 AM

spatel mentioned this in rL268932: [TargetLowering] make helper function for SetCC + and optimizations (NFC). May 9 2016, 9:48 AM

spatel mentioned this in D20439: [SelectionDAG] rename/move isKnownToBeAPowerOfTwo() from TargetLowering. May 19 2016, 8:47 AM

spatel mentioned this in D27221: [AArch64] allow and-not-compare transform to form 'bics'. Nov 29 2016, 12:04 PM

spatel mentioned this in rL288206: [AArch64] allow and-not-compare transform to form 'bics'. Nov 29 2016, 2:39 PM

Revision Contents

Path                                                    Size
llvm/trunk/include/llvm/Target/TargetLowering.h         16 lines
llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp  49 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.h             2 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp           12 lines
llvm/trunk/test/CodeGen/X86/bmi.ll                      75 lines

Diff 56498

llvm/trunk/include/llvm/Target/TargetLowering.h

@@ ... @@ public:
   /// into a single machine instruction of a form like:
   /// \code
   /// brOnBitSet %register, #bitNumber, dest
   /// \endcode
   bool isMaskAndBranchFoldingLegal() const {
     return MaskAndBranchFoldingIsLegal;
   }
 
+  /// Return true if the target should transform:
+  /// (X & Y) == Y ---> (~X & Y) == 0
+  /// (X & Y) != Y ---> (~X & Y) != 0
+  ///
+  /// This may be profitable if the target has a bitwise and-not operation that
+  /// sets comparison flags. A target may want to limit the transformation based
+  /// on the type of Y or if Y is a constant.
+  ///
+  /// Note that the transform will not occur if Y is known to be a power-of-2
+  /// because a mask and compare of a single bit can be handled by inverting the
+  /// predicate, for example:
+  /// (X & 8) == 8 ---> (X & 8) != 0
+  virtual bool hasAndNotCompare(SDValue Y) const {
+    return false;
+  }
+
   /// \brief Return true if the target wants to use the optimization that
   /// turns ext(promotableInst1(...(promotableInstN(load)))) into
   /// promotedInst1(...(promotedInstN(ext(load)))).
   bool enableExtLdPromotion() const { return EnableExtLdPromotion; }
 
   /// Return true if the target can combine store(extractelement VectorTy,
   /// Idx).
   /// \p Cost[out] gives the cost of that transformation when this is true.
@@ ... @@

llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp

@@ ... @@ case TargetLowering::ZeroOrOneBooleanContent:
     return (N->isOne() && !SExt) || (SExt && (N->getValueType(0) != MVT::i1));
   case TargetLowering::UndefinedBooleanContent:
   case TargetLowering::ZeroOrNegativeOneBooleanContent:
     return N->isAllOnesValue() && SExt;
   }
   llvm_unreachable("Unexpected enumeration.");
 }
 
+/// If the target supports an 'and-not' or 'and-complement' logic operation,
+/// try to use that to make a comparison operation more efficient.
+static SDValue createAndNotSetCC(EVT VT, SDValue N0, SDValue N1,
+                                 ISD::CondCode Cond, SelectionDAG &DAG,
+                                 SDLoc dl) {
+  // Match these patterns in any of their permutations:
+  // (X & Y) == Y
+  // (X & Y) != Y
+  if (N1.getOpcode() == ISD::AND && N0.getOpcode() != ISD::AND)
+    std::swap(N0, N1);
+
+  if (N0.getOpcode() != ISD::AND || !N0.hasOneUse() ||
+      (Cond != ISD::SETEQ && Cond != ISD::SETNE))
+    return SDValue();
+
+  SDValue X, Y;
+  if (N0.getOperand(0) == N1) {
+    X = N0.getOperand(1);
+    Y = N0.getOperand(0);
+  } else if (N0.getOperand(1) == N1) {
+    X = N0.getOperand(0);
+    Y = N0.getOperand(1);
+  } else {
+    return SDValue();
+  }
+
+  // Bail out if the compare operand that we want to turn into a zero is already
+  // a zero (otherwise, infinite loop).
+  auto *YConst = dyn_cast<ConstantSDNode>(Y);
+  if (YConst && YConst->isNullValue())
+    return SDValue();
+
+  // We don't want to do this transform if the mask is a single bit because
+  // there are more efficient ways to deal with that case (for example, 'bt' on
+  // x86 or 'rlwinm' on PPC).
+  if (!DAG.getTargetLoweringInfo().hasAndNotCompare(Y) ||
+      valueHasExactlyOneBitSet(Y, DAG))
+    return SDValue();
+
+  // Transform this into: ~X & Y == 0.
+  EVT OpVT = X.getValueType();
+  SDValue NotX = DAG.getNOT(SDLoc(X), X, OpVT);
+  SDValue NewAnd = DAG.getNode(ISD::AND, SDLoc(N0), OpVT, NotX, Y);
+  return DAG.getSetCC(dl, VT, NewAnd, DAG.getConstant(0, dl, OpVT), Cond);
+}
+
 /// Try to simplify a setcc built with the specified operands and cc. If it is
 /// unable to simplify it, return a null SDValue.
 SDValue
 TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                               ISD::CondCode Cond, bool foldBooleans,
                               DAGCombinerInfo &DCI, SDLoc dl) const {
   SelectionDAG &DAG = DCI.DAG;
 
@@ ... @@ if (VT != MVT::i1) {
      if (!DCI.isCalledByLegalizer())
        DCI.AddToWorklist(N0.getNode());
      // FIXME: If running after legalize, we probably can't do this.
      N0 = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, N0);
    }
    return N0;
  }
 
+  if (SDValue AndNotCC = createAndNotSetCC(VT, N0, N1, Cond, DAG, dl))
+    return AndNotCC;
+
   // Could not fold it.
   return SDValue();
 }
 
 /// Returns true (and the GlobalValue and the offset) if the node is a
 /// GlobalAddress + offset.
 bool TargetLowering::isGAPlusOffset(SDNode *N, const GlobalValue *&GA,
                                     int64_t &Offset) const {
@@ ... @@
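The operand matching at the start of createAndNotSetCC() can be modeled outside of SelectionDAG. The sketch below uses a hypothetical Node struct in place of SDValue and compares operands by pointer identity. It covers only the swap, the EQ/NE restriction, and the X/Y assignment, and omits the one-use check, the zero-constant bail-out, the target hook, and the single-bit check. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical stand-in for an SDValue: a leaf, or an AND of two operands.
struct Node {
  bool isAnd;
  const Node *op0;
  const Node *op1;
};

enum class Cond { EQ, NE, SGT, ULE };

// Returns {X, Y} if 'n0 cond n1' has the shape (X & Y) cond Y in any
// operand order, following the matching steps of createAndNotSetCC().
std::optional<std::pair<const Node *, const Node *>>
matchAndCmp(const Node *n0, const Node *n1, Cond cond) {
  // Canonicalize so that the AND, if any, is in n0.
  if (n1->isAnd && !n0->isAnd)
    std::swap(n0, n1);
  if (!n0->isAnd || (cond != Cond::EQ && cond != Cond::NE))
    return std::nullopt;
  // Y is whichever AND operand is the other side of the compare.
  if (n0->op0 == n1)
    return std::make_pair(n0->op1, n0->op0);
  if (n0->op1 == n1)
    return std::make_pair(n0->op0, n0->op1);
  return std::nullopt;
}
```

With leaves x and y, both 'and(x, y) == y' and 'y != and(y, x)' match with X = x and Y = y, mirroring the and_cmp1 through and_cmp4 permutations in bmi.ll, while an SGT compare or an unrelated right-hand operand does not match.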

llvm/trunk/lib/Target/X86/X86ISelLowering.h

@@ ... @@ public:
 
     /// This method returns the name of a target specific DAG node.
     const char *getTargetNodeName(unsigned Opcode) const override;
 
     bool isCheapToSpeculateCttz() const override;
 
     bool isCheapToSpeculateCtlz() const override;
 
+    bool hasAndNotCompare(SDValue Y) const override;
+
     /// Return the value type to use for ISD::SETCC.
     EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,
                            EVT VT) const override;
 
     /// Determine which of the bits specified in Mask are known to be either
     /// zero or one and return them in the KnownZero/KnownOne bitsets.
     void computeKnownBitsForTargetNode(const SDValue Op,
                                        APInt &KnownZero,
@@ ... @@

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ ... @@ bool X86TargetLowering::isCheapToSpeculateCttz() const {
   return Subtarget.hasBMI();
 }
 
 bool X86TargetLowering::isCheapToSpeculateCtlz() const {
   // Speculate ctlz only if we can directly use LZCNT.
   return Subtarget.hasLZCNT();
 }
 
+bool X86TargetLowering::hasAndNotCompare(SDValue Y) const {
+  if (!Subtarget.hasBMI())
+    return false;
+
+  // There are only 32-bit and 64-bit forms for 'andn'.
+  EVT VT = Y.getValueType();
+  if (VT != MVT::i32 && VT != MVT::i64)
+    return false;
+
+  return true;
+}
+
 /// Return true if every element in Mask, beginning
 /// from position Pos and ending in Pos+Size is undef.
 static bool isUndefInRange(ArrayRef<int> Mask, unsigned Pos, unsigned Size) {
   for (unsigned i = Pos, e = Pos + Size; i != e; ++i)
     if (0 <= Mask[i])
       return false;
   return true;
 }
@@ ... @@

llvm/trunk/test/CodeGen/X86/bmi.ll

@@ ... @@
 ; CHECK-NEXT: sete %al
 ; CHECK-NEXT: retq
   %notx = xor i32 %x, -1
   %and = and i32 %notx, %y
   %cmp = icmp eq i32 %and, 0
   ret i1 %cmp
 }
 
-; TODO: Recognize a disguised andn in the following 4 tests.
+; Recognize a disguised andn in the following 4 tests.
 define i1 @and_cmp1(i32 %x, i32 %y) {
 ; CHECK-LABEL: and_cmp1:
 ; CHECK: # BB#0:
-; CHECK-NEXT: andl %esi, %edi
-; CHECK-NEXT: cmpl %esi, %edi
+; CHECK-NEXT: andnl %esi, %edi, %eax
 ; CHECK-NEXT: sete %al
 ; CHECK-NEXT: retq
   %and = and i32 %x, %y
   %cmp = icmp eq i32 %and, %y
   ret i1 %cmp
 }
 
 define i1 @and_cmp2(i32 %x, i32 %y) {
 ; CHECK-LABEL: and_cmp2:
 ; CHECK: # BB#0:
-; CHECK-NEXT: andl %esi, %edi
-; CHECK-NEXT: cmpl %esi, %edi
+; CHECK-NEXT: andnl %esi, %edi, %eax
 ; CHECK-NEXT: setne %al
 ; CHECK-NEXT: retq
   %and = and i32 %y, %x
   %cmp = icmp ne i32 %and, %y
   ret i1 %cmp
 }
 
 define i1 @and_cmp3(i32 %x, i32 %y) {
 ; CHECK-LABEL: and_cmp3:
 ; CHECK: # BB#0:
-; CHECK-NEXT: andl %esi, %edi
-; CHECK-NEXT: cmpl %edi, %esi
+; CHECK-NEXT: andnl %esi, %edi, %eax
 ; CHECK-NEXT: sete %al
 ; CHECK-NEXT: retq
   %and = and i32 %x, %y
   %cmp = icmp eq i32 %y, %and
   ret i1 %cmp
 }
 
 define i1 @and_cmp4(i32 %x, i32 %y) {
 ; CHECK-LABEL: and_cmp4:
 ; CHECK: # BB#0:
-; CHECK-NEXT: andl %esi, %edi
-; CHECK-NEXT: cmpl %edi, %esi
+; CHECK-NEXT: andnl %esi, %edi, %eax
 ; CHECK-NEXT: setne %al
 ; CHECK-NEXT: retq
   %and = and i32 %y, %x
   %cmp = icmp ne i32 %y, %and
   ret i1 %cmp
 }
 
 ; A mask and compare against constant is ok for an 'andn' too
 ; even though the BMI instruction doesn't have an immediate form.
 define i1 @and_cmp_const(i32 %x) {
 ; CHECK-LABEL: and_cmp_const:
 ; CHECK: # BB#0:
-; CHECK-NEXT: andl $43, %edi
-; CHECK-NEXT: cmpl $43, %edi
+; CHECK-NEXT: movl $43, %eax
+; CHECK-NEXT: andnl %eax, %edi, %eax
 ; CHECK-NEXT: sete %al
 ; CHECK-NEXT: retq
   %and = and i32 %x, 43
   %cmp = icmp eq i32 %and, 43
   ret i1 %cmp
 }
 
+; But don't use 'andn' if the mask is a power-of-two.
+define i1 @and_cmp_const_power_of_two(i32 %x, i32 %y) {
+; CHECK-LABEL: and_cmp_const_power_of_two:
+; CHECK: # BB#0:
+; CHECK-NEXT: btl %esi, %edi
+; CHECK-NEXT: setae %al
+; CHECK-NEXT: retq
+;
+  %shl = shl i32 1, %y
+  %and = and i32 %x, %shl
+  %cmp = icmp ne i32 %and, %shl
+  ret i1 %cmp
+}
+
+; Don't transform to 'andn' if there's another use of the 'and'.
+define i32 @and_cmp_not_one_use(i32 %x) {
+; CHECK-LABEL: and_cmp_not_one_use:
+; CHECK: # BB#0:
+; CHECK-NEXT: andl $37, %edi
+; CHECK-NEXT: cmpl $37, %edi
+; CHECK-NEXT: sete %al
+; CHECK-NEXT: movzbl %al, %eax
+; CHECK-NEXT: addl %edi, %eax
+; CHECK-NEXT: retq
+;
+  %and = and i32 %x, 37
+  %cmp = icmp eq i32 %and, 37
+  %ext = zext i1 %cmp to i32
+  %add = add i32 %and, %ext
+  ret i32 %add
+}
+
+; Verify that we're not transforming invalid comparison predicates.
+define i1 @not_an_andn1(i32 %x, i32 %y) {
+; CHECK-LABEL: not_an_andn1:
+; CHECK: # BB#0:
+; CHECK-NEXT: andl %esi, %edi
+; CHECK-NEXT: cmpl %edi, %esi
+; CHECK-NEXT: setg %al
+; CHECK-NEXT: retq
+  %and = and i32 %x, %y
+  %cmp = icmp sgt i32 %y, %and
+  ret i1 %cmp
+}
+
+define i1 @not_an_andn2(i32 %x, i32 %y) {
+; CHECK-LABEL: not_an_andn2:
+; CHECK: # BB#0:
+; CHECK-NEXT: andl %esi, %edi
+; CHECK-NEXT: cmpl %edi, %esi
+; CHECK-NEXT: setbe %al
+; CHECK-NEXT: retq
+  %and = and i32 %y, %x
+  %cmp = icmp ule i32 %y, %and
+  ret i1 %cmp
+}
+
 ; Don't choose a 'test' if an 'andn' can be used.
 define i1 @andn_cmp_swap_ops(i64 %x, i64 %y) {
 ; CHECK-LABEL: andn_cmp_swap_ops:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: andnq %rsi, %rdi, %rax
 ; CHECK-NEXT: sete %al
 ; CHECK-NEXT: retq
   %notx = xor i64 %x, -1
@@ ... @@
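As a reading aid for the updated CHECK lines: BMI 'andn' computes the complement of its first source ANDed with its second source and sets ZF from the result, which is why 'sete'/'setne' can follow it directly with no 'cmp' or 'test'. The C++ model below uses hypothetical names (FlagResult, andn32, andCmp1Lowered, andCmp1Ir) and models only ZF. In AT&T syntax the operands appear as 'andnl src2, src1, dst', so 'andnl %esi, %edi, %eax' computes ~%edi & %esi, that is ~x & y in and_cmp1.

```cpp
#include <cassert>
#include <cstdint>

// Result of a flag-setting ALU op; only ZF is modeled.
struct FlagResult {
  uint32_t value;
  bool zf; // set when value == 0
};

// Model of 32-bit BMI 'andn': ~src1 & src2, with ZF set from the result.
FlagResult andn32(uint32_t src1, uint32_t src2) {
  const uint32_t r = ~src1 & src2;
  return {r, r == 0};
}

// and_cmp1 after the patch: 'andnl %esi, %edi, %eax' then 'sete %al',
// with x in %edi (src1) and y in %esi (src2).
bool andCmp1Lowered(uint32_t x, uint32_t y) { return andn32(x, y).zf; }

// The IR being compiled: (x & y) == y.
bool andCmp1Ir(uint32_t x, uint32_t y) { return (x & y) == y; }
```

The same model covers and_cmp_const, where the constant 43 is first materialized with 'movl' because 'andn' has no immediate form.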