This is an archive of the discontinued LLVM Phabricator instance.

AArch64: Fold immediate into the immediate field of logical instructions
ClosedPublic

Authored by ahatanak on Oct 2 2014, 4:08 PM.

Details

Reviewers

t.p.northover
• tstellarAMD
mcrosier

Commits

rG22e839f4b2d2: [AArch64] Improve code generation for logical instructions taking immediate…
rG19077aaee0e0: [AArch64] Improve code generation for logical instructions taking immediate…
rGe327f098329f: [AArch64] Improve code generation for logical instructions taking immediate…
rL301019: [AArch64] Improve code generation for logical instructions taking
rL300930: [AArch64] Improve code generation for logical instructions taking
rL300913: [AArch64] Improve code generation for logical instructions taking

Summary

LLVM currently turns the following code

void foo1(int a, char *p) {
  int t = a & 0xfd;
  *p = t;
}

into these instructions:

movz w8, #253
and w8, w0, w8
strb w8, [x1]

This can be done using just two instructions, since we don't care about the upper 24 bits of the result of the "and" instruction.

and w8, w0, 0xfffffffd
strb w8, [x1]

This patch adds a target hook to TargetLowering::TargetLoweringOpt and overrides it in the AArch64 backend to sign-extend an immediate operand if the upper bits are not demanded and sign-extending enables folding the immediate into the instruction.
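The claim that the wider mask is safe can be checked directly. The following is a minimal sketch, not code from the patch: `storeByte` and `sameStoredByte` are hypothetical helper names, and `storeByte` models the truncating `strb` by keeping only the low 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the truncating store (strb): only the low 8 bits
// of the register reach memory.
static uint8_t storeByte(uint32_t v) { return static_cast<uint8_t>(v); }

// True if masking with 0xfd and masking with its sign-extended form
// 0xfffffffd store the same byte for the given input.
static bool sameStoredByte(uint32_t a) {
  return storeByte(a & 0xfdu) == storeByte(a & 0xfffffffdu);
}
```

The two masks differ only in bits 8 to 31, which the byte store discards, so the stored value is the same for every input.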

This optimization speeds up 253.perlbmk by 5%.

Diff Detail

Repository: rL LLVM

Event Timeline

ahatanak updated this revision to Diff 14357. Oct 2 2014, 4:08 PM

ahatanak retitled this revision to "AArch64: Fold immediate into the immediate field of logical instructions".

ahatanak updated this object.

ahatanak edited the test plan for this revision.

ahatanak added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Oct 2 2014, 4:08 PM

Nice gain!! Just a few nits.

include/llvm/Target/TargetLowering.h
2056 ↗	(On Diff #14357)	s/LTO/TLO ?
lib/Target/AArch64/AArch64ISelLowering.cpp
674 ↗	(On Diff #14357)	Remove vertical whitespace.
681 ↗	(On Diff #14357)	Please add an assert message: assert(cond && "msg");

aadg added a subscriber: aadg. Oct 3 2014, 2:20 AM

Address Chad's comments.

ping

lib/Target/AArch64/AArch64ISelLowering.cpp
685 ↗	(On Diff #14392)	The code I previously had here didn't make sense, so I cleaned it up. It was checking Size > 0 after Size = std::max(VT.getSizeInBits(), 32u). Also, I changed it to break if VT is a vector, just in case this function is called on a vector node.

ahatanak added a reviewer: t.p.northover. Oct 22 2014, 10:03 AM

Hi Akira,

I've got a couple of comments:

include/llvm/Target/TargetLowering.h
2058 ↗	(On Diff #14392)	Functions should usually start with a lower-case letter.
lib/Target/AArch64/AArch64ISelLowering.cpp
693–694 ↗	(On Diff #14392)	OK, so we've got an immediate "ab...iJK..Z" where lower-case digits aren't used. Sign extending converts this to "JJ...JJK...Z". Is there any particular reason to expect that's representable? It seems like a bit of a shot in the dark.
700 ↗	(On Diff #14392)	Shifting a negative number left is undefined behaviour.
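The code at the commented line is not shown in this archive, so the following is only a sketch of one common way to sign-extend without left-shifting a negative value: do all the arithmetic on unsigned integers, where wraparound is well defined. `signExtendLow` is a hypothetical name, not a function from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of `imm` to 32 bits using unsigned
// arithmetic only, so no signed value is ever shifted.
static uint32_t signExtendLow(uint32_t imm, unsigned bits) {
  assert(bits >= 1 && bits <= 32 && "bit count out of range");
  if (bits == 32)
    return imm;
  uint32_t low = imm & ((1u << bits) - 1); // keep only the demanded bits
  uint32_t sign = 1u << (bits - 1);        // position of the sign bit
  return (low ^ sign) - sign;              // wraps modulo 2^32 if sign bit set
}
```

For the 8-bit immediate from the summary, `signExtendLow(0xfd, 8)` gives `0xfffffffd`.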

Tim, I fixed the undefined behavior and renamed the function to start with lower-case letter.

The reason I only do sign-extension is that it seemed to be the cheapest way to get the most gain without hurting compile time or making the code overly complicated. I looked at the instructions llvm emits, and I saw many places where logical instructions were followed by truncating stores.

I can try changing it to do a more exhaustive search of the bit patterns to see how much further performance can be improved, if that's necessary.

Hi Akira,

> The reason I only do sign-extension is that it seemed to be the cheapest way to get the most gain without hurting compile time or making the code overly complicated. I looked at the instructions llvm emits, and I saw many places where logical instructions were followed by truncating stores.

I think the kind of store being done is largely orthogonal to the
calculation most likely to get you a valid immediate. It tells you
what bits can be ignored, but nothing about the best way to fill them.

So I think it's reasonable to start with just assuming that
DemandedBits has some number of low bits set and the rest ignored
based on your observations. I think it's harder to justify coming up
with a single NewImm value by sign extending the existing immediate
and giving up if that fails.

Cheers.

Tim.

Hi again,

> I think it's harder to justify coming up
> with a single NewImm value by sign extending the existing immediate
> and giving up if that fails.

I've been doing some more thinking here, and if we're willing to
assume the truncation is to a power of two type (I am, anyone using
i14 deserves whatever they get) then I think we can cover *all* valid
cases by instead replicating the demanded bits across the 32-bit
register. E.g. try 0xfdfdfdfd instead of 0xfffffffd for the 8-bit
0xfd.

The argument goes that if the input is morally contiguous, then there
are multiple representations involving sign extension to 4, 8, 16 or
32 followed by replication. Otherwise the replication width is less
than the demanded width so we're completely forced and have to
continue the replication that's already started.

Cheers.

Tim.
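The replication idea can be sketched as follows. This is an illustration under the same power-of-two assumption Tim states, not code from the patch; `replicate` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Copy the low `width` bits of `elem` into every `width`-bit lane of a
// 32-bit word. `width` is assumed to be a power of two between 2 and 32.
static uint32_t replicate(uint32_t elem, unsigned width) {
  assert(width >= 2 && width <= 32 && (width & (width - 1)) == 0 &&
         "width must be a power of two in [2, 32]");
  uint32_t mask = (width == 32) ? ~0u : ((1u << width) - 1);
  uint32_t out = 0;
  for (unsigned shift = 0; shift < 32; shift += width)
    out |= (elem & mask) << shift;
  return out;
}
```

With the 8-bit immediate from the summary, `replicate(0xfd, 8)` produces `0xfdfdfdfd`, the candidate suggested above in place of `0xfffffffd`.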

In D5591#15, @t.p.northover wrote:

> The argument goes that if the input is morally contiguous, then there
> are multiple representations involving sign extension to 4, 8, 16 or
> 32 followed by replication. Otherwise the replication width is less
> than the demanded width so we're completely forced and have to
> continue the replication that's already started.

Sign-extension followed by replication enables converting constants like 0xfdfd (demanded = 16 bits) or 0x3dfd (demanded = 14 bits) to a bimm, where just sign-extending fails, but I think we should also try to handle cases like 0x19 (demanded = 5 bits). This isn't a bimm as it stands, and sign-extending doesn't make it a bimm either. In this case, we have to copy bits 1-3 to bits 5-7.

I think I can come up with a patch that does this.
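To make the 0x19 case concrete, here is a brute-force sketch that is deliberately not the algorithm from the patch. It enumerates every 32-bit AArch64 bitmask immediate (an element of 2, 4, 8, 16 or 32 bits holding a rotated run of ones, replicated across the register) and returns the first one that agrees with the constant on the demanded bits. The name `findLogicalImm32` and the convention of returning 0 for "none found" are assumptions made for this example; 0 is safe as a sentinel because it is never a valid bitmask immediate.

```cpp
#include <cassert>
#include <cstdint>

// Returns a 32-bit bitmask immediate that matches `imm` on every bit set in
// `demanded`, or 0 if no such immediate exists. Exhaustive search, for
// illustration only.
static uint32_t findLogicalImm32(uint32_t imm, uint32_t demanded) {
  for (unsigned width = 2; width <= 32; width *= 2) {
    uint32_t mask = (width == 32) ? ~0u : ((1u << width) - 1);
    for (unsigned ones = 1; ones < width; ++ones) {
      uint32_t run = (1u << ones) - 1; // `ones` contiguous low bits set
      for (unsigned rot = 0; rot < width; ++rot) {
        // Rotate the run left by `rot` within a `width`-bit element.
        uint32_t elem =
            rot == 0 ? run : (((run << rot) | (run >> (width - rot))) & mask);
        // Replicate the element across all 32 bits.
        uint32_t cand = 0;
        for (unsigned shift = 0; shift < 32; shift += width)
          cand |= elem << shift;
        if ((cand & demanded) == (imm & demanded))
          return cand;
      }
    }
  }
  return 0;
}
```

For 0x19 with the low 5 bits demanded, the search finds `0x99999999`, whose low byte 0x99 is exactly 0x19 with bits 1-3 copied to bits 5-7. For 0xfd with 8 bits demanded it finds `0xfdfdfdfd`.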

Rewrote the algorithm for searching for a bitmask immediate based on Tim's feedback.

I had to make changes to DAGCombiner::visitAND because it was transforming the DAG in a way that made it impossible to do any optimization in optimizeConstant (the code here was originally committed in r97616). This change doesn't seem to have any noticeable impact on performance, but I'm still investigating.

Sorry for not updating the patch for a long time. Here is my new patch.

For the most part, the new patch takes the same approach as the previous patch to find an immediate operand, but there were a couple of changes made.

Function optimizeLogicalImm emits AArch64 machine nodes to prevent the target-independent dagcombine from undoing the optimization. With this change, there is no need to make changes in DAGCombiner::visitOR as I did in my previous patch.
In optimizeLogicalImm, rotation is used to avoid using branches and simplify the logic.
A target-specific function object for optimizing nodes with immediates is passed to the constructor of TargetLoweringOpt, which gets called later in TargetLoweringOpt::ShrinkDemandedConstant.

Herald added a subscriber: rengolin. May 19 2015, 1:51 PM

Rebase and make a couple of changes:

Add comments.
In optimizeLogicalImm, return false instead of true when the immediate is already a bimm32 or bimm64 so that it doesn't inhibit the optimization done later in AArch64ISelDAGToDAG.cpp that emits BFXIL.
Remove redundant instructions in bitreverse.ll.

Some drive-by comments; I haven't looked into the optimizeLogicalImm logic yet.

include/llvm/Target/TargetLowering.h
2252 ↗	(On Diff #71888)	Add more detail on when this is called and what the parameters are?
2278–2281 ↗	(On Diff #71888)	I don't think this is the best location for this: I'd rather have TargetLoweringOpt be "the result of an optimization", and TargetLowering be "how to do optimizations". What do you think of going back to the TLI virtual hook, and fixing the other TLO methods to do the same: https://reviews.llvm.org/differential/diff/73495/
lib/Target/AArch64/AArch64ISelLowering.cpp
745–746 ↗	(On Diff #71888)	Check this only once when initializing the std::function? (or leave it here if you remove the std::function, I suppose)
test/CodeGen/AArch64/optimize-imm.ll
1 ↗	(On Diff #71888)	Triple can be simplified to: -mtriple=aarch64--

ahatanak added inline comments. Oct 4 2016, 10:32 PM

include/llvm/Target/TargetLowering.h
2278–2281 ↗	(On Diff #71888)	I'm not sure what TargetLoweringOpt is supposed to do, but using TLI virtual hooks looks like a better approach.

Address review comments.

Herald added a reviewer: • tstellarAMD. Oct 5 2016, 3:21 PM

Herald added subscribers: nhaehnle, arsenm.

ahatanak marked 2 inline comments as done. Oct 5 2016, 3:22 PM

Fix comment.

Herald edited edge metadata. Oct 5 2016, 7:21 PM

Herald added a subscriber: wdng.

Rebase. Move TargetLoweringOpt::SimplifyDemandedBits into TargetLowering.

Herald edited edge metadata. Nov 16 2016, 5:28 PM

Rebase.

Herald added a subscriber: igorb. Jan 24 2017, 3:46 PM

AsafBadouh added a subscriber: AsafBadouh. Jan 25 2017, 1:05 AM

evandro added a subscriber: evandro. Jan 27 2017, 1:18 PM

mcrosier added inline comments. Apr 12 2017, 11:19 AM

lib/Target/AArch64/AArch64ISelLowering.cpp
806 ↗	(On Diff #85648)	You should use AArch64_AM::isLogicalImmediate() here.
901 ↗	(On Diff #85648)	If you put this after the below switch, you'll guarantee Op has operand 1.
907 ↗	(On Diff #85648)	Can't we early exit if we demand all of the bits? E.g., if (Demanded.countPopulation() == Size) return false;
922 ↗	(On Diff #85648)	Default case goes at top of switch, per coding guidelines.

FWIW, I think this is in pretty good shape. Also, I ran correctness tests on everything I've got (e.g., llvm-ts, SPEC200X, internal tests) and saw no correctness failures.

mcrosier added a reviewer: mcrosier. Apr 12 2017, 2:17 PM

Address Chad's comments.

Thanks for working on this, Akira. I believe you've addressed all of my concerns as well as Tim's and Arnaud's. LGTM, assuming you've done the due diligence to ensure this doesn't dramatically increase compile-time.

This revision is now accepted and ready to land. Apr 17 2017, 10:18 AM

I compiled the files in MultiSource/Applications of the test suite and didn't see any measurable increase in compile time. I'll commit this patch today.

Thank you for the review.

Thanks for working on this patch and for being *extremely* patient with the 2.5 year review, Akira!

Closed by commit rL300913: [AArch64] Improve code generation for logical instructions taking (authored by ahatanak). Apr 20 2017, 4:00 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/Target/TargetLowering.h	53 lines
llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp	66 lines
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h	3 lines
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp	139 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	5 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	4 lines
llvm/trunk/lib/Target/XCore/XCoreISelLowering.cpp	4 lines
llvm/trunk/test/CodeGen/AArch64/optimize-imm.ll	64 lines

Diff 96041

llvm/trunk/include/llvm/Target/TargetLowering.h

[2,382 lines hidden] struct TargetLoweringOpt {
     bool LegalTypes() const { return LegalTys; }
     bool LegalOperations() const { return LegalOps; }

     bool CombineTo(SDValue O, SDValue N) {
       Old = O;
       New = N;
       return true;
     }
+  };

   /// Check to see if the specified operand of the specified instruction is a
   /// constant integer. If so, check to see if there are any bits set in the
   /// constant that are not demanded. If so, shrink the constant and return
   /// true.
-  bool ShrinkDemandedConstant(SDValue Op, const APInt &Demanded);
+  bool ShrinkDemandedConstant(SDValue Op, const APInt &Demanded,
+                              TargetLoweringOpt &TLO) const;
+
+  // Target hook to do target-specific const optimization, which is called by
+  // ShrinkDemandedConstant. This function should return true if the target
+  // doesn't want ShrinkDemandedConstant to further optimize the constant.
+  virtual bool targetShrinkDemandedConstant(SDValue Op, const APInt &Demanded,
+                                            TargetLoweringOpt &TLO) const {
+    return false;
+  }

   /// Convert x+y to (VT)((SmallVT)x+(SmallVT)y) if the casts are free. This
   /// uses isZExtFree and ZERO_EXTEND for the widening cast, but it could be
   /// generalized for targets with other types of implicit widening casts.
   bool ShrinkDemandedOp(SDValue Op, unsigned BitWidth, const APInt &Demanded,
-                        const SDLoc &dl);
+                        TargetLoweringOpt &TLO) const;

   /// Helper for SimplifyDemandedBits that can simplify an operation with
-  /// multiple uses. This function uses TLI.SimplifyDemandedBits to
-  /// simplify Operand \p OpIdx of \p User and then updated \p User with
-  /// the simplified version. No other uses of \p OpIdx are updated.
-  /// If \p User is the only user of \p OpIdx, this function behaves exactly
-  /// like TLI.SimplifyDemandedBits except that it also updates the DAG by
-  /// calling DCI.CommitTargetLoweringOpt.
-  bool SimplifyDemandedBits(SDNode *User, unsigned OpIdx,
-                            const APInt &Demanded, DAGCombinerInfo &DCI);
-  };
+  /// multiple uses. This function simplifies operand \p OpIdx of \p User and
+  /// then updates \p User with the simplified version. No other uses of
+  /// \p OpIdx are updated. If \p User is the only user of \p OpIdx, this
+  /// function behaves exactly like function SimplifyDemandedBits declared
+  /// below except that it also updates the DAG by calling
+  /// DCI.CommitTargetLoweringOpt.
+  bool SimplifyDemandedBits(SDNode *User, unsigned OpIdx, const APInt &Demanded,
+                            DAGCombinerInfo &DCI, TargetLoweringOpt &TLO) const;

   /// Look at Op. At this point, we know that only the DemandedMask bits of the
   /// result of Op are ever used downstream. If we can use this information to
   /// simplify Op, create a new simplified DAG node and return true, returning
   /// the original and new nodes in Old and New. Otherwise, analyze the
   /// expression and return a mask of KnownOne and KnownZero bits for the
   /// expression (used to simplify the caller). The KnownZero/One bits may only
   /// be accurate for those bits in the DemandedMask.
[833 lines hidden]
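The control flow introduced by the new hook can be sketched without any LLVM types. Everything below is a simplified stand-in and an assumption for illustration: plain 32-bit integers replace SDValue and APInt, `LoweringSketch`, `SignExtendingTarget` and `shrinkWith` are hypothetical names, and the target policy shown (sign-extend when exactly the low 8 bits are demanded) is a toy rather than the AArch64 implementation.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for TargetLowering: the generic shrink consults a virtual hook
// first and only clears undemanded bits if the target declines.
struct LoweringSketch {
  virtual ~LoweringSketch() = default;

  // Hook: return true to stop the generic code from touching the constant.
  virtual bool targetShrinkDemandedConstant(uint32_t &Imm,
                                            uint32_t Demanded) const {
    (void)Imm;
    (void)Demanded;
    return false;
  }

  // Returns true if the constant was changed.
  bool shrinkDemandedConstant(uint32_t &Imm, uint32_t Demanded) const {
    uint32_t Old = Imm;
    if (targetShrinkDemandedConstant(Imm, Demanded))
      return Imm != Old;
    if (Imm & ~Demanded) {
      Imm &= Demanded; // generic behaviour: clear bits nobody reads
      return true;
    }
    return false;
  }
};

// Toy target: prefers a sign-extended constant when only a byte is demanded.
struct SignExtendingTarget : LoweringSketch {
  bool targetShrinkDemandedConstant(uint32_t &Imm,
                                    uint32_t Demanded) const override {
    if (Demanded != 0xffu)
      return false;
    uint32_t Low = Imm & 0xffu;
    Imm = (Low ^ 0x80u) - 0x80u; // sign-extend from bit 7, unsigned math only
    return true;
  }
};

// Convenience wrapper: returns the constant after shrinking.
static uint32_t shrinkWith(const LoweringSketch &L, uint32_t Imm,
                           uint32_t Demanded) {
  L.shrinkDemandedConstant(Imm, Demanded);
  return Imm;
}
```

The design point carried over from the review is that the decision lives in a virtual method on the lowering object, so a target that returns true from the hook keeps the generic code from shrinking the constant back.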

llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp

[336 lines hidden]

 //===----------------------------------------------------------------------===//
 // Optimization Methods
 //===----------------------------------------------------------------------===//

 /// If the specified instruction has a constant integer operand and there are
 /// bits set in that constant that are not demanded, then clear those bits and
 /// return true.
-bool TargetLowering::TargetLoweringOpt::ShrinkDemandedConstant(
-    SDValue Op, const APInt &Demanded) {
+bool TargetLowering::ShrinkDemandedConstant(SDValue Op, const APInt &Demanded,
+                                            TargetLoweringOpt &TLO) const {
+  SelectionDAG &DAG = TLO.DAG;
   SDLoc DL(Op);
   unsigned Opcode = Op.getOpcode();

+  // Do target-specific constant optimization.
+  if (targetShrinkDemandedConstant(Op, Demanded, TLO))
+    return TLO.New.getNode();
+
   // FIXME: ISD::SELECT, ISD::SELECT_CC
   switch (Opcode) {
   default:
     break;
   case ISD::XOR:
   case ISD::AND:
   case ISD::OR: {
     auto *Op1C = dyn_cast<ConstantSDNode>(Op.getOperand(1));
     if (!Op1C)
       return false;

     // If this is a 'not' op, don't touch it because that's a canonical form.
     const APInt &C = Op1C->getAPIntValue();
     if (Opcode == ISD::XOR && (C | ~Demanded).isAllOnesValue())
       return false;

     if (C.intersects(~Demanded)) {
       EVT VT = Op.getValueType();
       SDValue NewC = DAG.getConstant(Demanded & C, DL, VT);
       SDValue NewOp = DAG.getNode(Opcode, DL, VT, Op.getOperand(0), NewC);
-      return CombineTo(Op, NewOp);
+      return TLO.CombineTo(Op, NewOp);
     }

     break;
   }
   }

   return false;
 }

 /// Convert x+y to (VT)((SmallVT)x+(SmallVT)y) if the casts are free.
 /// This uses isZExtFree and ZERO_EXTEND for the widening cast, but it could be
 /// generalized for targets with other types of implicit widening casts.
-bool TargetLowering::TargetLoweringOpt::ShrinkDemandedOp(SDValue Op,
-                                                         unsigned BitWidth,
-                                                         const APInt &Demanded,
-                                                         const SDLoc &dl) {
+bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned BitWidth,
+                                      const APInt &Demanded,
+                                      TargetLoweringOpt &TLO) const {
   assert(Op.getNumOperands() == 2 &&
          "ShrinkDemandedOp only supports binary operators!");
   assert(Op.getNode()->getNumValues() == 1 &&
          "ShrinkDemandedOp only supports nodes with one result!");

+  SelectionDAG &DAG = TLO.DAG;
+  SDLoc dl(Op);
+
   // Early return, as this function cannot handle vector types.
   if (Op.getValueType().isVector())
     return false;

   // Don't do this if the node has another user, which may require the
   // full value.
   if (!Op.getNode()->hasOneUse())
     return false;
[13 lines hidden] if (TLI.isTruncateFree(Op.getValueType(), SmallVT) &&
       SDValue X = DAG.getNode(Op.getOpcode(), dl, SmallVT,
                               DAG.getNode(ISD::TRUNCATE, dl, SmallVT,
                                           Op.getNode()->getOperand(0)),
                               DAG.getNode(ISD::TRUNCATE, dl, SmallVT,
                                           Op.getNode()->getOperand(1)));
       bool NeedZext = DemandedSize > SmallVTBits;
       SDValue Z = DAG.getNode(NeedZext ? ISD::ZERO_EXTEND : ISD::ANY_EXTEND,
                               dl, Op.getValueType(), X);
-      return CombineTo(Op, Z);
+      return TLO.CombineTo(Op, Z);
     }
   }
   return false;
 }

 bool
-TargetLowering::TargetLoweringOpt::SimplifyDemandedBits(SDNode *User,
-                                                        unsigned OpIdx,
-                                                        const APInt &Demanded,
-                                                        DAGCombinerInfo &DCI) {
-  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+TargetLowering::SimplifyDemandedBits(SDNode *User, unsigned OpIdx,
+                                     const APInt &Demanded,
+                                     DAGCombinerInfo &DCI,
+                                     TargetLoweringOpt &TLO) const {
   SDValue Op = User->getOperand(OpIdx);
   APInt KnownZero, KnownOne;

-  if (!TLI.SimplifyDemandedBits(Op, Demanded, KnownZero, KnownOne,
-                                *this, 0, true))
+  if (!SimplifyDemandedBits(Op, Demanded, KnownZero, KnownOne,
+                            TLO, 0, true))
     return false;


   // Old will not always be the same as Op. For example:
   //
   // Demanded = 0xffffff
   // Op = i64 truncate (i32 and x, 0xffffff)
   // In this case simplify demand bits will want to replace the 'and' node
   // with the value 'x', which will give us:
   // Old = i32 and x, 0xffffff
   // New = x
-  if (Old.hasOneUse()) {
+  if (TLO.Old.hasOneUse()) {
     // For the one use case, we just commit the change.
-    DCI.CommitTargetLoweringOpt(*this);
+    DCI.CommitTargetLoweringOpt(TLO);
     return true;
   }

   // If Old has more than one use then it must be Op, because the
   // AssumeSingleUse flag is not propogated to recursive calls of
   // SimplifyDemanded bits, so the only node with multiple use that
   // it will attempt to combine will be opt.
-  assert(Old == Op);
+  assert(TLO.Old == Op);

   SmallVector <SDValue, 4> NewOps;
   for (unsigned i = 0, e = User->getNumOperands(); i != e; ++i) {
     if (i == OpIdx) {
-      NewOps.push_back(New);
+      NewOps.push_back(TLO.New);
       continue;
     }
     NewOps.push_back(User->getOperand(i));
   }
-  DAG.UpdateNodeOperands(User, NewOps);
+  TLO.DAG.UpdateNodeOperands(User, NewOps);
   // Op has less users now, so we may be able to perform additional combines
   // with it.
   DCI.AddToWorklist(Op.getNode());
   // User's operands have been updated, so we may be able to do new combines
   // with it.
   DCI.AddToWorklist(User);
   return true;
 }
[102 lines hidden] if (ConstantSDNode *RHSC = isConstOrConstSplat(Op.getOperand(1))) {
       // Do not increment Depth here; that can cause an infinite loop.
       TLO.DAG.computeKnownBits(Op0, LHSZero, LHSOne, Depth);
       // If the LHS already has zeros where RHSC does, this and is dead.
       if ((LHSZero & NewMask) == (~RHSC->getAPIntValue() & NewMask))
         return TLO.CombineTo(Op, Op0);

       // If any of the set bits in the RHS are known zero on the LHS, shrink
       // the constant.
-      if (TLO.ShrinkDemandedConstant(Op, ~LHSZero & NewMask))
+      if (ShrinkDemandedConstant(Op, ~LHSZero & NewMask, TLO))
         return true;

       // Bitwise-not (xor X, -1) is a special case: we don't usually shrink its
       // constant, but if this 'and' is only clearing bits that were just set by
       // the xor, then this 'and' can be eliminated by shrinking the mask of
       // the xor. For example, for a 32-bit X:
       // and (xor (srl X, 31), -1), 1 --> xor (srl X, 31), 1
       if (isBitwiseNot(Op0) && Op0.hasOneUse() &&
[18 lines hidden] case ISD::AND:
     if ((NewMask & ~KnownZero2 & KnownOne) == (~KnownZero2 & NewMask))
       return TLO.CombineTo(Op, Op.getOperand(0));
     if ((NewMask & ~KnownZero & KnownOne2) == (~KnownZero & NewMask))
       return TLO.CombineTo(Op, Op.getOperand(1));
     // If all of the demanded bits in the inputs are known zeros, return zero.
     if ((NewMask & (KnownZero|KnownZero2)) == NewMask)
       return TLO.CombineTo(Op, TLO.DAG.getConstant(0, dl, Op.getValueType()));
     // If the RHS is a constant, see if we can simplify it.
-    if (TLO.ShrinkDemandedConstant(Op, ~KnownZero2 & NewMask))
+    if (ShrinkDemandedConstant(Op, ~KnownZero2 & NewMask, TLO))
       return true;
     // If the operation can be done in a smaller type, do so.
-    if (TLO.ShrinkDemandedOp(Op, BitWidth, NewMask, dl))
+    if (ShrinkDemandedOp(Op, BitWidth, NewMask, TLO))
       return true;

     // Output known-1 bits are only known if set in both the LHS & RHS.
     KnownOne &= KnownOne2;
     // Output known-0 are known to be clear if zero in either the LHS | RHS.
     KnownZero |= KnownZero2;
     break;
   case ISD::OR:
[14 lines hidden] if ((NewMask & ~KnownOne & KnownZero2) == (~KnownOne & NewMask))
       return TLO.CombineTo(Op, Op.getOperand(1));
     // If all of the potentially set bits on one side are known to be set on
     // the other side, just use the 'other' side.
     if ((NewMask & ~KnownZero & KnownOne2) == (~KnownZero & NewMask))
       return TLO.CombineTo(Op, Op.getOperand(0));
     if ((NewMask & ~KnownZero2 & KnownOne) == (~KnownZero2 & NewMask))
       return TLO.CombineTo(Op, Op.getOperand(1));
     // If the RHS is a constant, see if we can simplify it.
-    if (TLO.ShrinkDemandedConstant(Op, NewMask))
+    if (ShrinkDemandedConstant(Op, NewMask, TLO))
       return true;
     // If the operation can be done in a smaller type, do so.
-    if (TLO.ShrinkDemandedOp(Op, BitWidth, NewMask, dl))
+    if (ShrinkDemandedOp(Op, BitWidth, NewMask, TLO))
       return true;

     // Output known-0 bits are only known if clear in both the LHS & RHS.
     KnownZero &= KnownZero2;
     // Output known-1 are known to be set if set in either the LHS | RHS.
     KnownOne |= KnownOne2;
     break;
   case ISD::XOR:
     if (SimplifyDemandedBits(Op.getOperand(1), NewMask, KnownZero,
                              KnownOne, TLO, Depth+1))
       return true;
     assert((KnownZero & KnownOne) == 0 && "Bits known to be one AND zero?");
     if (SimplifyDemandedBits(Op.getOperand(0), NewMask, KnownZero2,
                              KnownOne2, TLO, Depth+1))
       return true;
     assert((KnownZero2 & KnownOne2) == 0 && "Bits known to be one AND zero?");

     // If all of the demanded bits are known zero on one side, return the other.
     // These bits cannot contribute to the result of the 'xor'.
     if ((KnownZero & NewMask) == NewMask)
       return TLO.CombineTo(Op, Op.getOperand(0));
     if ((KnownZero2 & NewMask) == NewMask)
       return TLO.CombineTo(Op, Op.getOperand(1));
     // If the operation can be done in a smaller type, do so.
-    if (TLO.ShrinkDemandedOp(Op, BitWidth, NewMask, dl))
+    if (ShrinkDemandedOp(Op, BitWidth, NewMask, TLO))
       return true;

     // If all of the unknown bits are known to be zero on one side or the other
     // (but not both) turn this into an inclusive or.
     // e.g. (A & C1)^(B & C2) -> (A & C1)|(B & C2) iff C1&C2 == 0
     if ((NewMask & ~KnownZero & ~KnownZero2) == 0)
       return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::OR, dl, Op.getValueType(),
                                                Op.getOperand(0),
[28 lines hidden] if (ConstantSDNode *C = isConstOrConstSplat(Op.getOperand(1))) {
         if (Expanded != C->getAPIntValue()) {
           EVT VT = Op.getValueType();
           SDValue New = TLO.DAG.getNode(Op.getOpcode(), dl,VT, Op.getOperand(0),
                                         TLO.DAG.getConstant(Expanded, dl, VT));
           return TLO.CombineTo(Op, New);
         }
         // If it already has all the bits set, nothing to change
         // but don't shrink either!
-      } else if (TLO.ShrinkDemandedConstant(Op, NewMask)) {
+      } else if (ShrinkDemandedConstant(Op, NewMask, TLO)) {
         return true;
       }
     }

     KnownZero = std::move(KnownZeroOut);
     KnownOne = std::move(KnownOneOut);
     break;
   case ISD::SELECT:
     if (SimplifyDemandedBits(Op.getOperand(2), NewMask, KnownZero,
                              KnownOne, TLO, Depth+1))
       return true;
     if (SimplifyDemandedBits(Op.getOperand(1), NewMask, KnownZero2,
                              KnownOne2, TLO, Depth+1))
       return true;
     assert((KnownZero & KnownOne) == 0 && "Bits known to be one AND zero?");
     assert((KnownZero2 & KnownOne2) == 0 && "Bits known to be one AND zero?");

     // If the operands are constants, see if we can simplify them.
-    if (TLO.ShrinkDemandedConstant(Op, NewMask))
+    if (ShrinkDemandedConstant(Op, NewMask, TLO))
       return true;

     // Only known if known in both the LHS and RHS.
     KnownOne &= KnownOne2;
     KnownZero &= KnownZero2;
     break;
   case ISD::SELECT_CC:
     if (SimplifyDemandedBits(Op.getOperand(3), NewMask, KnownZero,
                              KnownOne, TLO, Depth+1))
       return true;
     if (SimplifyDemandedBits(Op.getOperand(2), NewMask, KnownZero2,
KnownOne2, TLO, Depth+1))		KnownOne2, TLO, Depth+1))
return true;		return true;
assert((KnownZero & KnownOne) == 0 && "Bits known to be one AND zero?");		assert((KnownZero & KnownOne) == 0 && "Bits known to be one AND zero?");
assert((KnownZero2 & KnownOne2) == 0 && "Bits known to be one AND zero?");		assert((KnownZero2 & KnownOne2) == 0 && "Bits known to be one AND zero?");

// If the operands are constants, see if we can simplify them.		// If the operands are constants, see if we can simplify them.
if (TLO.ShrinkDemandedConstant(Op, NewMask))		if (ShrinkDemandedConstant(Op, NewMask, TLO))
return true;		return true;

// Only known if known in both the LHS and RHS.		// Only known if known in both the LHS and RHS.
KnownOne &= KnownOne2;		KnownOne &= KnownOne2;
KnownZero &= KnownZero2;		KnownZero &= KnownZero2;
break;		break;
case ISD::SETCC: {		case ISD::SETCC: {
SDValue Op0 = Op.getOperand(0);		SDValue Op0 = Op.getOperand(0);
▲ Show 20 Lines • Show All 503 Lines • ▼ Show 20 Lines	case ISD::SUB: {
// of the highest bit demanded of them.		// of the highest bit demanded of them.
APInt LoMask = APInt::getLowBitsSet(BitWidth,		APInt LoMask = APInt::getLowBitsSet(BitWidth,
BitWidth - NewMask.countLeadingZeros());		BitWidth - NewMask.countLeadingZeros());
if (SimplifyDemandedBits(Op.getOperand(0), LoMask, KnownZero2,		if (SimplifyDemandedBits(Op.getOperand(0), LoMask, KnownZero2,
KnownOne2, TLO, Depth+1) ||		KnownOne2, TLO, Depth+1) ||
SimplifyDemandedBits(Op.getOperand(1), LoMask, KnownZero2,		SimplifyDemandedBits(Op.getOperand(1), LoMask, KnownZero2,
KnownOne2, TLO, Depth+1) ||		KnownOne2, TLO, Depth+1) ||
// See if the operation should be performed at a smaller bit width.		// See if the operation should be performed at a smaller bit width.
TLO.ShrinkDemandedOp(Op, BitWidth, NewMask, dl)) {		ShrinkDemandedOp(Op, BitWidth, NewMask, TLO)) {
const SDNodeFlags *Flags = Op.getNode()->getFlags();		const SDNodeFlags *Flags = Op.getNode()->getFlags();
if (Flags->hasNoSignedWrap() || Flags->hasNoUnsignedWrap()) {		if (Flags->hasNoSignedWrap() || Flags->hasNoUnsignedWrap()) {
// Disable the nsw and nuw flags. We can no longer guarantee that we		// Disable the nsw and nuw flags. We can no longer guarantee that we
// won't wrap after simplification.		// won't wrap after simplification.
SDNodeFlags NewFlags = *Flags;		SDNodeFlags NewFlags = *Flags;
NewFlags.setNoSignedWrap(false);		NewFlags.setNoSignedWrap(false);
NewFlags.setNoUnsignedWrap(false);		NewFlags.setNoUnsignedWrap(false);
SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, Op.getValueType(),		SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, Op.getValueType(),
▲ Show 20 Lines • Show All 2,612 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 249 Lines • ▼ Show 20 Lines	public:

/// Determine which of the bits specified in Mask are known to be either zero		/// Determine which of the bits specified in Mask are known to be either zero
/// or one and return them in the KnownZero/KnownOne bitsets.		/// or one and return them in the KnownZero/KnownOne bitsets.
void computeKnownBitsForTargetNode(const SDValue Op, APInt &KnownZero,		void computeKnownBitsForTargetNode(const SDValue Op, APInt &KnownZero,
APInt &KnownOne, const APInt &DemandedElts,		APInt &KnownOne, const APInt &DemandedElts,
const SelectionDAG &DAG,		const SelectionDAG &DAG,
unsigned Depth = 0) const override;		unsigned Depth = 0) const override;

		bool targetShrinkDemandedConstant(SDValue Op, const APInt &Demanded,
		TargetLoweringOpt &TLO) const override;

MVT getScalarShiftAmountTy(const DataLayout &DL, EVT) const override;		MVT getScalarShiftAmountTy(const DataLayout &DL, EVT) const override;

/// Returns true if the target allows unaligned memory accesses of the		/// Returns true if the target allows unaligned memory accesses of the
/// specified type.		/// specified type.
bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AddrSpace = 0,		bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AddrSpace = 0,
unsigned Align = 1,		unsigned Align = 1,
bool *Fast = nullptr) const override;		bool *Fast = nullptr) const override;

▲ Show 20 Lines • Show All 359 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp


	Show First 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
	#include <vector>			#include <vector>

	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "aarch64-lower"			#define DEBUG_TYPE "aarch64-lower"

	STATISTIC(NumTailCalls, "Number of tail calls");			STATISTIC(NumTailCalls, "Number of tail calls");
	STATISTIC(NumShiftInserts, "Number of vector shift inserts");			STATISTIC(NumShiftInserts, "Number of vector shift inserts");
				STATISTIC(NumOptimizedImms, "Number of times immediates were optimized");

	static cl::opt<bool>			static cl::opt<bool>
	EnableAArch64SlrGeneration("aarch64-shift-insert-generation", cl::Hidden,			EnableAArch64SlrGeneration("aarch64-shift-insert-generation", cl::Hidden,
	cl::desc("Allow AArch64 SLI/SRI formation"),			cl::desc("Allow AArch64 SLI/SRI formation"),
	cl::init(false));			cl::init(false));

	// FIXME: The necessary dtprel relocations don't seem to be supported			// FIXME: The necessary dtprel relocations don't seem to be supported
	// well in the GNU bfd and gold linkers at the moment. Therefore, by			// well in the GNU bfd and gold linkers at the moment. Therefore, by
	// default, for now, fall back to GeneralDynamic code generation.			// default, for now, fall back to GeneralDynamic code generation.
	cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(			cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(
	"aarch64-elf-ldtls-generation", cl::Hidden,			"aarch64-elf-ldtls-generation", cl::Hidden,
	cl::desc("Allow AArch64 Local Dynamic TLS code generation"),			cl::desc("Allow AArch64 Local Dynamic TLS code generation"),
	cl::init(false));			cl::init(false));

				static cl::opt<bool>
				EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
				cl::desc("Enable AArch64 logical imm instruction "
				"optimization"),
				cl::init(true));

	/// Value type used for condition codes.			/// Value type used for condition codes.
	static const MVT MVT_CC = MVT::i32;			static const MVT MVT_CC = MVT::i32;

	AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,			AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
	const AArch64Subtarget &STI)			const AArch64Subtarget &STI)
	: TargetLowering(TM), Subtarget(&STI) {			: TargetLowering(TM), Subtarget(&STI) {
	// AArch64 doesn't have comparisons which set GPRs or setcc instructions, so			// AArch64 doesn't have comparisons which set GPRs or setcc instructions, so
	// we have to make something up. Arbitrarily, choose ZeroOrOne.			// we have to make something up. Arbitrarily, choose ZeroOrOne.
	▲ Show 20 Lines • Show All 666 Lines • ▼ Show 20 Lines

	EVT AArch64TargetLowering::getSetCCResultType(const DataLayout &, LLVMContext &,			EVT AArch64TargetLowering::getSetCCResultType(const DataLayout &, LLVMContext &,
	EVT VT) const {			EVT VT) const {
	if (!VT.isVector())			if (!VT.isVector())
	return MVT::i32;			return MVT::i32;
	return VT.changeVectorElementTypeToInteger();			return VT.changeVectorElementTypeToInteger();
	}			}

				static bool optimizeLogicalImm(SDValue Op, unsigned Size, uint64_t Imm,
				const APInt &Demanded,
				TargetLowering::TargetLoweringOpt &TLO,
				unsigned NewOpc) {
				uint64_t OldImm = Imm, NewImm, Enc;
				uint64_t Mask = ((uint64_t)(-1LL) >> (64 - Size));

				// Return if the immediate is already a bimm32 or bimm64.
				if (AArch64_AM::isLogicalImmediate(Imm & Mask, Size))
				return false;

				unsigned EltSize = Size;
				uint64_t DemandedBits = Demanded.getZExtValue();

				// Clear bits that are not demanded.
				Imm &= DemandedBits;

				while (true) {
				// The goal here is to set the non-demanded bits in a way that minimizes
				// the number of transitions between 0 and 1. In order to achieve this goal,
				// we set the non-demanded bits to the value of the preceding demanded bits.
				// For example, if we have an immediate 0bx10xx0x1 ('x' indicates a
				// non-demanded bit), we copy bit0 (1) to the least significant 'x',
				// bit2 (0) to 'xx', and bit6 (1) to the most significant 'x'.
				// The final result is 0b11000011.
				uint64_t NonDemandedBits = ~DemandedBits;
				uint64_t InvertedImm = ~Imm & DemandedBits;
				uint64_t RotatedImm =
				((InvertedImm << 1) | (InvertedImm >> (EltSize - 1) & 1)) &
				NonDemandedBits;
				uint64_t Sum = RotatedImm + NonDemandedBits;
				bool Carry = NonDemandedBits & ~Sum & (1ULL << (EltSize - 1));
				uint64_t Ones = (Sum + Carry) & NonDemandedBits;
				NewImm = (Imm | Ones) & Mask;

				// If NewImm or its bitwise NOT is a shifted mask, it is a bitmask immediate
				// or all-ones or all-zeros, in which case we can stop searching. Otherwise,
				// we halve the element size and continue the search.
				if (isShiftedMask_64(NewImm) || isShiftedMask_64(~(NewImm | ~Mask)))
				break;

				// We cannot shrink the element size any further if it is 2-bits.
				if (EltSize == 2)
				return false;

				EltSize /= 2;
				Mask >>= EltSize;
				uint64_t Hi = Imm >> EltSize, DemandedBitsHi = DemandedBits >> EltSize;

				// Return if there is a mismatch in any of the demanded bits of Imm and Hi.
				if (((Imm ^ Hi) & (DemandedBits & DemandedBitsHi) & Mask) != 0)
				return false;

				// Merge the upper and lower halves of Imm and DemandedBits.
				Imm |= Hi;
				DemandedBits |= DemandedBitsHi;
				}

				++NumOptimizedImms;

				// Replicate the element across the register width.
				while (EltSize < Size) {
				NewImm |= NewImm << EltSize;
				EltSize *= 2;
				}

				(void)OldImm;
				assert(((OldImm ^ NewImm) & Demanded.getZExtValue()) == 0 &&
				"demanded bits should never be altered");

				// Create the new constant immediate node.
				EVT VT = Op.getValueType();
				unsigned Population = countPopulation(NewImm);
				SDLoc DL(Op);

				// If the new constant immediate is all-zeros or all-ones, let the target
				// independent DAG combine optimize this node.
				if (Population == 0 || Population == Size)
				return TLO.CombineTo(Op.getOperand(1), TLO.DAG.getConstant(NewImm, DL, VT));

				// Otherwise, create a machine node so that target independent DAG combine
				// doesn't undo this optimization.
				Enc = AArch64_AM::encodeLogicalImmediate(NewImm, Size);
				SDValue EncConst = TLO.DAG.getTargetConstant(Enc, DL, VT);
				SDValue New(
				TLO.DAG.getMachineNode(NewOpc, DL, VT, Op.getOperand(0), EncConst), 0);

				return TLO.CombineTo(Op, New);
				}

				bool AArch64TargetLowering::targetShrinkDemandedConstant(
				SDValue Op, const APInt &Demanded, TargetLoweringOpt &TLO) const {
				// Delay this optimization to as late as possible.
				if (!TLO.LegalOps)
				return false;

				if (!EnableOptimizeLogicalImm)
				return false;

				EVT VT = Op.getValueType();
				if (VT.isVector())
				return false;

				unsigned Size = VT.getSizeInBits();
				assert((Size == 32 || Size == 64) &&
				"i32 or i64 is expected after legalization.");

				// Exit early if we demand all bits.
				if (Demanded.countPopulation() == Size)
				return false;

				unsigned NewOpc;
				switch (Op.getOpcode()) {
				default:
				return false;
				case ISD::AND:
				NewOpc = Size == 32 ? AArch64::ANDWri : AArch64::ANDXri;
				break;
				case ISD::OR:
				NewOpc = Size == 32 ? AArch64::ORRWri : AArch64::ORRXri;
				break;
				case ISD::XOR:
				NewOpc = Size == 32 ? AArch64::EORWri : AArch64::EORXri;
				break;
				}
				ConstantSDNode *C = dyn_cast<ConstantSDNode>(Op.getOperand(1));
				if (!C)
				return false;
				uint64_t Imm = C->getZExtValue();
				return optimizeLogicalImm(Op, Size, Imm, Demanded, TLO, NewOpc);
				}

	/// computeKnownBitsForTargetNode - Determine which of the bits specified in			/// computeKnownBitsForTargetNode - Determine which of the bits specified in
	/// Mask are known to be either zero or one and return them in the			/// Mask are known to be either zero or one and return them in the
	/// KnownZero/KnownOne bitsets.			/// KnownZero/KnownOne bitsets.
	void AArch64TargetLowering::computeKnownBitsForTargetNode(			void AArch64TargetLowering::computeKnownBitsForTargetNode(
	const SDValue Op, APInt &KnownZero, APInt &KnownOne,			const SDValue Op, APInt &KnownZero, APInt &KnownOne,
	const APInt &DemandedElts, const SelectionDAG &DAG, unsigned Depth) const {			const APInt &DemandedElts, const SelectionDAG &DAG, unsigned Depth) const {
	switch (Op.getOpcode()) {			switch (Op.getOpcode()) {
	default:			default:
	▲ Show 20 Lines • Show All 10,031 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Show First 20 Lines • Show All 2,309 Lines • ▼ Show 20 Lines	return VT.getSizeInBits() >= 24 && // Types less than 24-bit should be treated
(VT.getSizeInBits() - DAG.ComputeNumSignBits(Op)) < 24;		(VT.getSizeInBits() - DAG.ComputeNumSignBits(Op)) < 24;
}		}

static bool simplifyI24(SDNode *Node24, unsigned OpIdx,		static bool simplifyI24(SDNode *Node24, unsigned OpIdx,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDValue Op = Node24->getOperand(OpIdx);		SDValue Op = Node24->getOperand(OpIdx);
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

APInt Demanded = APInt::getLowBitsSet(VT.getSizeInBits(), 24);		APInt Demanded = APInt::getLowBitsSet(VT.getSizeInBits(), 24);
APInt KnownZero, KnownOne;		APInt KnownZero, KnownOne;
TargetLowering::TargetLoweringOpt TLO(DAG, true, true);		TargetLowering::TargetLoweringOpt TLO(DAG, true, true);
if (TLO.SimplifyDemandedBits(Node24, OpIdx, Demanded, DCI))		if (TLI.SimplifyDemandedBits(Node24, OpIdx, Demanded, DCI, TLO))
return true;		return true;

return false;		return false;
}		}

template <typename IntTy>		template <typename IntTy>
static SDValue constantFoldBFE(SelectionDAG &DAG, IntTy Src0, uint32_t Offset,		static SDValue constantFoldBFE(SelectionDAG &DAG, IntTy Src0, uint32_t Offset,
uint32_t Width, const SDLoc &DL) {		uint32_t Width, const SDLoc &DL) {
▲ Show 20 Lines • Show All 1,024 Lines • ▼ Show 20 Lines	if (BitsFrom.hasOneUse()) {
APInt Demanded = APInt::getBitsSet(32,		APInt Demanded = APInt::getBitsSet(32,
OffsetVal,		OffsetVal,
OffsetVal + WidthVal);		OffsetVal + WidthVal);

APInt KnownZero, KnownOne;		APInt KnownZero, KnownOne;
TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),		TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),
!DCI.isBeforeLegalizeOps());		!DCI.isBeforeLegalizeOps());
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (TLO.ShrinkDemandedConstant(BitsFrom, Demanded) ||		if (TLI.ShrinkDemandedConstant(BitsFrom, Demanded, TLO) ||
TLI.SimplifyDemandedBits(BitsFrom, Demanded,		TLI.SimplifyDemandedBits(BitsFrom, Demanded,
KnownZero, KnownOne, TLO)) {		KnownZero, KnownOne, TLO)) {
DCI.CommitTargetLoweringOpt(TLO);		DCI.CommitTargetLoweringOpt(TLO);
}		}
}		}

break;		break;
}		}
▲ Show 20 Lines • Show All 278 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


Show First 20 Lines • Show All 30,201 Lines • ▼ Show 20 Lines	if (N->getOpcode() == ISD::VSELECT && DCI.isBeforeLegalizeOps() &&
if (VT == MVT::v32i8 && !Subtarget.hasAVX2())		if (VT == MVT::v32i8 && !Subtarget.hasAVX2())
return SDValue();		return SDValue();

assert(BitWidth >= 8 && BitWidth <= 64 && "Invalid mask size");		assert(BitWidth >= 8 && BitWidth <= 64 && "Invalid mask size");
APInt DemandedMask(APInt::getSignMask(BitWidth));		APInt DemandedMask(APInt::getSignMask(BitWidth));
APInt KnownZero, KnownOne;		APInt KnownZero, KnownOne;
TargetLowering::TargetLoweringOpt TLO(DAG, DCI.isBeforeLegalize(),		TargetLowering::TargetLoweringOpt TLO(DAG, DCI.isBeforeLegalize(),
DCI.isBeforeLegalizeOps());		DCI.isBeforeLegalizeOps());
if (TLO.ShrinkDemandedConstant(Cond, DemandedMask) ||		if (TLI.ShrinkDemandedConstant(Cond, DemandedMask, TLO) ||
TLI.SimplifyDemandedBits(Cond, DemandedMask, KnownZero, KnownOne,		TLI.SimplifyDemandedBits(Cond, DemandedMask, KnownZero, KnownOne,
TLO)) {		TLO)) {
// If we changed the computation somewhere in the DAG, this change will		// If we changed the computation somewhere in the DAG, this change will
// affect all users of Cond. Make sure it is fine and update all the nodes		// affect all users of Cond. Make sure it is fine and update all the nodes
// so that we do not use the generic VSELECT anymore. Otherwise, we may		// so that we do not use the generic VSELECT anymore. Otherwise, we may
// perform wrong optimizations as we messed with the actual expectation		// perform wrong optimizations as we messed with the actual expectation
// for the vector boolean values.		// for the vector boolean values.
if (Cond != TLO.Old) {		if (Cond != TLO.Old) {
▲ Show 20 Lines • Show All 3,553 Lines • ▼ Show 20 Lines	static SDValue combineBT(SDNode *N, SelectionDAG &DAG,
SDValue Op1 = N->getOperand(1);		SDValue Op1 = N->getOperand(1);
if (Op1.hasOneUse()) {		if (Op1.hasOneUse()) {
unsigned BitWidth = Op1.getValueSizeInBits();		unsigned BitWidth = Op1.getValueSizeInBits();
APInt DemandedMask = APInt::getLowBitsSet(BitWidth, Log2_32(BitWidth));		APInt DemandedMask = APInt::getLowBitsSet(BitWidth, Log2_32(BitWidth));
APInt KnownZero, KnownOne;		APInt KnownZero, KnownOne;
TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),		TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),
!DCI.isBeforeLegalizeOps());		!DCI.isBeforeLegalizeOps());
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (TLO.ShrinkDemandedConstant(Op1, DemandedMask) ||		if (TLI.ShrinkDemandedConstant(Op1, DemandedMask, TLO) ||
TLI.SimplifyDemandedBits(Op1, DemandedMask, KnownZero, KnownOne, TLO))		TLI.SimplifyDemandedBits(Op1, DemandedMask, KnownZero, KnownOne, TLO))
DCI.CommitTargetLoweringOpt(TLO);		DCI.CommitTargetLoweringOpt(TLO);
}		}
return SDValue();		return SDValue();
}		}

static SDValue combineSignExtendInReg(SDNode *N, SelectionDAG &DAG,		static SDValue combineSignExtendInReg(SDNode *N, SelectionDAG &DAG,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
▲ Show 20 Lines • Show All 2,298 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/XCore/XCoreISelLowering.cpp

Show First 20 Lines • Show All 1,599 Lines • ▼ Show 20 Lines	case Intrinsic::xcore_chkct: {
// These instructions ignore the high bits.		// These instructions ignore the high bits.
if (OutVal.hasOneUse()) {		if (OutVal.hasOneUse()) {
unsigned BitWidth = OutVal.getValueSizeInBits();		unsigned BitWidth = OutVal.getValueSizeInBits();
APInt DemandedMask = APInt::getLowBitsSet(BitWidth, 8);		APInt DemandedMask = APInt::getLowBitsSet(BitWidth, 8);
APInt KnownZero, KnownOne;		APInt KnownZero, KnownOne;
TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),		TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),
!DCI.isBeforeLegalizeOps());		!DCI.isBeforeLegalizeOps());
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (TLO.ShrinkDemandedConstant(OutVal, DemandedMask) ||		if (TLI.ShrinkDemandedConstant(OutVal, DemandedMask, TLO) ||
TLI.SimplifyDemandedBits(OutVal, DemandedMask, KnownZero, KnownOne,		TLI.SimplifyDemandedBits(OutVal, DemandedMask, KnownZero, KnownOne,
TLO))		TLO))
DCI.CommitTargetLoweringOpt(TLO);		DCI.CommitTargetLoweringOpt(TLO);
}		}
break;		break;
}		}
case Intrinsic::xcore_setpt: {		case Intrinsic::xcore_setpt: {
SDValue Time = N->getOperand(3);		SDValue Time = N->getOperand(3);
// This instruction ignores the high bits.		// This instruction ignores the high bits.
if (Time.hasOneUse()) {		if (Time.hasOneUse()) {
unsigned BitWidth = Time.getValueSizeInBits();		unsigned BitWidth = Time.getValueSizeInBits();
APInt DemandedMask = APInt::getLowBitsSet(BitWidth, 16);		APInt DemandedMask = APInt::getLowBitsSet(BitWidth, 16);
APInt KnownZero, KnownOne;		APInt KnownZero, KnownOne;
TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),		TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),
!DCI.isBeforeLegalizeOps());		!DCI.isBeforeLegalizeOps());
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (TLO.ShrinkDemandedConstant(Time, DemandedMask) ||		if (TLI.ShrinkDemandedConstant(Time, DemandedMask, TLO) ||
TLI.SimplifyDemandedBits(Time, DemandedMask, KnownZero, KnownOne,		TLI.SimplifyDemandedBits(Time, DemandedMask, KnownZero, KnownOne,
TLO))		TLO))
DCI.CommitTargetLoweringOpt(TLO);		DCI.CommitTargetLoweringOpt(TLO);
}		}
break;		break;
}		}
}		}
break;		break;
▲ Show 20 Lines • Show All 316 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/AArch64/optimize-imm.ll

				; RUN: llc -o - %s -mtriple=aarch64-- | FileCheck %s

				; CHECK-LABEL: and1:
				; CHECK: and {{w[0-9]+}}, w0, #0xfffffffd

				define void @and1(i32 %a, i8* nocapture %p) {
				entry:
				%and = and i32 %a, 253
				%conv = trunc i32 %and to i8
				store i8 %conv, i8* %p, align 1
				ret void
				}

				; (a & 0x3dfd) | 0xffffc000
				;
				; CHECK-LABEL: and2:
				; CHECK: and {{w[0-9]+}}, w0, #0xfdfdfdfd

				define i32 @and2(i32 %a) {
				entry:
				%and = and i32 %a, 15869
				%or = or i32 %and, -16384
				ret i32 %or
				}

				; (a & 0x19) | 0xffffffc0
				;
				; CHECK-LABEL: and3:
				; CHECK: and {{w[0-9]+}}, w0, #0x99999999

				define i32 @and3(i32 %a) {
				entry:
				%and = and i32 %a, 25
				%or = or i32 %and, -64
				ret i32 %or
				}

				; (a & 0xc0600) | 0xfff1f1ff
				;
				; CHECK-LABEL: and4:
				; CHECK: and {{w[0-9]+}}, w0, #0xfffc07ff

				define i32 @and4(i32 %a) {
				entry:
				%and = and i32 %a, 787968
				%or = or i32 %and, -921089
				ret i32 %or
				}

				; Make sure we don't shrink or optimize an XOR's immediate operand if the
				; immediate is -1. Instruction selection turns (and ((xor $mask, -1), $v0)) into
				; a BIC.

				; CHECK-LABEL: xor1:
				; CHECK: orr [[R0:w[0-9]+]], wzr, #0x38
				; CHECK: bic {{w[0-9]+}}, [[R0]], w0, lsl #3

				define i32 @xor1(i32 %a) {
				entry:
				%shl = shl i32 %a, 3
				%xor = and i32 %shl, 56
				%and = xor i32 %xor, 56
				ret i32 %and
				}
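
For readers who want to try the bit-filling search from optimizeLogicalImm outside of LLVM, here is a minimal standalone sketch. It is not part of the patch. The names fillNonDemandedBits and isShiftedMask64 are hypothetical stand-ins, the AArch64_AM::isLogicalImmediate early exit and the final encoding step are left out, and the carry test uses a 64-bit constant (1ULL) so that the shift stays well defined for 64-bit elements.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for llvm::isShiftedMask_64: true if V is one contiguous run of ones.
static bool isShiftedMask64(uint64_t V) {
  if (V == 0)
    return false;
  uint64_t Filled = (V - 1) | V;       // fill the zeros below the run
  return ((Filled + 1) & Filled) == 0; // the result must look like 2^n - 1
}

// Hypothetical helper mirroring the search loop of optimizeLogicalImm.
// Size is the register width in bits (a power of two between 2 and 64).
// On success, NewImm agrees with Imm on every demanded bit and is a
// replicated element whose ones (or zeros) form a single run.
static bool fillNonDemandedBits(unsigned Size, uint64_t Imm, uint64_t Demanded,
                                uint64_t &NewImm) {
  const uint64_t OldImm = Imm;
  const uint64_t OrigDemanded = Demanded;
  uint64_t Mask = ~0ULL >> (64 - Size);
  unsigned EltSize = Size;

  Imm &= Demanded; // clear bits that are not demanded

  while (true) {
    // Copy each demanded bit upwards into the non-demanded bits above it.
    uint64_t NonDemanded = ~Demanded;
    uint64_t InvertedImm = ~Imm & Demanded;
    uint64_t RotatedImm =
        ((InvertedImm << 1) | ((InvertedImm >> (EltSize - 1)) & 1)) &
        NonDemanded;
    uint64_t Sum = RotatedImm + NonDemanded;
    bool Carry = (NonDemanded & ~Sum & (1ULL << (EltSize - 1))) != 0;
    uint64_t Ones = (Sum + Carry) & NonDemanded;
    NewImm = (Imm | Ones) & Mask;

    // Stop once the element or its complement is a single run of ones.
    if (isShiftedMask64(NewImm) || isShiftedMask64(~(NewImm | ~Mask)))
      break;

    if (EltSize == 2) // cannot halve the element any further
      return false;

    EltSize /= 2;
    Mask >>= EltSize;
    uint64_t Hi = Imm >> EltSize, DemandedHi = Demanded >> EltSize;

    // The two halves must agree wherever both are demanded.
    if (((Imm ^ Hi) & (Demanded & DemandedHi) & Mask) != 0)
      return false;

    Imm |= Hi;
    Demanded |= DemandedHi;
  }

  // Replicate the element across the register width.
  while (EltSize < Size) {
    NewImm |= NewImm << EltSize;
    EltSize *= 2;
  }

  (void)OldImm;
  (void)OrigDemanded;
  assert(((OldImm ^ NewImm) & OrigDemanded) == 0 &&
         "demanded bits should never be altered");
  return true;
}
```

Assuming the demanded mask of each test is the complement of its `or` constant (and 0xff for the truncating store in and1), the sketch reproduces the immediates the CHECK lines expect: fillNonDemandedBits(32, 253, 0xff, Out) leaves 0xfffffffd in Out, and the and2, and3 and and4 inputs give 0xfdfdfdfd, 0x99999999 and 0xfffc07ff. The 8-bit example from the source comment (0bx10xx0x1, i.e. Imm 0x41 with demanded mask 0x65) gives 0xc3.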