This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Enable division-by-constant optimization for wide types
Accepted · Public

Authored by nhaehnle on Sep 22 2016, 5:07 AM.


Details

Reviewers

spatel
• tstellarAMD
venkatra
bkramer
arsenm
ast
javed.absar

Summary

This relies on previous support for expanding MULH[US] / [US]MUL_LOHI. Instead
of doing division-by-constant only when those instructions are legal, targets
should now use isIntDivCheap to signal that they do not want this expansion.

This change allows 64-bit division-by-constant to use the more efficient
multiply and shift lowering on AMDGPU.

This also affects a lowering on SPARC in a way that may or may not be more
efficient, see the change in the corresponding test case for the effect. I'd
appreciate some feedback on that.

The vector case is not enabled yet even though it should be correct and will
likely allow better overall code generation eventually. Enabling it gives some
regressions in X86 tests, mostly due to what looks like insufficient
peep-holing when vNi64 multiplies are scalarized.
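For readers unfamiliar with the multiply-and-shift lowering mentioned in the summary, here is a minimal sketch for the 32-bit case. The constant 0x24924925 is the one checked in the udiv_i32_const test of this diff; the fixup sequence around it is the textbook form and is an assumption about what the expansion emits, not a copy of the generated code. The names mulhu32 and udiv7 are made up for illustration.

```cpp
#include <cstdint>

// High 32 bits of the 64-bit product a * b, i.e. what MULHU computes for i32.
constexpr uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Unsigned division by 7 without a divide instruction.
// The exact multiplier is ceil(2^35 / 7) = 2^32 + 0x24924925, which needs
// 33 bits. Only the low 32 bits are used in the multiply; the missing
// 2^32 * x term is added back by the ((x - q) >> 1) + q fixup below.
constexpr uint32_t udiv7(uint32_t x) {
  const uint32_t q = mulhu32(x, 0x24924925u);
  const uint32_t t = ((x - q) >> 1) + q; // (x + q) / 2 without overflow
  return t >> 2;                         // total shift is 3 = 35 - 32
}

static_assert(udiv7(7u) == 1u, "7 / 7");
static_assert(udiv7(0xFFFFFFFFu) == 0xFFFFFFFFu / 7u, "largest input");
```

The fixup is only needed because the multiplier for 7 does not fit in 32 bits; divisors whose magic constant fits need just the MULHU and a shift.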

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 72153. Sep 22 2016, 5:07 AM

nhaehnle retitled this revision from to [SelectionDAG] Expand MULHU and enable division-by-constant for wide types.

nhaehnle updated this object.

nhaehnle added reviewers: spatel, bkramer, venkatra, arsenm, • tstellarAMD.

nhaehnle added a subscriber: llvm-commits.

Herald added subscribers: nhaehnle, wdng, jyknight. Sep 22 2016, 5:07 AM

The reason for excluding the vector case is that it hits another problem with MULHU in some X86 vector div tests, and affects additional X86 test cases. I'd prefer to keep things simple for this change.

efriedma added a subscriber: efriedma. Sep 22 2016, 10:47 AM

efriedma added inline comments.

test/CodeGen/SPARC/rem.ll
62	This is generating 8 multiply instructions; something is going wrong in your algorithm. (It should only take four multiply instructions to perform a double-width multiply.)
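To make the "four multiplies" point concrete, here is a small sketch of a 64 x 64 -> 128-bit unsigned multiply built from four 32 x 32 -> 64-bit products. It models the arithmetic only; the struct U128 and the function umulLoHi64 are hypothetical names, and this is not the DAG code from the patch.

```cpp
#include <cstdint>

// Low and high 64-bit halves of a 128-bit product.
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Double-width unsigned multiply using exactly four half-width multiplies.
constexpr U128 umulLoHi64(uint64_t a, uint64_t b) {
  const uint64_t al = a & 0xFFFFFFFFu, ah = a >> 32;
  const uint64_t bl = b & 0xFFFFFFFFu, bh = b >> 32;

  const uint64_t ll = al * bl; // contributes to bits [0, 64)
  const uint64_t lh = al * bh; // contributes to bits [32, 96)
  const uint64_t hl = ah * bl; // contributes to bits [32, 96)
  const uint64_t hh = ah * bh; // contributes to bits [64, 128)

  // Sum everything that lands in bits [32, 64). Each term is below 2^32,
  // so the sum fits in 64 bits and its upper part is the carry into hi.
  const uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFu) + (hl & 0xFFFFFFFFu);

  const uint64_t lo = (ll & 0xFFFFFFFFu) | (mid << 32);
  const uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
  return {lo, hi};
}
```

A MULHU-only expansion needs the same four products, because the carries out of the low half still have to be propagated into the high half.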

I looked a bit more closely, and the issue with the Sparc test is that the code uses isOperationLegalOrCustom(ISD::UMUL_LOHI, MVT::i32) to check - this is taken unchanged from expandMUL - but UMUL_LOHI is Expand. When I force my code in expandUMUL_LOHI to use the smaller size UMUL_LOHI, I get only 4 multiplies.

I don't see a clear picture of how multiplication legalization is supposed to work. Currently, MULHU can become UMUL_LOHI in LegalizeDAG::ExpandNode, and UMUL_LOHI can convert to a MUL in a wider type in the DAGCombiner, or to a MUL + MULHU, but only if one of the resulting ops can be combined further. Each of these steps only happens when the resulting operations are LegalOrCustom.

Furthermore, UMUL_LOHI is marked as Expand in the targets that I've looked at, but there isn't actually any code for it in LegalizeDAG::ExpandNode. It all seems a bit messy :(

In part I get the impression that the LegalizeAction just doesn't contain enough information. If a target sets UMUL_LOHI to Expand, should that be a sequence of multiplies of the half-sized integer type, or should it be a single multiply in a larger type? Perhaps Promote should be used to indicate the second option?

In D24822#550503, @nhaehnle wrote:

In part I get the impression that the LegalizeAction just doesn't contain enough information. If a target sets UMUL_LOHI to Expand, should that be a sequence of multiplies of the half-sized integer type, or should it be a single multiply in a larger type? Perhaps Promote should be used to indicate the second option?

It's even worse because at least AMDGPU would want to legalize UMUL_LOHI on MVT::i32 to MUL + MULHU. So there are at least three plausible ways of handling a non-legal UMUL_LOHI. Perhaps this should be made explicit with an enum that the TargetLowering can choose from.

Using Promote to indicate that a larger multiply should be used seems reasonable.

Not sure it really makes sense to say that there are three ways to perform the operation; UMUL_LOHI and MULHU are essentially the same operation, in the same way that DIV and DIVREM are the same operation.
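As a reference point for the discussion above, the three candidate lowerings of a non-legal UMUL_LOHI on i32 can be modelled in plain C++ as follows. This only shows that they compute the same value; the function names are invented for this sketch and the choice between them is exactly the open question in the thread.

```cpp
#include <cstdint>

struct LoHi32 {
  uint32_t lo;
  uint32_t hi;
};

// (1) One multiply in the next wider type, then split the result.
constexpr LoHi32 viaWiderMul(uint32_t a, uint32_t b) {
  const uint64_t p = static_cast<uint64_t>(a) * b;
  return {static_cast<uint32_t>(p), static_cast<uint32_t>(p >> 32)};
}

// (2) A separate MUL and MULHU. The high half is modelled with a 64-bit
// product here; a target choosing this form would have a native MULHU.
constexpr uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}
constexpr LoHi32 viaMulAndMulhu(uint32_t a, uint32_t b) {
  return {a * b, mulhu32(a, b)};
}

// (3) A sequence of multiplies in the half-sized type (16-bit operands,
// 32-bit products), with carries propagated by hand.
constexpr LoHi32 viaHalfWidthMuls(uint32_t a, uint32_t b) {
  const uint32_t al = a & 0xFFFFu, ah = a >> 16;
  const uint32_t bl = b & 0xFFFFu, bh = b >> 16;
  const uint32_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;
  const uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);
  return {(ll & 0xFFFFu) | (mid << 16),
          hh + (lh >> 16) + (hl >> 16) + (mid >> 16)};
}
```

Forms (1) and (2) differ only in which single operation the target has to support, which is the sense in which UMUL_LOHI and MULHU are "essentially the same operation".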

Move the bulk of the multiplication changes to D24956.

Herald added a subscriber: arsenm. Sep 27 2016, 3:38 AM

+ BPF test changes because the sdiv lowering fails later now.

ast added inline comments. Sep 27 2016, 8:04 AM

test/CodeGen/BPF/sdiv_error.ll
3 ↗	(On Diff #72621)	'cannot select' is unreadable to a C programmer compared to 'Unsupported signed div'. Usability of BPF is the hardest problem we're facing, so we don't want to lose those error messages. Hence, similar to the SDIV error, can you please add the same error for SMUL_LOHI in BPFDAGToDAGISel::Select()? Please also keep the same hint, "Please convert to unsigned div/mod". Thanks

nhaehnle added inline comments. Sep 27 2016, 12:38 PM

test/CodeGen/BPF/sdiv_error.ll
3 ↗	(On Diff #72621)	Hmm, relying on backend error messages for usability is maybe not the best idea... Not sure how to change this. Perhaps adding an isSigned parameter to isIntDivCheap?

ast requested changes to this revision. Sep 27 2016, 1:40 PM

ast edited edge metadata.

ast added inline comments.

test/CodeGen/BPF/sdiv_error.ll
3 ↗	(On Diff #72621)	It's not a matter of optimization. There is no sdiv instruction. So please keep the backend error.

This revision now requires changes to proceed. Sep 27 2016, 1:40 PM

I still think it's pretty misguided to rely on errors from the backend,
given that this would be trivial to catch in the frontend where you have
more context for useful error messages anyway, but whatever.

I'm going to take the isIntDivCheap route because yes, it does have to do
with optimizations: you're relying on no optimizations being applied to
sdiv.
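A rough model of the isIntDivCheap route, for readers following along. Everything here is a stand-in: MockTargetLowering, MockBPFLowering and lowerDivByConstant are hypothetical, the isSigned parameter is the one floated earlier in this thread rather than the hook's real parameter list, and the returned strings are just labels for the chosen lowering.

```cpp
#include <string>

// Hypothetical stand-in for a target lowering class; not an LLVM type.
struct MockTargetLowering {
  virtual ~MockTargetLowering() = default;
  // Assumed signature for illustration only. Returning false means
  // "division is expensive, please expand division by a constant".
  virtual bool isIntDivCheap(bool /*IsSigned*/) const { return false; }
};

// A BPF-like target that reports signed division as "cheap" so that the
// generic expansion leaves sdiv alone and the backend can still emit its
// own "unsupported signed div" diagnostic later.
struct MockBPFLowering : MockTargetLowering {
  bool isIntDivCheap(bool IsSigned) const override { return IsSigned; }
};

// Gating as described in the summary: expand to multiply and shift unless
// the target says division is cheap.
inline std::string lowerDivByConstant(const MockTargetLowering &TLI,
                                      bool IsSigned) {
  if (TLI.isIntDivCheap(IsSigned))
    return IsSigned ? "sdiv" : "udiv";
  return IsSigned ? "mulhs+shift" : "mulhu+shift";
}
```

With these mocks, lowerDivByConstant(MockBPFLowering{}, true) keeps the sdiv, while the default target expands both signed and unsigned division.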

Herald added subscribers: dschuff, jfb. Sep 28 2016, 1:13 AM

n.bozhenov added a subscriber: n.bozhenov. Sep 28 2016, 7:38 AM

lgtm
There are different front-ends and we cannot control them all, so the backend has to have end-user-understandable errors, though it's not pretty.

This revision is now accepted and ready to land. Sep 28 2016, 9:05 AM

nhaehnle mentioned this in D25289: AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes. Oct 5 2016, 11:38 AM

nhaehnle added a parent revision: D25289: AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes. Oct 5 2016, 11:39 AM

Diffusion mentioned this in rL284224: AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes. Oct 14 2016, 3:39 AM

Rebase on latest version of D24956.

Herald edited edge metadata. Nov 24 2016, 1:34 AM

Diffusion mentioned this in rL289050: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI. Dec 8 2016, 6:18 AM

What happened to this patch?

Herald added a reviewer: javed.absar. Jun 26 2018, 5:12 PM

Herald added subscribers: fedor.sergeev, aheejin, jgravelle-google and 2 others.

@nhaehnle If you could rebase this and enable vector idiv to show the x86 regressions I may be able to help

RKSimon mentioned this in D87976: Support the division-by-constant strength reduction for more integer types. Sep 27 2020, 10:48 AM

nagisa added a subscriber: nagisa. Jun 5 2021, 5:10 AM

Herald added subscribers: ecnelises, kerbowa, pengfei and 2 others. Jun 5 2021, 5:10 AM

Revision Contents

Path                                          Size
include/llvm/Target/TargetLowering.h          17 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp      33 lines
lib/CodeGen/SelectionDAG/TargetLowering.cpp   203 lines
test/CodeGen/AMDGPU/udiv.ll                   21 lines
test/CodeGen/SPARC/rem.ll                     42 lines

Diff 72153

include/llvm/Target/TargetLowering.h

virtual SDValue getRecipEstimate(SDValue Operand, DAGCombinerInfo &DCI,
unsigned &RefinementSteps) const {		unsigned &RefinementSteps) const {
return SDValue();		return SDValue();
}		}

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Legalization utility functions		// Legalization utility functions
//		//

		/// Expand a MUL or UMUL_LOHI of n-bit values into two or four nodes,
		/// respectively, each computing an n/2-bit part of the result.
		/// \param Result A vector that will be filled with the parts of the result
		/// in little-endian order.
		/// \param HalfVT The value type to use for the result nodes.
		/// \param LL Low bits of the LHS of the MUL. You can use this parameter
		/// if you want to control how low bits are extracted from the LHS.
		/// \param LH High bits of the LHS of the MUL. See LL for meaning.
		/// \param RL Low bits of the RHS of the MUL. See LL for meaning
		/// \param RH High bits of the RHS of the MUL. See LL for meaning.
		/// \returns true if the node has been expanded, false if it has not
		bool expandUMUL_LOHI(unsigned Opcode, EVT VT, SDLoc dl, SDValue LHS,
		SDValue RHS, SmallVectorImpl<SDValue> &Result,
		EVT HalfVT, SelectionDAG &DAG, SDValue LL = SDValue(),
		SDValue LH = SDValue(), SDValue RL = SDValue(),
		SDValue RH = SDValue()) const;

/// Expand a MUL into two nodes. One that computes the high bits of		/// Expand a MUL into two nodes. One that computes the high bits of
/// the result and one that computes the low bits.		/// the result and one that computes the low bits.
/// \param HiLoVT The value type to use for the Lo and Hi nodes.		/// \param HiLoVT The value type to use for the Lo and Hi nodes.
/// \param LL Low bits of the LHS of the MUL. You can use this parameter		/// \param LL Low bits of the LHS of the MUL. You can use this parameter
/// if you want to control how low bits are extracted from the LHS.		/// if you want to control how low bits are extracted from the LHS.
/// \param LH High bits of the LHS of the MUL. See LL for meaning.		/// \param LH High bits of the LHS of the MUL. See LL for meaning.
/// \param RL Low bits of the RHS of the MUL. See LL for meaning		/// \param RL Low bits of the RHS of the MUL. See LL for meaning
/// \param RH High bits of the RHS of the MUL. See LL for meaning.		/// \param RH High bits of the RHS of the MUL. See LL for meaning.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

case ISD::SDIV: {
break;		break;
}		}
case ISD::MULHU:		case ISD::MULHU:
case ISD::MULHS: {		case ISD::MULHS: {
unsigned ExpandOpcode = Node->getOpcode() == ISD::MULHU ? ISD::UMUL_LOHI :		unsigned ExpandOpcode = Node->getOpcode() == ISD::MULHU ? ISD::UMUL_LOHI :
ISD::SMUL_LOHI;		ISD::SMUL_LOHI;
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
SDVTList VTs = DAG.getVTList(VT, VT);		SDVTList VTs = DAG.getVTList(VT, VT);
assert(TLI.isOperationLegalOrCustom(ExpandOpcode, VT) &&		bool HasExpandOpcode = TLI.isOperationLegalOrCustom(ExpandOpcode, VT);
		assert((HasExpandOpcode || ExpandOpcode == ISD::UMUL_LOHI) &&
"If this wasn't legal, it shouldn't have been created!");		"If this wasn't legal, it shouldn't have been created!");

		if (HasExpandOpcode) {
Tmp1 = DAG.getNode(ExpandOpcode, dl, VTs, Node->getOperand(0),		Tmp1 = DAG.getNode(ExpandOpcode, dl, VTs, Node->getOperand(0),
Node->getOperand(1));		Node->getOperand(1));
Results.push_back(Tmp1.getValue(1));		Results.push_back(Tmp1.getValue(1));
break;		break;
}		}

		if (TLI.isOperationLegalOrCustom(ISD::ZERO_EXTEND, VT) &&
		TLI.isOperationLegalOrCustom(ISD::ANY_EXTEND, VT) &&
		TLI.isOperationLegalOrCustom(ISD::SHL, VT) &&
		TLI.isOperationLegalOrCustom(ISD::OR, VT)) {
		SmallVector<SDValue, 4> Halves;
		EVT HalfType = VT.getHalfSizedIntegerVT(*DAG.getContext());
		if (TLI.expandUMUL_LOHI(ISD::UMUL_LOHI, VT, Node, Node->getOperand(0),
		Node->getOperand(1), Halves, HalfType, DAG)) {
		SDValue Lo = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Halves[2]);
		SDValue Hi = DAG.getNode(ISD::ANY_EXTEND, dl, VT, Halves[3]);
		SDValue Shift = DAG.getConstant(
		HalfType.getSizeInBits(), dl,
		TLI.getShiftAmountTy(HalfType, DAG.getDataLayout()));
		Hi = DAG.getNode(ISD::SHL, dl, VT, Hi, Shift);
		Results.push_back(DAG.getNode(ISD::OR, dl, VT, Lo, Hi));
		break;
		}
		}

		break;
		}
case ISD::MUL: {		case ISD::MUL: {
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
SDVTList VTs = DAG.getVTList(VT, VT);		SDVTList VTs = DAG.getVTList(VT, VT);
// See if multiply or divide can be lowered using two-result operations.		// See if multiply or divide can be lowered using two-result operations.
// We just need the low half of the multiply; try both the signed		// We just need the low half of the multiply; try both the signed
// and unsigned forms. If the target supports both SMUL_LOHI and		// and unsigned forms. If the target supports both SMUL_LOHI and
// UMUL_LOHI, form a preference by checking which forms of plain		// UMUL_LOHI, form a preference by checking which forms of plain
// MULH it supports.		// MULH it supports.

lib/CodeGen/SelectionDAG/TargetLowering.cpp

if (magics.a != 0 && !Divisor[0]) {
Created->push_back(Q.getNode());		Created->push_back(Q.getNode());

// Get magic number for the shifted divisor.		// Get magic number for the shifted divisor.
magics = Divisor.lshr(Shift).magicu(Shift);		magics = Divisor.lshr(Shift).magicu(Shift);
assert(magics.a == 0 && "Should use cheap fixup now");		assert(magics.a == 0 && "Should use cheap fixup now");
}		}

// Multiply the numerator (operand 0) by the magic value		// Multiply the numerator (operand 0) by the magic value
// FIXME: We should support doing a MUL in a wider type		// FIXME: Support expansion of MULHU for vector types
if (IsAfterLegalization ? isOperationLegal(ISD::MULHU, VT) :		if (IsAfterLegalization
isOperationLegalOrCustom(ISD::MULHU, VT))		? isOperationLegal(ISD::MULHU, VT)
		: (isOperationLegalOrCustom(ISD::MULHU, VT) ||
		(VT.isScalarInteger() &&
		isOperationLegalOrCustom(
		ISD::MULHU, VT.getHalfSizedIntegerVT(*DAG.getContext())))))
Q = DAG.getNode(ISD::MULHU, dl, VT, Q, DAG.getConstant(magics.m, dl, VT));		Q = DAG.getNode(ISD::MULHU, dl, VT, Q, DAG.getConstant(magics.m, dl, VT));
else if (IsAfterLegalization ? isOperationLegal(ISD::UMUL_LOHI, VT) :		else if (IsAfterLegalization ? isOperationLegal(ISD::UMUL_LOHI, VT) :
isOperationLegalOrCustom(ISD::UMUL_LOHI, VT))		isOperationLegalOrCustom(ISD::UMUL_LOHI, VT))
Q = SDValue(DAG.getNode(ISD::UMUL_LOHI, dl, DAG.getVTList(VT, VT), Q,		Q = SDValue(DAG.getNode(ISD::UMUL_LOHI, dl, DAG.getVTList(VT, VT), Q,
DAG.getConstant(magics.m, dl, VT)).getNode(), 1);		DAG.getConstant(magics.m, dl, VT)).getNode(), 1);
else		else
return SDValue(); // No mulhu or equvialent		return SDValue(); // No mulhu or equvialent

[31 lines not shown]
verifyReturnAddressArgumentIsConstant(SDValue Op, SelectionDAG &DAG) const {

return false;		return false;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Legalization Utilities		// Legalization Utilities
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool TargetLowering::expandMUL(SDNode *N, SDValue &Lo, SDValue &Hi, EVT HiLoVT,		bool TargetLowering::expandUMUL_LOHI(unsigned Opcode, EVT VT, SDLoc dl,
SelectionDAG &DAG, SDValue LL, SDValue LH,		SDValue LHS, SDValue RHS,
SDValue RL, SDValue RH) const {		SmallVectorImpl<SDValue> &Result,
EVT VT = N->getValueType(0);		EVT HalfVT, SelectionDAG &DAG, SDValue LL,
SDLoc dl(N);		SDValue LH, SDValue RL, SDValue RH) const {
		assert(Opcode == ISD::MUL || Opcode == ISD::UMUL_LOHI);
bool HasMULHS = isOperationLegalOrCustom(ISD::MULHS, HiLoVT);
bool HasMULHU = isOperationLegalOrCustom(ISD::MULHU, HiLoVT);		bool HasMULHS = isOperationLegalOrCustom(ISD::MULHS, HalfVT);
bool HasSMUL_LOHI = isOperationLegalOrCustom(ISD::SMUL_LOHI, HiLoVT);		bool HasMULHU = isOperationLegalOrCustom(ISD::MULHU, HalfVT);
bool HasUMUL_LOHI = isOperationLegalOrCustom(ISD::UMUL_LOHI, HiLoVT);		bool HasSMUL_LOHI = isOperationLegalOrCustom(ISD::SMUL_LOHI, HalfVT);
		bool HasUMUL_LOHI = isOperationLegalOrCustom(ISD::UMUL_LOHI, HalfVT);
if (HasMULHU || HasMULHS || HasUMUL_LOHI || HasSMUL_LOHI) {		if (HasMULHU || HasMULHS || HasUMUL_LOHI || HasSMUL_LOHI) {
unsigned OuterBitSize = VT.getSizeInBits();		unsigned OuterBitSize = VT.getSizeInBits();
unsigned InnerBitSize = HiLoVT.getSizeInBits();		unsigned InnerBitSize = HalfVT.getSizeInBits();
unsigned LHSSB = DAG.ComputeNumSignBits(N->getOperand(0));		unsigned LHSSB = DAG.ComputeNumSignBits(LHS);
unsigned RHSSB = DAG.ComputeNumSignBits(N->getOperand(1));		unsigned RHSSB = DAG.ComputeNumSignBits(RHS);

// LL, LH, RL, and RH must be either all NULL or all set to a value.		// LL, LH, RL, and RH must be either all NULL or all set to a value.
assert((LL.getNode() && LH.getNode() && RL.getNode() && RH.getNode()) ||		assert((LL.getNode() && LH.getNode() && RL.getNode() && RH.getNode()) ||
(!LL.getNode() && !LH.getNode() && !RL.getNode() && !RH.getNode()));		(!LL.getNode() && !LH.getNode() && !RL.getNode() && !RH.getNode()));

if (!LL.getNode() && !RL.getNode() &&		if (!LL.getNode() && !RL.getNode() &&
isOperationLegalOrCustom(ISD::TRUNCATE, HiLoVT)) {		isOperationLegalOrCustom(ISD::TRUNCATE, HalfVT)) {
LL = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, N->getOperand(0));		LL = DAG.getNode(ISD::TRUNCATE, dl, HalfVT, LHS);
RL = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, N->getOperand(1));		RL = DAG.getNode(ISD::TRUNCATE, dl, HalfVT, RHS);
}		}

if (!LL.getNode())		if (!LL.getNode())
return false;		return false;

APInt HighMask = APInt::getHighBitsSet(OuterBitSize, InnerBitSize);		APInt HighMask = APInt::getHighBitsSet(OuterBitSize, InnerBitSize);
if (DAG.MaskedValueIsZero(N->getOperand(0), HighMask) &&		if (DAG.MaskedValueIsZero(LHS, HighMask) &&
DAG.MaskedValueIsZero(N->getOperand(1), HighMask)) {		DAG.MaskedValueIsZero(RHS, HighMask)) {
// The inputs are both zero-extended.		// The inputs are both zero-extended.
		bool Expanded = false;
if (HasUMUL_LOHI) {		if (HasUMUL_LOHI) {
// We can emit a umul_lohi.		// We can emit a umul_lohi.
Lo = DAG.getNode(ISD::UMUL_LOHI, dl, DAG.getVTList(HiLoVT, HiLoVT), LL,		SDValue Mul = DAG.getNode(ISD::UMUL_LOHI, dl,
RL);		DAG.getVTList(HalfVT, HalfVT), LL, RL);
Hi = SDValue(Lo.getNode(), 1);		Result.push_back(Mul);
return true;		Result.push_back(SDValue(Mul.getNode(), 1));
}		Expanded = true;
if (HasMULHU) {		} else if (HasMULHU) {
// We can emit a mulhu+mul.		// We can emit a mulhu+mul.
Lo = DAG.getNode(ISD::MUL, dl, HiLoVT, LL, RL);		Result.push_back(DAG.getNode(ISD::MUL, dl, HalfVT, LL, RL));
Hi = DAG.getNode(ISD::MULHU, dl, HiLoVT, LL, RL);		Result.push_back(DAG.getNode(ISD::MULHU, dl, HalfVT, LL, RL));
		Expanded = true;
		}
		if (Expanded) {
		if (Opcode != ISD::MUL) {
		SDValue Zero = DAG.getConstant(0, dl, HalfVT);
		Result.push_back(Zero);
		Result.push_back(Zero);
		}
return true;		return true;
}		}
}		}
if (LHSSB > InnerBitSize && RHSSB > InnerBitSize) {		if (LHSSB > InnerBitSize && RHSSB > InnerBitSize && Opcode == ISD::MUL) {
// The input values are both sign-extended.		// The input values are both sign-extended.
if (HasSMUL_LOHI) {		if (HasSMUL_LOHI) {
// We can emit a smul_lohi.		// We can emit a smul_lohi.
Lo = DAG.getNode(ISD::SMUL_LOHI, dl, DAG.getVTList(HiLoVT, HiLoVT), LL,		SDValue Mul = DAG.getNode(ISD::SMUL_LOHI, dl,
RL);		DAG.getVTList(HalfVT, HalfVT), LL, RL);
Hi = SDValue(Lo.getNode(), 1);		Result.push_back(Mul);
		Result.push_back(SDValue(Mul.getNode(), 1));
return true;		return true;
}		} else if (HasMULHS) {
if (HasMULHS) {
// We can emit a mulhs+mul.		// We can emit a mulhs+mul.
Lo = DAG.getNode(ISD::MUL, dl, HiLoVT, LL, RL);		Result.push_back(DAG.getNode(ISD::MUL, dl, HalfVT, LL, RL));
Hi = DAG.getNode(ISD::MULHS, dl, HiLoVT, LL, RL);		Result.push_back(DAG.getNode(ISD::MULHS, dl, HalfVT, LL, RL));
return true;		return true;
}		}
}		}

if (!LH.getNode() && !RH.getNode() &&		if (!LH.getNode() && !RH.getNode() &&
isOperationLegalOrCustom(ISD::SRL, VT) &&		isOperationLegalOrCustom(ISD::SRL, VT) &&
isOperationLegalOrCustom(ISD::TRUNCATE, HiLoVT)) {		isOperationLegalOrCustom(ISD::TRUNCATE, HalfVT)) {
auto &DL = DAG.getDataLayout();		auto &DL = DAG.getDataLayout();
unsigned ShiftAmt = VT.getSizeInBits() - HiLoVT.getSizeInBits();		unsigned ShiftAmt = VT.getSizeInBits() - HalfVT.getSizeInBits();
SDValue Shift = DAG.getConstant(ShiftAmt, dl, getShiftAmountTy(VT, DL));		SDValue Shift = DAG.getConstant(ShiftAmt, dl, getShiftAmountTy(VT, DL));
LH = DAG.getNode(ISD::SRL, dl, VT, N->getOperand(0), Shift);		LH = DAG.getNode(ISD::SRL, dl, VT, LHS, Shift);
LH = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, LH);		LH = DAG.getNode(ISD::TRUNCATE, dl, HalfVT, LH);
RH = DAG.getNode(ISD::SRL, dl, VT, N->getOperand(1), Shift);		RH = DAG.getNode(ISD::SRL, dl, VT, RHS, Shift);
RH = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, RH);		RH = DAG.getNode(ISD::TRUNCATE, dl, HalfVT, RH);
}		}

if (!LH.getNode())		if (!LH.getNode())
return false;		return false;

		if (HasUMUL_LOHI || HasMULHU) {
		SDValue Next;
if (HasUMUL_LOHI) {		if (HasUMUL_LOHI) {
// Lo,Hi = umul LHS, RHS.
SDValue UMulLOHI = DAG.getNode(ISD::UMUL_LOHI, dl,		SDValue UMulLOHI = DAG.getNode(ISD::UMUL_LOHI, dl,
DAG.getVTList(HiLoVT, HiLoVT), LL, RL);		DAG.getVTList(HalfVT, HalfVT), LL, RL);
		Result.push_back(UMulLOHI);
		Next = UMulLOHI.getValue(1);
		} else {
		Result.push_back(DAG.getNode(ISD::MUL, dl, HalfVT, LL, RL));
		Next = DAG.getNode(ISD::MULHU, dl, HalfVT, LL, RL);
		}

		if (Opcode == ISD::MUL) {
		RH = DAG.getNode(ISD::MUL, dl, HalfVT, LL, RH);
		LH = DAG.getNode(ISD::MUL, dl, HalfVT, LH, RL);
		Next = DAG.getNode(ISD::ADD, dl, HalfVT, Next, RH);
		Next = DAG.getNode(ISD::ADD, dl, HalfVT, Next, LH);
		Result.push_back(Next);
		return true;
		}

		SDValue Lo, Hi;
		if (HasUMUL_LOHI) {
		SDValue UMulLOHI = DAG.getNode(ISD::UMUL_LOHI, dl,
		DAG.getVTList(HalfVT, HalfVT), LL, RH);
Lo = UMulLOHI;		Lo = UMulLOHI;
Hi = UMulLOHI.getValue(1);		Hi = UMulLOHI.getValue(1);
RH = DAG.getNode(ISD::MUL, dl, HiLoVT, LL, RH);		} else {
LH = DAG.getNode(ISD::MUL, dl, HiLoVT, LH, RL);		Lo = DAG.getNode(ISD::MUL, dl, HalfVT, LL, RH);
Hi = DAG.getNode(ISD::ADD, dl, HiLoVT, Hi, RH);		Hi = DAG.getNode(ISD::MULHU, dl, HalfVT, LL, RH);
Hi = DAG.getNode(ISD::ADD, dl, HiLoVT, Hi, LH);
return true;
}		}
if (HasMULHU) {
Lo = DAG.getNode(ISD::MUL, dl, HiLoVT, LL, RL);		SDVTList VTList = DAG.getVTList(HalfVT, MVT::Glue);
Hi = DAG.getNode(ISD::MULHU, dl, HiLoVT, LL, RL);		SDValue SumLo, SumHi;
RH = DAG.getNode(ISD::MUL, dl, HiLoVT, LL, RH);		SumHi = Hi;
LH = DAG.getNode(ISD::MUL, dl, HiLoVT, LH, RL);		SumLo = DAG.getNode(ISD::ADDC, dl, VTList, Next, Lo);
Hi = DAG.getNode(ISD::ADD, dl, HiLoVT, Hi, RH);
Hi = DAG.getNode(ISD::ADD, dl, HiLoVT, Hi, LH);		if (HasUMUL_LOHI) {
		SDValue UMulLOHI = DAG.getNode(ISD::UMUL_LOHI, dl,
		DAG.getVTList(HalfVT, HalfVT), LH, RL);
		Lo = UMulLOHI;
		Hi = UMulLOHI.getValue(1);
		} else {
		Lo = DAG.getNode(ISD::MUL, dl, HalfVT, LH, RL);
		Hi = DAG.getNode(ISD::MULHU, dl, HalfVT, LH, RL);
		}

		SumHi = DAG.getNode(ISD::ADDE, dl, VTList, SumHi, Hi, SumLo.getValue(1));
		SumLo = DAG.getNode(ISD::ADDC, dl, VTList, SumLo, Lo);
		Result.push_back(SumLo);

		SDValue Carry = SumHi.getValue(1);

		if (HasUMUL_LOHI) {
		SDValue UMulLOHI = DAG.getNode(ISD::UMUL_LOHI, dl,
		DAG.getVTList(HalfVT, HalfVT), LH, RH);
		Lo = UMulLOHI;
		Hi = UMulLOHI.getValue(1);
		} else {
		Lo = DAG.getNode(ISD::MUL, dl, HalfVT, LH, RH);
		Hi = DAG.getNode(ISD::MULHU, dl, HalfVT, LH, RH);
		}

		SDValue Zero = DAG.getConstant(0, dl, HalfVT);
		SumLo = DAG.getNode(ISD::ADDE, dl, VTList, SumHi, Lo, SumLo.getValue(1));
		SumHi = DAG.getNode(ISD::ADDE, dl, VTList, Hi, Zero, Carry);
		SumHi =
		DAG.getNode(ISD::ADDE, dl, VTList, SumHi, Zero, SumLo.getValue(1));
		Result.push_back(SumLo);
		Result.push_back(SumHi);
return true;		return true;
}		}
}		}
return false;		return false;
}		}

		bool TargetLowering::expandMUL(SDNode *N, SDValue &Lo, SDValue &Hi, EVT HiLoVT,
		SelectionDAG &DAG, SDValue LL, SDValue LH,
		SDValue RL, SDValue RH) const {
		SmallVector<SDValue, 2> Result;
		bool Ok =
		expandUMUL_LOHI(N->getOpcode(), N->getValueType(0), N, N->getOperand(0),
		N->getOperand(1), Result, HiLoVT, DAG, LL, LH, RL, RH);
		if (Result.size() >= 2) {
		Lo = Result[0];
		Hi = Result[1];
		}
		return Ok;
		}

bool TargetLowering::expandFP_TO_SINT(SDNode *Node, SDValue &Result,		bool TargetLowering::expandFP_TO_SINT(SDNode *Node, SDValue &Result,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
EVT VT = Node->getOperand(0).getValueType();		EVT VT = Node->getOperand(0).getValueType();
EVT NVT = Node->getValueType(0);		EVT NVT = Node->getValueType(0);
SDLoc dl(SDValue(Node, 0));		SDLoc dl(SDValue(Node, 0));

// FIXME: Only f32 to i64 conversions are supported.		// FIXME: Only f32 to i64 conversions are supported.
if (VT != MVT::f32 || NVT != MVT::i64)		if (VT != MVT::f32 || NVT != MVT::i64)

test/CodeGen/AMDGPU/udiv.ll

define void @v_udiv_i24(i32 addrspace(1)* %out, i24 addrspace(1)* %in) {
%den_ptr = getelementptr i24, i24 addrspace(1)* %in, i24 1		%den_ptr = getelementptr i24, i24 addrspace(1)* %in, i24 1
%num = load i24, i24 addrspace(1) * %in		%num = load i24, i24 addrspace(1) * %in
%den = load i24, i24 addrspace(1) * %den_ptr		%den = load i24, i24 addrspace(1) * %den_ptr
%result = udiv i24 %num, %den		%result = udiv i24 %num, %den
%result.ext = zext i24 %result to i32		%result.ext = zext i24 %result to i32
store i32 %result.ext, i32 addrspace(1)* %out		store i32 %result.ext, i32 addrspace(1)* %out
ret void		ret void
}		}

		; FUNC-LABEL: {{^}}udiv_i32_const:
		; SI: v_mov_b32_e32 [[MAGIC:v[0-9]+]], 0x24924925
		; SI-NOT: v_rcp
		define void @udiv_i32_const(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
		%num = load i32, i32 addrspace(1)* %in
		%result = udiv i32 %num, 7
		store i32 %result, i32 addrspace(1)* %out
		ret void
		}

		; FUNC-LABEL: {{^}}udiv_i64_const:
		; SI-DAG: s_mov_b32 [[MAGIC_HI:s[0-9]+]], 0x24924924
		; SI-DAG: s_mov_b32 [[MAGIC_LO:s[0-9]+]], 0x92492493
		; SI-NOT: v_rcp
		define void @udiv_i64_const(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
		%num = load i64, i64 addrspace(1)* %in
		%result = udiv i64 %num, 7
		store i64 %result, i64 addrspace(1)* %out
		ret void
		}
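The two constants checked in udiv_i64_const above are the halves of the 64-bit magic number 0x2492492492492493. A sketch of the arithmetic they stand for, under the assumption that the expansion uses the usual multiply, subtract, shift and add sequence (the test itself only checks the constants and the absence of v_rcp); mulhu64 and udiv7_64 are names invented here.

```cpp
#include <cstdint>

// High 64 bits of a 64 x 64 -> 128-bit unsigned product, computed from
// four 32 x 32 -> 64-bit multiplies.
constexpr uint64_t mulhu64(uint64_t a, uint64_t b) {
  const uint64_t al = a & 0xFFFFFFFFu, ah = a >> 32;
  const uint64_t bl = b & 0xFFFFFFFFu, bh = b >> 32;
  const uint64_t lh = al * bh, hl = ah * bl;
  const uint64_t mid =
      ((al * bl) >> 32) + (lh & 0xFFFFFFFFu) + (hl & 0xFFFFFFFFu);
  return ah * bh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

// 64-bit unsigned division by 7. The exact multiplier is
// ceil(2^67 / 7) = 2^64 + 0x2492492492492493; as in the 32-bit case the
// implicit top bit is handled by the ((x - q) >> 1) + q step.
constexpr uint64_t udiv7_64(uint64_t x) {
  const uint64_t q = mulhu64(x, 0x2492492492492493ull);
  const uint64_t t = ((x - q) >> 1) + q;
  return t >> 2;
}

static_assert(udiv7_64(7) == 1, "7 / 7");
static_assert(udiv7_64(~0ull) == ~0ull / 7, "largest input");
```

On a target with only 32-bit multiplies, the single MULHU here is what gets expanded into half-width operations.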

test/CodeGen/SPARC/rem.ll


	define i64 @test2(i64 %X, i64 %Y) {			define i64 @test2(i64 %X, i64 %Y) {
	%tmp1 = urem i64 %X, %Y			%tmp1 = urem i64 %X, %Y
	ret i64 %tmp1			ret i64 %tmp1
	}			}

	; PR18150			; PR18150
	; CHECK-LABEL: test3			; CHECK-LABEL: test3
	; CHECK: sethi 2545, [[R0:%[gilo][0-7]]]			; CHECK: sethi 2545, %o1
	; CHECK: or [[R0]], 379, [[R1:%[gilo][0-7]]]			; CHECK-NEXT: or %o1, 379, %o1
	; CHECK: mulx %o0, [[R1]], [[R2:%[gilo][0-7]]]			; CHECK-NEXT: mulx %o0, %o1, %o0
	; CHECK: udivx [[R2]], 1021, [[R3:%[gilo][0-7]]]			; CHECK-NEXT: sethi 12324, %o1
	; CHECK: mulx [[R3]], 1021, [[R4:%[gilo][0-7]]]			; CHECK-NEXT: or %o1, 108, %o1
	; CHECK: sub [[R2]], [[R4]], %o0			; CHECK-NEXT: smul %o0, %o1, %o2
				; CHECK-NEXT: srl %o0, 0, %o3
				; CHECK-NEXT: sethi 1331003, %o4
				; CHECK-NEXT: or %o4, 435, %o4
				; CHECK-NEXT: mulx %o3, %o4, %o5
				; CHECK-NEXT: srlx %o5, 32, %o5
				; CHECK-NEXT: srlx %o0, 32, %g2
				; CHECK-NEXT: mulx %g2, %o4, %g3
				; CHECK-NEXT: srlx %g3, 32, %g3
				; CHECK-NEXT: mulx %o3, %o1, %o3
				; CHECK-NEXT: srlx %o3, 32, %o3
				; CHECK-NEXT: mulx %g2, %o1, %g4
				; CHECK-NEXT: srlx %g4, 32, %g4
				; CHECK-NEXT: addcc %o5, %o2, %o2
				; CHECK-NEXT: addxcc %o3, %g3, %o3
				; CHECK-NEXT: addxcc %g4, 0, %o5
				; CHECK-NEXT: smul %g2, %o4, %o4
				; CHECK-NEXT: smul %g2, %o1, %o1
				; CHECK-NEXT: addcc %o2, %o4, %o2
				; CHECK-NEXT: addxcc %o3, %o1, %o1
				; CHECK-NEXT: addxcc %o5, 0, %o2
				; CHECK-NEXT: srl %o1, 0, %o1
				; CHECK-NEXT: sllx %o2, 32, %o2
				; CHECK-NEXT: or %o1, %o2, %o1
				; CHECK-NEXT: sub %o0, %o1, %o2
				; CHECK-NEXT: srlx %o2, 1, %o2
				; CHECK-NEXT: add %o2, %o1, %o1
				; CHECK-NEXT: srlx %o1, 9, %o1
				; CHECK-NEXT: mulx %o1, 1021, %o1
				; CHECK-NEXT: retl
				; CHECK-NEXT: sub %o0, %o1, %o0
				efriedma: This is generating 8 multiply instructions; something is going wrong in your algorithm. (It should only take four multiply instructions to perform a double-width multiply.)

	define i64 @test3(i64 %b) {			define i64 @test3(i64 %b) {
	entry:			entry:
	%mul = mul i64 %b, 2606459			%mul = mul i64 %b, 2606459
	%rem = urem i64 %mul, 1021			%rem = urem i64 %mul, 1021
	ret i64 %rem			ret i64 %rem
	}			}