This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Enable division-by-constant optimization for wide types
Accepted · Public

Authored by nhaehnle on Sep 22 2016, 5:07 AM.

Details

Reviewers

spatel
• tstellarAMD
venkatra
bkramer
arsenm
ast
javed.absar

Summary

This relies on previous support for expanding MULH[US] / [US]MUL_LOHI. Instead
of doing division-by-constant only when those instructions are legal, targets
should now use isIntDivCheap to signal that they do not want this expansion.

This change allows 64-bit division-by-constant to use the more efficient
multiply and shift lowering on AMDGPU.

This also affects a lowering on SPARC in a way that may or may not be more
efficient, see the change in the corresponding test case for the effect. I'd
appreciate some feedback on that.

The vector case is not enabled yet even though it should be correct and will
likely allow better overall code generation eventually. Enabling it gives some
regressions in X86 tests, mostly due to what looks like insufficient
peep-holing when vNi64 multiplies are scalarized.
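
As an illustration of the multiply-and-shift lowering the summary refers to, here is a minimal sketch for one unsigned 64-bit divisor. It is not code from this patch: the helper names (mulhu64, udivMagic, udiv10) are hypothetical, and unsigned __int128 is a GCC/Clang extension used here as a stand-in for a native MULHU instruction.

```cpp
#include <cassert>
#include <cstdint>

// High 64 bits of the 128-bit product A * B, i.e. what ISD::MULHU yields
// for i64. Assumption: the compiler supports unsigned __int128.
static inline uint64_t mulhu64(uint64_t A, uint64_t B) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(A) * B) >> 64);
}

// Simple case with no add fixup: quotient = (X * Magic) >> (64 + Shift).
static inline uint64_t udivMagic(uint64_t X, uint64_t Magic, unsigned Shift) {
  assert(Shift < 64 && "shift amount must stay in range");
  return mulhu64(X, Magic) >> Shift;
}

// X / 10 without a divide instruction. The magic constant is
// ceil(2^67 / 10) = 0xCCCCCCCCCCCCCCCD, and the post-shift is 3.
static inline uint64_t udiv10(uint64_t X) {
  return udivMagic(X, 0xCCCCCCCCCCCCCCCDull, 3);
}
```

Other divisors need a different magic constant and shift, and some need an extra add fixup that this sketch does not cover.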

Diff Detail

Build Status

Buildable 1582
Build 1582: arc lint + arc unit

Event Timeline

nhaehnle updated this revision to Diff 72153. Sep 22 2016, 5:07 AM

nhaehnle retitled this revision to [SelectionDAG] Expand MULHU and enable division-by-constant for wide types.

nhaehnle updated this object.

nhaehnle added reviewers: spatel, bkramer, venkatra, arsenm, • tstellarAMD.

nhaehnle added a subscriber: llvm-commits.

Herald added subscribers: nhaehnle, wdng, jyknight. Sep 22 2016, 5:07 AM

The reason for excluding the vector case is that it hits another problem with MULHU in some X86 vector div tests, and affects additional X86 test cases. I'd prefer to keep things simple for this change.

efriedma added a subscriber: efriedma. Sep 22 2016, 10:47 AM

efriedma added inline comments.

test/CodeGen/SPARC/rem.ll
Line 62: This is generating 8 multiply instructions; something is going wrong in your algorithm. (It should only take four multiply instructions to perform a double-width multiply.)

I looked a bit more closely, and the issue with the Sparc test is that the code uses isOperationLegalOrCustom(ISD::UMUL_LOHI, MVT::i32) to check - this is taken unchanged from expandMUL - but UMUL_LOHI is Expand. When I force my code in expandUMUL_LOHI to use the smaller size UMUL_LOHI, I get only 4 multiplies.
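
To make the "four multiplies" figure concrete, here is a sketch of a double-width unsigned multiply built from half-width pieces. It is an illustration only, not the expandUMUL_LOHI code under review; the names LoHi64 and umulLoHi64 are hypothetical, and 64-bit operands are split into 32-bit halves.

```cpp
#include <cstdint>

// Hypothetical result type: low and high halves of a 128-bit product.
struct LoHi64 {
  uint64_t Lo;
  uint64_t Hi;
};

// 64 x 64 -> 128 bit unsigned multiply using exactly four multiplies of
// 32-bit halves (each producing a 64-bit partial product).
static inline LoHi64 umulLoHi64(uint64_t A, uint64_t B) {
  const uint64_t Mask = 0xFFFFFFFFull;
  uint64_t A0 = A & Mask, A1 = A >> 32;
  uint64_t B0 = B & Mask, B1 = B >> 32;

  uint64_t P00 = A0 * B0; // contributes at bit 0
  uint64_t P01 = A0 * B1; // contributes at bit 32
  uint64_t P10 = A1 * B0; // contributes at bit 32
  uint64_t P11 = A1 * B1; // contributes at bit 64

  // Middle column. The sum is at most 2^64 - 1, so it cannot overflow.
  uint64_t Mid = (P00 >> 32) + (P01 & Mask) + P10;

  uint64_t Lo = (Mid << 32) | (P00 & Mask);
  uint64_t Hi = P11 + (P01 >> 32) + (Mid >> 32);
  return {Lo, Hi};
}
```

Everything besides the four partial products is shifts, masks and adds, which is why a lowering that emits eight multiplies is doing redundant work.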

I don't see a clear picture of how multiplication legalization is supposed to work. Currently, MULHU can become UMUL_LOHI in LegalizeDAG::ExpandNode, and UMUL_LOHI can convert to a MUL in a wider type in the DAGCombiner, or to a MUL + MULHU, but only if one of the resulting ops can be combined further. Each of these steps only happens when the resulting operations are LegalOrCustom.

Furthermore, UMUL_LOHI is marked as Expand in the targets that I've looked at, but there isn't actually any code for it in LegalizeDAG::ExpandNode. It all seems a bit messy :(

In part I get the impression that the LegalizeAction just doesn't contain enough information. If a target sets UMUL_LOHI to Expand, should that be a sequence of multiplies of the half-sized integer type, or should it be a single multiply in a larger type? Perhaps Promote should be used to indicate the second option?

In D24822#550503, @nhaehnle wrote:

In part I get the impression that the LegalizeAction just doesn't contain enough information. If a target sets UMUL_LOHI to Expand, should that be a sequence of multiplies of the half-sized integer type, or should it be a single multiply in a larger type? Perhaps Promote should be used to indicate the second option?

It's even worse because at least AMDGPU would want to legalize UMUL_LOHI on MVT::i32 to MUL + MULHU. So there are at least three plausible ways of handling a non-legal UMUL_LOHI. Perhaps this should be made explicit with an enum that the TargetLowering can choose from.

Using Promote to indicate that a larger multiply should be used seems reasonable.

Not sure it really makes sense to say that there are three ways to perform the operation; UMUL_LOHI and MULHU are essentially the same operation, in the same way that DIV and DIVREM are the same operation.
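
For reference, two of the legalization strategies discussed in this thread can be sketched for a 32-bit UMUL_LOHI as follows. This is a plain C++ illustration, not SelectionDAG code; LoHi32, mulhu32 and both function names are hypothetical. The third strategy, a sequence of half-width multiplies, is omitted.

```cpp
#include <cstdint>

// Hypothetical result type: low and high halves of a 64-bit product.
struct LoHi32 {
  uint32_t Lo;
  uint32_t Hi;
};

// Strategy 1: a single multiply in the wider type, then split the result.
static inline LoHi32 umulLoHiViaWideMul(uint32_t A, uint32_t B) {
  uint64_t P = static_cast<uint64_t>(A) * B;
  return {static_cast<uint32_t>(P), static_cast<uint32_t>(P >> 32)};
}

// Stand-in for a native high-half multiply instruction (MULHU on i32).
static inline uint32_t mulhu32(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>((static_cast<uint64_t>(A) * B) >> 32);
}

// Strategy 2: a separate MUL (low half) and MULHU (high half).
static inline LoHi32 umulLoHiViaMulAndMulhu(uint32_t A, uint32_t B) {
  return {A * B, mulhu32(A, B)};
}
```

Both functions return the same pair of values; they differ only in which operations the target must support natively.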

Move the bulk of the multiplication changes to D24956.

Herald added a subscriber: arsenm. Sep 27 2016, 3:38 AM

+ BPF test changes because the sdiv lowering fails later now.

ast added inline comments. Sep 27 2016, 8:04 AM

test/CodeGen/BPF/sdiv_error.ll
Line 3 (on Diff #72621): 'cannot select' is unreadable to a C programmer compared to 'Unsupported signed div'. Usability of BPF is the hardest problem we're facing, so we don't want to lose those error messages. Hence, similar to the SDIV error, can you please add the same error for SMUL_LOHI in BPFDAGToDAGISel::Select()? Please also keep the same hint, "Please convert to unsigned div/mod". Thanks

nhaehnle added inline comments. Sep 27 2016, 12:38 PM

test/CodeGen/BPF/sdiv_error.ll
Line 3 (on Diff #72621): Hmm, relying on backend error messages for usability is maybe not the best idea... Not sure how to change this. Perhaps adding an isSigned parameter to isIntDivCheap?

ast requested changes to this revision. Sep 27 2016, 1:40 PM

ast edited edge metadata.

ast added inline comments.

test/CodeGen/BPF/sdiv_error.ll
Line 3 (on Diff #72621): It's not a matter of optimization. There is no sdiv instruction. So please keep the backend error.

This revision now requires changes to proceed. Sep 27 2016, 1:40 PM

I still think it's pretty misguided to rely on errors from the backend,
given that this would be trivial to catch in the frontend where you have
more context for useful error messages anyway, but whatever.

I'm going to take the isIntDivCheap route because yes, it does have to do
with optimizations: you're relying on no optimizations being applied to
sdiv.
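
The isIntDivCheap route can be pictured with a toy model like the one below. This is a simplified sketch, not the LLVM API: the real hook also takes an EVT and an AttributeSet, and the type and function names here (ToyTargetLowering, ToyBPFLowering, chooseLowering) are hypothetical.

```cpp
// Hypothetical stand-in for TargetLowering with only the hook discussed here.
struct ToyTargetLowering {
  virtual ~ToyTargetLowering() = default;
  // Default: integer division is not considered cheap.
  virtual bool isIntDivCheap(bool /*Signed*/) const { return false; }
};

// A BPF-like target: report signed division as "cheap" so that the combiner
// leaves SDIV alone and the backend can still diagnose it.
struct ToyBPFLowering : ToyTargetLowering {
  bool isIntDivCheap(bool Signed) const override { return Signed; }
};

enum class DivLowering { KeepDiv, MulShift };

// Toy version of the combiner decision: a division by a constant becomes a
// multiply-and-shift sequence unless the target says division is cheap.
inline DivLowering chooseLowering(const ToyTargetLowering &TLI, bool Signed,
                                  bool DivisorIsConstant) {
  if (DivisorIsConstant && !TLI.isIntDivCheap(Signed))
    return DivLowering::MulShift;
  return DivLowering::KeepDiv;
}
```

With this shape, unsigned division by a constant is still strength-reduced on the BPF-like target, while signed division reaches instruction selection unchanged.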

Herald added subscribers: dschuff, jfb. Sep 28 2016, 1:13 AM

n.bozhenov added a subscriber: n.bozhenov. Sep 28 2016, 7:38 AM

lgtm
There are different front ends and we cannot control them all, so the backend has to have errors that an end user can understand, though it's not pretty.

This revision is now accepted and ready to land. Sep 28 2016, 9:05 AM

nhaehnle mentioned this in D25289: AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes. Oct 5 2016, 11:38 AM

nhaehnle added a parent revision: D25289: AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes. Oct 5 2016, 11:39 AM

Diffusion mentioned this in rL284224: AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes. Oct 14 2016, 3:39 AM

Rebase on latest version of D24956.

Herald edited edge metadata. Nov 24 2016, 1:34 AM

Diffusion mentioned this in rL289050: [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI. Dec 8 2016, 6:18 AM

What happened to this patch?

Herald added a reviewer: javed.absar. Jun 26 2018, 5:12 PM

Herald added subscribers: fedor.sergeev, aheejin, jgravelle-google and 2 others.

@nhaehnle If you could rebase this and enable vector idiv to show the x86 regressions I may be able to help

RKSimon mentioned this in D87976: Support the division-by-constant strength reduction for more integer types. Sep 27 2020, 10:48 AM

nagisa added a subscriber: nagisa. Jun 5 2021, 5:10 AM

Herald added subscribers: ecnelises, kerbowa, pengfei and 2 others. Jun 5 2021, 5:10 AM

Revision Contents

Path	Size
include/llvm/Target/TargetLowering.h	2 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	10 lines
lib/CodeGen/SelectionDAG/TargetLowering.cpp	49 lines
lib/Target/AArch64/AArch64ISelLowering.h	2 lines
lib/Target/AArch64/AArch64ISelLowering.cpp	5 lines
lib/Target/AMDGPU/SOPInstructions.td	7 lines
lib/Target/BPF/BPFISelLowering.h	2 lines
lib/Target/BPF/BPFISelLowering.cpp	8 lines
lib/Target/WebAssembly/WebAssemblyISelLowering.h	2 lines
lib/Target/WebAssembly/WebAssemblyISelLowering.cpp	3 lines
lib/Target/X86/X86ISelLowering.h	2 lines
lib/Target/X86/X86ISelLowering.cpp	3 lines
test/CodeGen/AMDGPU/sdiv.ll	21 lines
test/CodeGen/AMDGPU/udiv.ll	21 lines
test/CodeGen/SPARC/rem.ll	39 lines

Diff 79195

include/llvm/Target/TargetLowering.h

Show First 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	shouldExpandBuildVectorWithShuffles(EVT /* VT */,
unsigned DefinedValues) const {		unsigned DefinedValues) const {
return DefinedValues < 3;		return DefinedValues < 3;
}		}

/// Return true if integer divide is usually cheaper than a sequence of		/// Return true if integer divide is usually cheaper than a sequence of
/// several shifts, adds, and multiplies for this target.		/// several shifts, adds, and multiplies for this target.
/// The definition of "cheaper" may depend on whether we're optimizing		/// The definition of "cheaper" may depend on whether we're optimizing
/// for speed or for size.		/// for speed or for size.
virtual bool isIntDivCheap(EVT VT, AttributeSet Attr) const {		virtual bool isIntDivCheap(EVT VT, AttributeSet Attr, bool Signed) const {
return false;		return false;
}		}

/// Return true if the target can handle a standalone remainder operation.		/// Return true if the target can handle a standalone remainder operation.
virtual bool hasStandaloneRem(EVT VT) const {		virtual bool hasStandaloneRem(EVT VT) const {
return true;		return true;
}		}

▲ Show 20 Lines • Show All 2,919 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 2,363 Lines • ▼ Show 20 Lines	if (N1C && !N1C->isNullValue() && !N1C->isOpaque() &&
AddToWorklist(SRA.getNode());		AddToWorklist(SRA.getNode());
return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), SRA);		return DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), SRA);
}		}

// If integer divide is expensive and we satisfy the requirements, emit an		// If integer divide is expensive and we satisfy the requirements, emit an
// alternate sequence. Targets may check function attributes for size/speed		// alternate sequence. Targets may check function attributes for size/speed
// trade-offs.		// trade-offs.
AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();		AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();
if (N1C && !TLI.isIntDivCheap(N->getValueType(0), Attr))		if (N1C && !TLI.isIntDivCheap(N->getValueType(0), Attr, true))
if (SDValue Op = BuildSDIV(N))		if (SDValue Op = BuildSDIV(N))
return Op;		return Op;

// sdiv, srem -> sdivrem		// sdiv, srem -> sdivrem
// If the divisor is constant, then return DIVREM only if isIntDivCheap() is true.		// If the divisor is constant, then return DIVREM only if isIntDivCheap() is true.
// Otherwise, we break the simplification logic in visitREM().		// Otherwise, we break the simplification logic in visitREM().
if (!N1C || TLI.isIntDivCheap(N->getValueType(0), Attr))		if (!N1C || TLI.isIntDivCheap(N->getValueType(0), Attr, true))
if (SDValue DivRem = useDivRem(N))		if (SDValue DivRem = useDivRem(N))
return DivRem;		return DivRem;

// undef / X -> 0		// undef / X -> 0
if (N0.isUndef())		if (N0.isUndef())
return DAG.getConstant(0, DL, VT);		return DAG.getConstant(0, DL, VT);
// X / undef -> undef		// X / undef -> undef
if (N1.isUndef())		if (N1.isUndef())
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	if (ConstantSDNode *SHC = isConstOrConstSplat(N1.getOperand(0))) {
AddToWorklist(Add.getNode());		AddToWorklist(Add.getNode());
return DAG.getNode(ISD::SRL, DL, VT, N0, Add);		return DAG.getNode(ISD::SRL, DL, VT, N0, Add);
}		}
}		}
}		}

// fold (udiv x, c) -> alternate		// fold (udiv x, c) -> alternate
AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();		AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();
if (N1C && !TLI.isIntDivCheap(N->getValueType(0), Attr))		if (N1C && !TLI.isIntDivCheap(N->getValueType(0), Attr, false))
if (SDValue Op = BuildUDIV(N))		if (SDValue Op = BuildUDIV(N))
return Op;		return Op;

// sdiv, srem -> sdivrem		// sdiv, srem -> sdivrem
// If the divisor is constant, then return DIVREM only if isIntDivCheap() is true.		// If the divisor is constant, then return DIVREM only if isIntDivCheap() is true.
// Otherwise, we break the simplification logic in visitREM().		// Otherwise, we break the simplification logic in visitREM().
if (!N1C || TLI.isIntDivCheap(N->getValueType(0), Attr))		if (!N1C || TLI.isIntDivCheap(N->getValueType(0), Attr, false))
if (SDValue DivRem = useDivRem(N))		if (SDValue DivRem = useDivRem(N))
return DivRem;		return DivRem;

// undef / X -> 0		// undef / X -> 0
if (N0.isUndef())		if (N0.isUndef())
return DAG.getConstant(0, DL, VT);		return DAG.getConstant(0, DL, VT);
// X / undef -> undef		// X / undef -> undef
if (N1.isUndef())		if (N1.isUndef())
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitREM(SDNode *N) {
// If X/C can be simplified by the division-by-constant logic, lower		// If X/C can be simplified by the division-by-constant logic, lower
// X%C to the equivalent of X-X/C*C.		// X%C to the equivalent of X-X/C*C.
// To avoid mangling nodes, this simplification requires that the combine()		// To avoid mangling nodes, this simplification requires that the combine()
// call for the speculative DIV must not cause a DIVREM conversion. We guard		// call for the speculative DIV must not cause a DIVREM conversion. We guard
// against this by skipping the simplification if isIntDivCheap(). When		// against this by skipping the simplification if isIntDivCheap(). When
// div is not cheap, combine will not return a DIVREM. Regardless,		// div is not cheap, combine will not return a DIVREM. Regardless,
// checking cheapness here makes sense since the simplification results in		// checking cheapness here makes sense since the simplification results in
// fatter code.		// fatter code.
if (N1C && !N1C->isNullValue() && !TLI.isIntDivCheap(VT, Attr)) {		if (N1C && !N1C->isNullValue() && !TLI.isIntDivCheap(VT, Attr, isSigned)) {
unsigned DivOpcode = isSigned ? ISD::SDIV : ISD::UDIV;		unsigned DivOpcode = isSigned ? ISD::SDIV : ISD::UDIV;
SDValue Div = DAG.getNode(DivOpcode, DL, VT, N0, N1);		SDValue Div = DAG.getNode(DivOpcode, DL, VT, N0, N1);
AddToWorklist(Div.getNode());		AddToWorklist(Div.getNode());
SDValue OptimizedDiv = combine(Div.getNode());		SDValue OptimizedDiv = combine(Div.getNode());
if (OptimizedDiv.getNode() && OptimizedDiv.getNode() != Div.getNode()) {		if (OptimizedDiv.getNode() && OptimizedDiv.getNode() != Div.getNode()) {
assert((OptimizedDiv.getOpcode() != ISD::UDIVREM) &&		assert((OptimizedDiv.getOpcode() != ISD::UDIVREM) &&
(OptimizedDiv.getOpcode() != ISD::SDIVREM));		(OptimizedDiv.getOpcode() != ISD::SDIVREM));
SDValue Mul = DAG.getNode(ISD::MUL, DL, VT, OptimizedDiv, N1);		SDValue Mul = DAG.getNode(ISD::MUL, DL, VT, OptimizedDiv, N1);
▲ Show 20 Lines • Show All 12,945 Lines • Show Last 20 Lines
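
The visitREM comment above describes lowering X%C to X-X/C*C once the division itself has been strength-reduced. A minimal sketch for C = 10 on unsigned 64-bit values follows; the names udiv10 and urem10 are hypothetical, and unsigned __int128 is a GCC/Clang extension assumed here in place of a native high-half multiply.

```cpp
#include <cstdint>

// X / 10 via multiply and shift: (X * ceil(2^67 / 10)) >> 67.
static inline uint64_t udiv10(uint64_t X) {
  unsigned __int128 P =
      static_cast<unsigned __int128>(X) * 0xCCCCCCCCCCCCCCCDull;
  return static_cast<uint64_t>(P >> 67);
}

// X % 10 computed as X - (X / 10) * 10, reusing the optimized quotient.
static inline uint64_t urem10(uint64_t X) {
  return X - udiv10(X) * 10;
}
```

The remainder costs one extra multiply and one subtract on top of the quotient, which is why the source comment notes that the simplification results in fatter code.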

lib/CodeGen/SelectionDAG/TargetLowering.cpp

Show First 20 Lines • Show All 2,915 Lines • ▼ Show 20 Lines	static SDValue BuildExactSDIV(const TargetLowering &TLI, SDValue Op1, APInt d,
return Mul;		return Mul;
}		}

SDValue TargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,		SDValue TargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,
SelectionDAG &DAG,		SelectionDAG &DAG,
std::vector<SDNode *> *Created) const {		std::vector<SDNode *> *Created) const {
AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();		AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (TLI.isIntDivCheap(N->getValueType(0), Attr))		if (TLI.isIntDivCheap(N->getValueType(0), Attr, true))
return SDValue(N,0); // Lower SDIV as SDIV		return SDValue(N,0); // Lower SDIV as SDIV
return SDValue();		return SDValue();
}		}

/// \brief Given an ISD::SDIV node expressing a divide by constant,		/// \brief Given an ISD::SDIV node expressing a divide by constant,
/// return a DAG expression to select that will generate the same value by		/// return a DAG expression to select that will generate the same value by
/// multiplying by a magic number.		/// multiplying by a magic number.
/// Ref: "Hacker's Delight" or "The PowerPC Compiler Writer's Guide".		/// Ref: "Hacker's Delight" or "The PowerPC Compiler Writer's Guide".
Show All 12 Lines	SDValue TargetLowering::BuildSDIV(SDNode *N, const APInt &Divisor,

// If the sdiv has an 'exact' bit we can use a simpler lowering.		// If the sdiv has an 'exact' bit we can use a simpler lowering.
if (cast<BinaryWithFlagsSDNode>(N)->Flags.hasExact())		if (cast<BinaryWithFlagsSDNode>(N)->Flags.hasExact())
return BuildExactSDIV(*this, N->getOperand(0), Divisor, dl, DAG, *Created);		return BuildExactSDIV(*this, N->getOperand(0), Divisor, dl, DAG, *Created);

APInt::ms magics = Divisor.magic();		APInt::ms magics = Divisor.magic();

// Multiply the numerator (operand 0) by the magic value		// Multiply the numerator (operand 0) by the magic value
// FIXME: We should support doing a MUL in a wider type		// FIXME: expand using MULHS for vector types after addressing possible
		// regressions in X86 backend.
		unsigned Opcode;
		if (IsAfterLegalization ? isOperationLegal(ISD::MULHS, VT)
		: isOperationLegalOrCustom(ISD::MULHS, VT))
		Opcode = ISD::MULHS;
		else if (IsAfterLegalization ? isOperationLegal(ISD::SMUL_LOHI, VT)
		: isOperationLegalOrCustom(ISD::SMUL_LOHI, VT))
		Opcode = ISD::SMUL_LOHI;
		else if (!IsAfterLegalization && !VT.isVector())
		Opcode = ISD::MULHS;
		else
		return SDValue();

SDValue Q;		SDValue Q;
if (IsAfterLegalization ? isOperationLegal(ISD::MULHS, VT) :		if (Opcode == ISD::MULHS)
isOperationLegalOrCustom(ISD::MULHS, VT))
Q = DAG.getNode(ISD::MULHS, dl, VT, N->getOperand(0),		Q = DAG.getNode(ISD::MULHS, dl, VT, N->getOperand(0),
DAG.getConstant(magics.m, dl, VT));		DAG.getConstant(magics.m, dl, VT));
else if (IsAfterLegalization ? isOperationLegal(ISD::SMUL_LOHI, VT) :		else
isOperationLegalOrCustom(ISD::SMUL_LOHI, VT))
Q = SDValue(DAG.getNode(ISD::SMUL_LOHI, dl, DAG.getVTList(VT, VT),		Q = SDValue(DAG.getNode(ISD::SMUL_LOHI, dl, DAG.getVTList(VT, VT),
N->getOperand(0),		N->getOperand(0),
DAG.getConstant(magics.m, dl, VT)).getNode(), 1);		DAG.getConstant(magics.m, dl, VT)).getNode(), 1);
else
return SDValue(); // No mulhs or equvialent
// If d > 0 and m < 0, add the numerator		// If d > 0 and m < 0, add the numerator
if (Divisor.isStrictlyPositive() && magics.m.isNegative()) {		if (Divisor.isStrictlyPositive() && magics.m.isNegative()) {
Q = DAG.getNode(ISD::ADD, dl, VT, Q, N->getOperand(0));		Q = DAG.getNode(ISD::ADD, dl, VT, Q, N->getOperand(0));
Created->push_back(Q.getNode());		Created->push_back(Q.getNode());
}		}
// If d < 0 and m > 0, subtract the numerator.		// If d < 0 and m > 0, subtract the numerator.
if (Divisor.isNegative() && magics.m.isStrictlyPositive()) {		if (Divisor.isNegative() && magics.m.isStrictlyPositive()) {
Q = DAG.getNode(ISD::SUB, dl, VT, Q, N->getOperand(0));		Q = DAG.getNode(ISD::SUB, dl, VT, Q, N->getOperand(0));
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	if (magics.a != 0 && !Divisor[0]) {
Created->push_back(Q.getNode());		Created->push_back(Q.getNode());

// Get magic number for the shifted divisor.		// Get magic number for the shifted divisor.
magics = Divisor.lshr(Shift).magicu(Shift);		magics = Divisor.lshr(Shift).magicu(Shift);
assert(magics.a == 0 && "Should use cheap fixup now");		assert(magics.a == 0 && "Should use cheap fixup now");
}		}

// Multiply the numerator (operand 0) by the magic value		// Multiply the numerator (operand 0) by the magic value
// FIXME: We should support doing a MUL in a wider type		// FIXME: expand using MULHU for vector types after addressing possible
if (IsAfterLegalization ? isOperationLegal(ISD::MULHU, VT) :		// regressions in X86 backend.
isOperationLegalOrCustom(ISD::MULHU, VT))		unsigned Opcode;
		if (IsAfterLegalization ? isOperationLegal(ISD::MULHU, VT)
		: isOperationLegalOrCustom(ISD::MULHU, VT))
		Opcode = ISD::MULHU;
		else if (IsAfterLegalization ? isOperationLegal(ISD::UMUL_LOHI, VT)
		: isOperationLegalOrCustom(ISD::UMUL_LOHI, VT))
		Opcode = ISD::UMUL_LOHI;
		else if (!IsAfterLegalization && !VT.isVector())
		Opcode = ISD::MULHU;
		else
		return SDValue();

		if (Opcode == ISD::MULHU)
Q = DAG.getNode(ISD::MULHU, dl, VT, Q, DAG.getConstant(magics.m, dl, VT));		Q = DAG.getNode(ISD::MULHU, dl, VT, Q, DAG.getConstant(magics.m, dl, VT));
else if (IsAfterLegalization ? isOperationLegal(ISD::UMUL_LOHI, VT) :		else
isOperationLegalOrCustom(ISD::UMUL_LOHI, VT))
Q = SDValue(DAG.getNode(ISD::UMUL_LOHI, dl, DAG.getVTList(VT, VT), Q,		Q = SDValue(DAG.getNode(ISD::UMUL_LOHI, dl, DAG.getVTList(VT, VT), Q,
DAG.getConstant(magics.m, dl, VT)).getNode(), 1);		DAG.getConstant(magics.m, dl, VT)).getNode(), 1);
else
return SDValue(); // No mulhu or equivalent

Created->push_back(Q.getNode());		Created->push_back(Q.getNode());

if (magics.a == 0) {		if (magics.a == 0) {
assert(magics.s < Divisor.getBitWidth() &&		assert(magics.s < Divisor.getBitWidth() &&
"We shouldn't generate an undefined shift!");		"We shouldn't generate an undefined shift!");
return DAG.getNode(		return DAG.getNode(
ISD::SRL, dl, VT, Q,		ISD::SRL, dl, VT, Q,
▲ Show 20 Lines • Show All 736 Lines • Show Last 20 Lines
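
The signed path in BuildSDIV above includes the fixup "If d > 0 and m < 0, add the numerator". A worked instance for 32-bit division by 7, using the magic number and shift given in Hacker's Delight, is sketched below. The helper names (mulhs32, sdiv7) are hypothetical, and the code assumes that right-shifting a negative signed value is an arithmetic shift, which holds on mainstream compilers and is guaranteed from C++20.

```cpp
#include <cstdint>

// High 32 bits of the signed 64-bit product, i.e. ISD::MULHS on i32.
// Assumption: >> on a negative int64_t is an arithmetic shift.
static inline int32_t mulhs32(int32_t A, int32_t B) {
  return static_cast<int32_t>((static_cast<int64_t>(A) * B) >> 32);
}

// N / 7, rounding toward zero, without a divide instruction.
static inline int32_t sdiv7(int32_t N) {
  const int32_t Magic = -1840700269; // bit pattern 0x92492493
  int32_t Q = mulhs32(N, Magic);
  Q += N;  // d > 0 and magic < 0: add the numerator
  Q >>= 2; // arithmetic shift by the magic shift amount (2 for d = 7)
  // Add the sign bit so that negative quotients round toward zero.
  Q += static_cast<int32_t>(static_cast<uint32_t>(Q) >> 31);
  return Q;
}
```

For example, N = -7 gives mulhs32 = 2, then 2 + (-7) = -5, then -5 >> 2 = -2, and adding the sign bit yields -1.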

lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 396 Lines • ▼ Show 20 Lines	public:
/// If a physical register, this returns the register that receives the		/// If a physical register, this returns the register that receives the
/// exception typeid on entry to a landing pad.		/// exception typeid on entry to a landing pad.
unsigned		unsigned
getExceptionSelectorRegister(const Constant *PersonalityFn) const override {		getExceptionSelectorRegister(const Constant *PersonalityFn) const override {
// FIXME: This is a guess. Has this been defined yet?		// FIXME: This is a guess. Has this been defined yet?
return AArch64::X1;		return AArch64::X1;
}		}

bool isIntDivCheap(EVT VT, AttributeSet Attr) const override;		bool isIntDivCheap(EVT VT, AttributeSet Attr, bool Signed) const override;

bool isCheapToSpeculateCttz() const override {		bool isCheapToSpeculateCttz() const override {
return true;		return true;
}		}

bool isCheapToSpeculateCtlz() const override {		bool isCheapToSpeculateCtlz() const override {
return true;		return true;
}		}
▲ Show 20 Lines • Show All 189 Lines • Show Last 20 Lines

lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 7,605 Lines • ▼ Show 20 Lines	static SDValue performXorCombine(SDNode *N, SelectionDAG &DAG,
return performIntegerAbsCombine(N, DAG);		return performIntegerAbsCombine(N, DAG);
}		}

SDValue		SDValue
AArch64TargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,		AArch64TargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,
SelectionDAG &DAG,		SelectionDAG &DAG,
std::vector<SDNode *> *Created) const {		std::vector<SDNode *> *Created) const {
AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();		AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();
if (isIntDivCheap(N->getValueType(0), Attr))		if (isIntDivCheap(N->getValueType(0), Attr, true))
return SDValue(N,0); // Lower SDIV as SDIV		return SDValue(N,0); // Lower SDIV as SDIV

// fold (sdiv X, pow2)		// fold (sdiv X, pow2)
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
if ((VT != MVT::i32 && VT != MVT::i64) ||		if ((VT != MVT::i32 && VT != MVT::i64) ||
!(Divisor.isPowerOf2() || (-Divisor).isPowerOf2()))		!(Divisor.isPowerOf2() || (-Divisor).isPowerOf2()))
return SDValue();		return SDValue();

▲ Show 20 Lines • Show All 2,993 Lines • ▼ Show 20 Lines	for (const MCPhysReg *I = IStart; *I; ++I) {
// Insert the copy-back instructions right before the terminator.		// Insert the copy-back instructions right before the terminator.
for (auto *Exit : Exits)		for (auto *Exit : Exits)
BuildMI(*Exit, Exit->getFirstTerminator(), DebugLoc(),		BuildMI(*Exit, Exit->getFirstTerminator(), DebugLoc(),
TII->get(TargetOpcode::COPY), *I)		TII->get(TargetOpcode::COPY), *I)
.addReg(NewVR);		.addReg(NewVR);
}		}
}		}

bool AArch64TargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr) const {		bool AArch64TargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr,
		bool Signed) const {
// Integer division on AArch64 is expensive. However, when aggressively		// Integer division on AArch64 is expensive. However, when aggressively
// optimizing for code size, we prefer to use a div instruction, as it is		// optimizing for code size, we prefer to use a div instruction, as it is
// usually smaller than the alternative sequence.		// usually smaller than the alternative sequence.
// The exception to this is vector division. Since AArch64 doesn't have vector		// The exception to this is vector division. Since AArch64 doesn't have vector
// integer division, leaving the division as-is is a loss even in terms of		// integer division, leaving the division as-is is a loss even in terms of
// size, because it will have to be scalarized, while the alternative code		// size, because it will have to be scalarized, while the alternative code
// sequence can be performed in vector form.		// sequence can be performed in vector form.
bool OptSize =		bool OptSize =
Attr.hasAttribute(AttributeSet::FunctionIndex, Attribute::MinSize);		Attr.hasAttribute(AttributeSet::FunctionIndex, Attribute::MinSize);
return OptSize && !VT.isVector();		return OptSize && !VT.isVector();
}		}

lib/Target/AMDGPU/SOPInstructions.td

	Show First 20 Lines • Show All 904 Lines • ▼ Show 20 Lines

	// V_ADD_I32_e32/S_ADD_U32 produces carry in VCC/SCC. For the vector			// V_ADD_I32_e32/S_ADD_U32 produces carry in VCC/SCC. For the vector
	// case, the sgpr-copies pass will fix this to use the vector version.			// case, the sgpr-copies pass will fix this to use the vector version.
	def : Pat <			def : Pat <
	(i32 (addc i32:$src0, i32:$src1)),			(i32 (addc i32:$src0, i32:$src1)),
	(S_ADD_U32 $src0, $src1)			(S_ADD_U32 $src0, $src1)
	>;			>;

				// Similarly for V_SUB_I32/S_SUB_U32.
				def : Pat <
				(i32 (subc i32:$src0, i32:$src1)),
				(S_SUB_U32 $src0, $src1)
				>;

	// FIXME: We need to use COPY_TO_REGCLASS to work-around the fact that			// FIXME: We need to use COPY_TO_REGCLASS to work-around the fact that
	// REG_SEQUENCE patterns don't support instructions with multiple			// REG_SEQUENCE patterns don't support instructions with multiple
	// outputs.			// outputs.
	def : Pat<			def : Pat<
	(i64 (zext i16:$src)),			(i64 (zext i16:$src)),
	(REG_SEQUENCE SReg_64,			(REG_SEQUENCE SReg_64,
	(i32 (COPY_TO_REGCLASS (S_AND_B32 $src, (S_MOV_B32 (i32 0xffff))), SGPR_32)), sub0,			(i32 (COPY_TO_REGCLASS (S_AND_B32 $src, (S_MOV_B32 (i32 0xffff))), SGPR_32)), sub0,
	(S_MOV_B32 (i32 0)), sub1)			(S_MOV_B32 (i32 0)), sub1)
	>;			>;

	def : Pat <			def : Pat <
	(i64 (sext i16:$src)),			(i64 (sext i16:$src)),
	(REG_SEQUENCE SReg_64, (i32 (S_SEXT_I32_I16 $src)), sub0,			(REG_SEQUENCE SReg_64, (i32 (S_SEXT_I32_I16 $src)), sub0,
	(i32 (COPY_TO_REGCLASS (S_ASHR_I32 (i32 (S_SEXT_I32_I16 $src)), (S_MOV_B32 (i32 31))), SGPR_32)), sub1)			(i32 (COPY_TO_REGCLASS (S_ASHR_I32 (i32 (S_SEXT_I32_I16 $src)), (S_MOV_B32 (i32 31))), SGPR_32)), sub1)
	>;			>;

	def : Pat<			def : Pat<
	(i32 (zext i16:$src)),			(i32 (zext i16:$src)),
	(S_AND_B32 (S_MOV_B32 (i32 0xffff)), $src)			(S_AND_B32 (S_MOV_B32 (i32 0xffff)), $src)
	>;			>;



	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SOPP Patterns			// SOPP Patterns
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def : Pat <			def : Pat <
	(int_amdgcn_s_waitcnt i32:$simm16),			(int_amdgcn_s_waitcnt i32:$simm16),
	(S_WAITCNT (as_i16imm $simm16))			(S_WAITCNT (as_i16imm $simm16))
	>;			>;
	▲ Show 20 Lines • Show All 282 Lines • Show Last 20 Lines

lib/Target/BPF/BPFISelLowering.h

Show All 40 Lines	public:

// This method returns the name of a target specific DAG node.		// This method returns the name of a target specific DAG node.
const char *getTargetNodeName(unsigned Opcode) const override;		const char *getTargetNodeName(unsigned Opcode) const override;

MachineBasicBlock *		MachineBasicBlock *
EmitInstrWithCustomInserter(MachineInstr &MI,		EmitInstrWithCustomInserter(MachineInstr &MI,
MachineBasicBlock *BB) const override;		MachineBasicBlock *BB) const override;

		bool isIntDivCheap(EVT VT, AttributeSet Attr, bool Signed) const override;

private:		private:
SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;

// Lower the result values of a call, copying them out of physregs into vregs		// Lower the result values of a call, copying them out of physregs into vregs
SDValue LowerCallResult(SDValue Chain, SDValue InFlag,		SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
CallingConv::ID CallConv, bool IsVarArg,		CallingConv::ID CallConv, bool IsVarArg,
Show All 37 Lines

lib/Target/BPF/BPFISelLowering.cpp

Show First 20 Lines • Show All 126 Lines • ▼ Show 20 Lines	BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
setPrefFunctionAlignment(3);		setPrefFunctionAlignment(3);

// inline memcpy() for kernel to see explicit copy		// inline memcpy() for kernel to see explicit copy
MaxStoresPerMemset = MaxStoresPerMemsetOptSize = 128;		MaxStoresPerMemset = MaxStoresPerMemsetOptSize = 128;
MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 128;		MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 128;
MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = 128;		MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = 128;
}		}

		bool BPFTargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr,
		bool Signed) const {
		// We don't want to apply optimizations to SDIV, so that the resulting
		// error messages about not having signed division do not depend on
		// optimizations.
		return Signed;
		}

SDValue BPFTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {		SDValue BPFTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
case ISD::BR_CC:		case ISD::BR_CC:
return LowerBR_CC(Op, DAG);		return LowerBR_CC(Op, DAG);
case ISD::GlobalAddress:		case ISD::GlobalAddress:
return LowerGlobalAddress(Op, DAG);		return LowerGlobalAddress(Op, DAG);
case ISD::SELECT_CC:		case ISD::SELECT_CC:
return LowerSELECT_CC(Op, DAG);		return LowerSELECT_CC(Op, DAG);
▲ Show 20 Lines • Show All 454 Lines • Show Last 20 Lines

lib/Target/WebAssembly/WebAssemblyISelLowering.h

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	std::pair<unsigned, const TargetRegisterClass *> getRegForInlineAsmConstraint(
const TargetRegisterInfo *TRI, StringRef Constraint,		const TargetRegisterInfo *TRI, StringRef Constraint,
MVT VT) const override;		MVT VT) const override;
bool isCheapToSpeculateCttz() const override;		bool isCheapToSpeculateCttz() const override;
bool isCheapToSpeculateCtlz() const override;		bool isCheapToSpeculateCtlz() const override;
bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,		bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
unsigned AS) const override;		unsigned AS) const override;
bool allowsMisalignedMemoryAccesses(EVT, unsigned AddrSpace, unsigned Align,		bool allowsMisalignedMemoryAccesses(EVT, unsigned AddrSpace, unsigned Align,
bool *Fast) const override;		bool *Fast) const override;
bool isIntDivCheap(EVT VT, AttributeSet Attr) const override;		bool isIntDivCheap(EVT VT, AttributeSet Attr, bool Signed) const override;

SDValue LowerCall(CallLoweringInfo &CLI,		SDValue LowerCall(CallLoweringInfo &CLI,
SmallVectorImpl<SDValue> &InVals) const override;		SmallVectorImpl<SDValue> &InVals) const override;
bool CanLowerReturn(CallingConv::ID CallConv, MachineFunction &MF,		bool CanLowerReturn(CallingConv::ID CallConv, MachineFunction &MF,
bool isVarArg,		bool isVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,		const SmallVectorImpl<ISD::OutputArg> &Outs,
LLVMContext &Context) const override;		LLVMContext &Context) const override;
SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,		SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,

lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

bool WebAssemblyTargetLowering::allowsMisalignedMemoryAccesses(
// may be a performance impact. We tell LLVM they're "fast" because		// may be a performance impact. We tell LLVM they're "fast" because
// for the kinds of things that LLVM uses this for (merging adjacent stores		// for the kinds of things that LLVM uses this for (merging adjacent stores
// of constants, etc.), WebAssembly implementations will either want the		// of constants, etc.), WebAssembly implementations will either want the
// unaligned access or they'll split anyway.		// unaligned access or they'll split anyway.
if (Fast) *Fast = true;		if (Fast) *Fast = true;
return true;		return true;
}		}

bool WebAssemblyTargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr) const {		bool WebAssemblyTargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr,
		bool Signed) const {
// The current thinking is that wasm engines will perform this optimization,		// The current thinking is that wasm engines will perform this optimization,
// so we can save on code size.		// so we can save on code size.
return true;		return true;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// WebAssembly Lowering private implementation.		// WebAssembly Lowering private implementation.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

lib/Target/X86/X86ISelLowering.h

public:
SDValue BuildFILD(SDValue Op, EVT SrcVT, SDValue Chain, SDValue StackSlot,		SDValue BuildFILD(SDValue Op, EVT SrcVT, SDValue Chain, SDValue StackSlot,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;

bool isNoopAddrSpaceCast(unsigned SrcAS, unsigned DestAS) const override;		bool isNoopAddrSpaceCast(unsigned SrcAS, unsigned DestAS) const override;

/// \brief Customize the preferred legalization strategy for certain types.		/// \brief Customize the preferred legalization strategy for certain types.
LegalizeTypeAction getPreferredVectorAction(EVT VT) const override;		LegalizeTypeAction getPreferredVectorAction(EVT VT) const override;

bool isIntDivCheap(EVT VT, AttributeSet Attr) const override;		bool isIntDivCheap(EVT VT, AttributeSet Attr, bool Signed) const override;

bool supportSwiftError() const override;		bool supportSwiftError() const override;

unsigned getMaxSupportedInterleaveFactor() const override { return 4; }		unsigned getMaxSupportedInterleaveFactor() const override { return 4; }

/// \brief Lower interleaved load(s) into target specific		/// \brief Lower interleaved load(s) into target specific
/// instructions/intrinsics.		/// instructions/intrinsics.
bool lowerInterleavedLoad(LoadInst *LI,		bool lowerInterleavedLoad(LoadInst *LI,

lib/Target/X86/X86ISelLowering.cpp

	// vmovaps %ymm1, (%r8) can use port 2, 3, or 7.			// vmovaps %ymm1, (%r8) can use port 2, 3, or 7.
	if (isLegalAddressingMode(DL, AM, Ty, AS))			if (isLegalAddressingMode(DL, AM, Ty, AS))
	// Scale represents reg2 * scale, thus account for 1			// Scale represents reg2 * scale, thus account for 1
	// as soon as we use a second register.			// as soon as we use a second register.
	return AM.Scale != 0;			return AM.Scale != 0;
	return -1;			return -1;
	}			}

	bool X86TargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr) const {			bool X86TargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr,
				bool Signed) const {
	// Integer division on x86 is expensive. However, when aggressively optimizing			// Integer division on x86 is expensive. However, when aggressively optimizing
	// for code size, we prefer to use a div instruction, as it is usually smaller			// for code size, we prefer to use a div instruction, as it is usually smaller
	// than the alternative sequence.			// than the alternative sequence.
	// The exception to this is vector division. Since x86 doesn't have vector			// The exception to this is vector division. Since x86 doesn't have vector
	// integer division, leaving the division as-is is a loss even in terms of			// integer division, leaving the division as-is is a loss even in terms of
	// size, because it will have to be scalarized, while the alternative code			// size, because it will have to be scalarized, while the alternative code
	// sequence can be performed in vector form.			// sequence can be performed in vector form.
	bool OptSize = Attr.hasAttribute(AttributeSet::FunctionIndex,			bool OptSize = Attr.hasAttribute(AttributeSet::FunctionIndex,

test/CodeGen/AMDGPU/sdiv.ll

define void @v_sdiv_i25(i32 addrspace(1)* %out, i25 addrspace(1)* %in) {
%num = load i25, i25 addrspace(1) * %in		%num = load i25, i25 addrspace(1) * %in
%den = load i25, i25 addrspace(1) * %den_ptr		%den = load i25, i25 addrspace(1) * %den_ptr
%result = sdiv i25 %num, %den		%result = sdiv i25 %num, %den
%result.ext = sext i25 %result to i32		%result.ext = sext i25 %result to i32
store i32 %result.ext, i32 addrspace(1)* %out		store i32 %result.ext, i32 addrspace(1)* %out
ret void		ret void
}		}

		; FUNC-LABEL: {{^}}sdiv_i32_const:
		; SI: v_mov_b32_e32 [[MAGIC:v[0-9]+]], 0x92492493
		; SI-NOT: v_rcp
		define void @sdiv_i32_const(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
		%num = load i32, i32 addrspace(1)* %in
		%result = sdiv i32 %num, 7
		store i32 %result, i32 addrspace(1)* %out
		ret void
		}
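
For reference, the multiply-and-shift sequence that the sdiv_i32_const check is looking for can be sketched in plain C++. This is only an illustration, not the DAG combiner's code. The helper names mulhs32 and sdiv7 are hypothetical, and the shift amount of 2 is the usual companion to the magic constant 0x92492493 for a signed 32-bit divide by 7; it is an assumption here because the test only checks the constant. The sketch also assumes two's-complement conversion and arithmetic right shifts on signed values.

```cpp
#include <cstdint>

// Hypothetical helper: high 32 bits of the signed 64-bit product (MULHS).
// Assumes >> on a negative int64_t is an arithmetic shift.
inline int32_t mulhs32(int32_t a, int32_t b) {
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 32);
}

// Sketch of "sdiv i32 %num, 7" without a divide instruction.
inline int32_t sdiv7(int32_t n) {
  // 0x92492493 is the constant matched by the SI check line above;
  // reinterpreted as a signed value it is negative.
  const int32_t magic = static_cast<int32_t>(0x92492493u);
  int32_t q = mulhs32(n, magic);
  q += n;   // correction needed because the magic constant is negative
  q >>= 2;  // assumed shift amount for divisor 7 (arithmetic shift)
  q += static_cast<int32_t>(static_cast<uint32_t>(q) >> 31);  // round toward zero
  return q;
}
```

Comparing sdiv7(n) with n / 7 over a range of inputs, including INT32_MIN and INT32_MAX, is a quick way to sanity-check the constants.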

		; FUNC-LABEL: {{^}}sdiv_i64_const:
		; SI-DAG: s_mov_b32 [[MAGIC_LO:s[0-9]+]], 0x24924925
		; SI-DAG: s_mov_b32 [[MAGIC_HI:s[0-9]+]], 0x49249249
		; SI-NOT: v_rcp
		define void @sdiv_i64_const(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
		%num = load i64, i64 addrspace(1)* %in
		%result = sdiv i64 %num, 7
		store i64 %result, i64 addrspace(1)* %out
		ret void
		}

; Tests for 64-bit divide bypass.		; Tests for 64-bit divide bypass.
; define void @test_get_quotient(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {		; define void @test_get_quotient(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {
; %result = sdiv i64 %a, %b		; %result = sdiv i64 %a, %b
; store i64 %result, i64 addrspace(1)* %out, align 8		; store i64 %result, i64 addrspace(1)* %out, align 8
; ret void		; ret void
; }		; }

; define void @test_get_remainder(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {		; define void @test_get_remainder(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind {

test/CodeGen/AMDGPU/udiv.ll

	; SI: v_mul_hi_u32			; SI: v_mul_hi_u32

	define void @scalarize_mulhu_4xi32(<4 x i32> addrspace(1)* nocapture readonly %in, <4 x i32> addrspace(1)* nocapture %out) {			define void @scalarize_mulhu_4xi32(<4 x i32> addrspace(1)* nocapture readonly %in, <4 x i32> addrspace(1)* nocapture %out) {
	%1 = load <4 x i32>, <4 x i32> addrspace(1)* %in, align 16			%1 = load <4 x i32>, <4 x i32> addrspace(1)* %in, align 16
	%2 = udiv <4 x i32> %1, <i32 53668, i32 53668, i32 53668, i32 53668>			%2 = udiv <4 x i32> %1, <i32 53668, i32 53668, i32 53668, i32 53668>
	store <4 x i32> %2, <4 x i32> addrspace(1)* %out, align 16			store <4 x i32> %2, <4 x i32> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}udiv_i32_const:
				; SI: v_mov_b32_e32 [[MAGIC:v[0-9]+]], 0x24924925
				; SI-NOT: v_rcp
				define void @udiv_i32_const(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
				%num = load i32, i32 addrspace(1)* %in
				%result = udiv i32 %num, 7
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}
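
The unsigned case uses a slightly different shape, because the exact multiplier for 7 does not fit in 32 bits. A minimal C++ sketch of what the udiv_i32_const check corresponds to is shown below. The names mulhu32 and udiv7 are hypothetical, and the fixup sequence (subtract, shift by 1, add, shift by 2) is the standard one that accompanies the magic constant 0x24924925; the test itself only checks the constant, so treat the shifts as an assumption.

```cpp
#include <cstdint>

// Hypothetical helper: high 32 bits of the unsigned 64-bit product (MULHU).
inline uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Sketch of "udiv i32 %num, 7" without a divide instruction.
inline uint32_t udiv7(uint32_t n) {
  // 0x24924925 is the constant matched by the SI check line above. It stands
  // for the 33-bit multiplier 2^32 + 0x24924925, so an add fixup is needed.
  uint32_t q = mulhu32(n, 0x24924925u);
  uint32_t t = ((n - q) >> 1) + q;  // adds the implicit 2^32 term without overflow
  return t >> 2;                    // assumed remaining shift for divisor 7
}
```

For example, udiv7(7) evaluates q = 1, t = 4 and returns 1, while udiv7(6) evaluates q = 0, t = 3 and returns 0.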

				; FUNC-LABEL: {{^}}udiv_i64_const:
				; SI-DAG: s_mov_b32 [[MAGIC_HI:s[0-9]+]], 0x24924924
				; SI-DAG: s_mov_b32 [[MAGIC_LO:s[0-9]+]], 0x92492493
				; SI-NOT: v_rcp
				define void @udiv_i64_const(i64 addrspace(1)* %out, i64 addrspace(1)* %in) {
				%num = load i64, i64 addrspace(1)* %in
				%result = udiv i64 %num, 7
				store i64 %result, i64 addrspace(1)* %out
				ret void
				}

test/CodeGen/SPARC/rem.ll


	define i64 @test2(i64 %X, i64 %Y) {			define i64 @test2(i64 %X, i64 %Y) {
	%tmp1 = urem i64 %X, %Y			%tmp1 = urem i64 %X, %Y
	ret i64 %tmp1			ret i64 %tmp1
	}			}

	; PR18150			; PR18150
	; CHECK-LABEL: test3			; CHECK-LABEL: test3
	; CHECK: sethi 2545, [[R0:%[gilo][0-7]]]			; CHECK: sethi 2545, %o1
	; CHECK: or [[R0]], 379, [[R1:%[gilo][0-7]]]			; CHECK-NEXT: or %o1, 379, %o1
	; CHECK: mulx %o0, [[R1]], [[R2:%[gilo][0-7]]]			; CHECK-NEXT: mulx %o0, %o1, %o0
	; CHECK: udivx [[R2]], 1021, [[R3:%[gilo][0-7]]]			; CHECK-NEXT: srl %o0, 0, %o1
	; CHECK: mulx [[R3]], 1021, [[R4:%[gilo][0-7]]]			; CHECK-NEXT: sethi 12324, %o2
	; CHECK: sub [[R2]], [[R4]], %o0			; CHECK-NEXT: or %o2, 108, %o2
				; CHECK-NEXT: mulx %o1, %o2, %o3
				; CHECK-NEXT: sethi 1331003, %o4
				; CHECK-NEXT: or %o4, 435, %o4
				; CHECK-NEXT: mulx %o1, %o4, %o1
				; CHECK-NEXT: srlx %o1, 32, %o1
				; CHECK-NEXT: add %o1, %o3, %o1
				; CHECK-NEXT: srlx %o1, 32, %o3
				; CHECK-NEXT: srlx %o0, 32, %o5
				; CHECK-NEXT: mulx %o5, %o4, %o4
				; CHECK-NEXT: srlx %o4, 32, %g2
				; CHECK-NEXT: mulx %o5, %o2, %o2
				; CHECK-NEXT: srlx %o2, 32, %o5
				; CHECK-NEXT: addcc %o1, %o4, %o1
				; CHECK-NEXT: addxcc %o3, %g2, %o1
				; CHECK-NEXT: addxcc %o5, 0, %o3
				; CHECK-NEXT: sllx %o3, 32, %o3
				; CHECK-NEXT: srl %o2, 0, %o2
				; CHECK-NEXT: or %o2, %o3, %o2
				; CHECK-NEXT: srl %o1, 0, %o1
				; CHECK-NEXT: add %o1, %o2, %o1
				; CHECK-NEXT: sub %o0, %o1, %o2
				; CHECK-NEXT: srlx %o2, 1, %o2
				; CHECK-NEXT: add %o2, %o1, %o1
				; CHECK-NEXT: srlx %o1, 9, %o1
				; CHECK-NEXT: mulx %o1, 1021, %o1
				; CHECK-NEXT: retl
				; CHECK-NEXT: sub %o0, %o1, %o0

	define i64 @test3(i64 %b) {			define i64 @test3(i64 %b) {
	entry:			entry:
				efriedma (inline comment): This is generating 8 multiply instructions; something is going wrong in your algorithm. (It should only take four multiply instructions to perform a double-width multiply.)
	%mul = mul i64 %b, 2606459			%mul = mul i64 %b, 2606459
	%rem = urem i64 %mul, 1021			%rem = urem i64 %mul, 1021
	ret i64 %rem			ret i64 %rem
	}			}
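
To make the reviewer's point about multiply counts concrete, here is a minimal C++ sketch of a 64-bit high multiply built from four 32x32->64 multiplies, followed by the urem-by-1021 sequence that the new CHECK lines encode. The function names mulhu64 and urem1021 are hypothetical. The 64-bit magic constant is decoded from the sethi/or pairs in the expected output (12324 * 1024 + 108 for the high word, 1331003 * 1024 + 435 for the low word); that decoding is my reading of the assembly, not something the review states.

```cpp
#include <cstdint>

// Hypothetical helper: high 64 bits of a 64x64 unsigned product,
// using exactly four 32x32->64 multiplies.
inline uint64_t mulhu64(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFFull;
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;

  uint64_t ll = a_lo * b_lo;
  uint64_t lh = a_lo * b_hi;
  uint64_t hl = a_hi * b_lo;
  uint64_t hh = a_hi * b_hi;

  // Sum the pieces that land in bits 32..63 to obtain the carry into bit 64.
  uint64_t mid = (ll >> 32) + (lh & mask) + (hl & mask);
  return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

// Sketch of "urem i64 %mul, 1021" as multiply, shifts, and subtract.
inline uint64_t urem1021(uint64_t n) {
  // High word 0x00C0906C, low word 0x513CEDB3 (decoded from the CHECK lines).
  const uint64_t magic = 0x00C0906C513CEDB3ull;
  uint64_t q = mulhu64(n, magic);
  uint64_t t = ((n - q) >> 1) + q;  // matches: sub, srlx 1, add
  uint64_t quot = t >> 9;           // matches: srlx 9
  return n - quot * 1021;           // matches: mulx 1021, sub
}
```

With this decomposition the quotient needs four multiplies and the remainder one more, which is the count the inline comment above expects.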

Revision Contents

Diff 79195

include/llvm/Target/TargetLowering.h

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

lib/CodeGen/SelectionDAG/TargetLowering.cpp

lib/Target/AArch64/AArch64ISelLowering.h

lib/Target/AArch64/AArch64ISelLowering.cpp

lib/Target/AMDGPU/SOPInstructions.td

lib/Target/BPF/BPFISelLowering.h

lib/Target/BPF/BPFISelLowering.cpp

lib/Target/WebAssembly/WebAssemblyISelLowering.h

lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

lib/Target/X86/X86ISelLowering.h

lib/Target/X86/X86ISelLowering.cpp

test/CodeGen/AMDGPU/sdiv.ll

test/CodeGen/AMDGPU/udiv.ll

test/CodeGen/SPARC/rem.ll
