This is an archive of the discontinued LLVM Phabricator instance.

Differential D135541

[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
ClosedPublic

Authored by craig.topper on Oct 9 2022, 11:15 AM.

Details

Reviewers

RKSimon
efriedma
nickdesaulniers

Commits

rG1fa8fd4c33cb: Recommit "[TargetLowering][RISCV][X86] Support even divisors in…
rGf6a7b4782090: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
rGd4facda414b6: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.

Summary

If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
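The steps in the summary can be sketched on plain integers. This is a minimal illustration, not the SelectionDAG code: `uremSplit` is a hypothetical helper, and it assumes a 64-bit dividend split into 32-bit halves, a nonzero divisor below 2^32, and an odd part `odd` with `2^32 % odd == 1` (the precondition for the "sum the halves" remainder).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM code): computes x % d using only a 32-bit
// remainder. Assumes d != 0 and that the odd part of d satisfies
// 2^32 % odd == 1, so that hi * 2^32 + lo is congruent to hi + lo (mod odd).
uint64_t uremSplit(uint64_t x, uint32_t d) {
  // Count the trailing zeros of the divisor.
  unsigned tz = 0;
  while (((d >> tz) & 1u) == 0)
    ++tz;
  uint32_t odd = d >> tz;
  assert(odd > 1 && (uint64_t{1} << 32) % odd == 1);

  // Bits shifted out of the dividend, taken from the unshifted value.
  uint64_t lowBits = x & ((uint64_t{1} << tz) - 1);

  // Shift the dividend, then sum its halves and fold the carry back in.
  uint64_t shifted = x >> tz;
  uint32_t lo = static_cast<uint32_t>(shifted);
  uint32_t hi = static_cast<uint32_t>(shifted >> 32);
  uint32_t sum = lo + hi;
  sum += (sum < lo); // carry out of the 32-bit add counts as +1 (mod odd)

  // Remainder by the odd divisor, shifted back with the saved bits added.
  uint32_t rem = sum % odd;
  return (uint64_t{rem} << tz) | lowBits;
}
```

For a divisor of 12 this shifts by 2 and reduces modulo 3, which matches the shape of the `test_urem_12` output in the diff below.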

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision.Oct 9 2022, 11:15 AM

Herald added a project: Restricted Project.Oct 9 2022, 11:15 AM

Herald added subscribers: sunshaoce, VincentWu, StephenFan and 30 others.

craig.topper requested review of this revision.Oct 9 2022, 11:15 AM

Herald added a project: Restricted Project.Oct 9 2022, 11:15 AM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B191177: Diff 466383.Oct 9 2022, 11:56 AM

RKSimon added inline comments.Oct 10 2022, 5:21 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7261	Would we benefit at all from creating a ISD::FSHL node here?

craig.topper added inline comments.Oct 10 2022, 9:15 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7261	We don't use FSHL/FSHR in ExpandShiftByConstant so I think we should be ok. Looks like DAGCombiner is matching it to FSHL for X86.

LGTM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7175	Might be helpful to mention the even->odd case here as well?
7261	OK, I have a vague memory of trying to get the legalizers to use funnel-shifts for those cases in the past - I can't remember why though!

This revision is now accepted and ready to land.Oct 10 2022, 10:25 AM

This revision was landed with ongoing or failed builds.Oct 10 2022, 11:02 AM

Closed by commit rGd4facda414b6: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant. (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGd4facda414b6: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant..

FYI, I'm seeing some failures from this internally in some tests that do str <-> floating point conversion. I'll try to reduce a test case now; I'm not sure yet if this patch is doing something wrong.

craig.topper added a reverting change: rG0121b1a4ac8d: Revert "[TargetLowering][RISCV][X86] Support even divisors in….Oct 10 2022, 2:53 PM

craig.topper reopened this revision.Oct 10 2022, 2:59 PM

This revision is now accepted and ready to land.Oct 10 2022, 2:59 PM

craig.topper planned changes to this revision.Oct 10 2022, 2:59 PM

This comment has been deleted.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303	Should this not be using the original divisor (whether by shifting back by TrailingZeros or making a copy you use elsewhere instead of doing an lshrInPlace)?

craig.topper added inline comments.Oct 10 2022, 3:03 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303	Yes! Thank you, that's probably the bug.

craig.topper added inline comments.Oct 10 2022, 3:35 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303	Well I think that just exposed worse problems in the algorithm. The multiplicative inverse doesn't exist if Divisor is even. It needs to be coprime with 1<<BitWidth. But they will both have 2 as a common factor.

efriedma added inline comments.Oct 10 2022, 4:18 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303	After the subtraction, you have something like "udiv exact X, Y". So you can use the same algorithm as BuildExactSDIV (i.e. shift right X and Y, then multiply X by the inverse of Y).

In D135541#3847905, @rupprecht wrote:

FYI, I'm seeing some failures from this internally in some tests that do str <-> floating point conversion. I'll try to reduce a test case now; I'm not sure yet if this patch is doing something wrong.

Small-ish repro which uses absl:

#include <iostream>
#include "absl/strings/str_format.h"

int main() {
  std::cout << absl::StrFormat("%.1g", 1e+15) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+20) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+25) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+30) << "\n";
  std::cout << absl::StrFormat("%.1g", 1e+35) << "\n";
}

Before, it prints them as-is. After, it prints just the first three and then crashes:

1e+15
3e+28
3e+35
assert.h assertion failed at absl/strings/internal/str_format/float_conversion.cc:1006 in void absl::str_format_internal::(anonymous namespace)::Buffer::push_front(char): begin > data

I made a godbolt link, but trunk there hasn't caught up to this commit yet: https://godbolt.org/z/zbM1McdzM

craig.topper added inline comments.Oct 10 2022, 8:07 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303	I already shifted the dividend and the divisor to get the remainder. I think I should subtract the uncorrected remainder from the shifted dividend and do the multiplicative inverse to calculate the quotient on that. Then correct the remainder with the part shifted off earlier.

efriedma added inline comments.Oct 10 2022, 9:13 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7302–7303	Yes, that's equivalent.
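The quotient approach agreed on in this thread can be sketched on 64-bit integers. This is an illustration under stated assumptions, not the patch itself: `inverseMod2_64` and `udivViaInverse` are hypothetical names, the inverse is computed with a Newton iteration rather than `APInt::multiplicativeInverse`, and the uncorrected remainder is taken with `%` for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Inverse of an odd value modulo 2^64. An even value has no inverse because
// it shares the factor 2 with the modulus.
uint64_t inverseMod2_64(uint64_t odd) {
  assert(odd & 1);
  uint64_t inv = odd;         // odd * odd == 1 (mod 8): 3 correct bits
  for (int i = 0; i < 5; ++i) // each step doubles the number of correct bits
    inv *= 2 - odd * inv;
  return inv;
}

// x / d for d != 0: shift out the trailing zeros of d, subtract the
// uncorrected remainder from the shifted dividend, then multiply the exact
// multiple by the inverse of the odd divisor.
uint64_t udivViaInverse(uint64_t x, uint64_t d) {
  unsigned tz = 0;
  while (((d >> tz) & 1) == 0)
    ++tz;
  uint64_t odd = d >> tz;
  uint64_t shifted = x >> tz;
  uint64_t rem = shifted % odd; // remainder before any left-shift correction
  return (shifted - rem) * inverseMod2_64(odd);
}
```

The multiplication wraps modulo 2^64, which is what makes the inverse act as an exact division here.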

Change how quotient is calculated. Still need to clean up some comments and do some testing.

This revision is now accepted and ready to land.Oct 10 2022, 9:32 PM

I patched this in, and my failing repros from before are now passing, so LGTM from me. Thanks!

Harbormaster completed remote builds in B191421: Diff 466695.Oct 10 2022, 10:14 PM

craig.topper planned changes to this revision.Oct 11 2022, 4:04 PM

This revision was not accepted when it landed; it landed in state Changes Planned.Oct 22 2022, 11:47 PM

Closed by commit rGf6a7b4782090: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant. (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGf6a7b4782090: [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant..

This caused miscompilations for me on 32 bit architectures (i386 and arm - while it didn't seem to trigger any such issues on 64 bit). I don't know offhand what/where the issue is yet though - does that aspect ring a bell for anyone, or do I need to dig further to see what's going on? (I guess it'd be best to revert this for now?)

craig.topper added a reverting change: rG65aaecca8842: Revert "[TargetLowering][RISCV][X86] Support even divisors in….Oct 24 2022, 7:13 AM

Thanks for the revert!

I didn’t have time to narrow it down further yet, but it can be reproduced with these steps:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc="clang -target …" --samples=$(pwd)/../samples
make -j$(nproc)
make fate-rsync
make fate-sub-lrc-remux

I don’t know yet which object files contain the breakage here.

craig.topper added a commit: rG1fa8fd4c33cb: Recommit "[TargetLowering][RISCV][X86] Support even divisors in….Oct 24 2022, 10:09 AM

In D135541#3879320, @mstorsjo wrote:
Thanks for the revert!

I didn’t have time to narrow it down further yet, but it can be reproduced with these steps:
git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc="clang -target …" --samples=$(pwd)/../samples
make -j$(nproc)
make fate-rsync
make fate-sub-lrc-remux
I don’t know yet which object files contain the breakage here.

For this particular testcase, the changed behaviour is in libavformat/lrcenc.o.

The changed behaviour can be seen in https://martin.st/temp/lrcenc-preproc.c, compiled with clang -target i686-w64-mingw32 -c lrcenc-preproc.c -O2. (I haven't traced the code to see exactly which inputs produce which difference in behaviour, though.)

In D135541#3880648, @mstorsjo wrote:
In D135541#3879320, @mstorsjo wrote:
Thanks for the revert!

I didn’t have time to narrow it down further yet, but it can be reproduced with these steps:
git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc="clang -target …" --samples=$(pwd)/../samples
make -j$(nproc)
make fate-rsync
make fate-sub-lrc-remux
I don’t know yet which object files contain the breakage here.
For this particular testcase, the changed behaviour is in libavformat/lrcenc.o.

The changed behaviour can be seen in https://martin.st/temp/lrcenc-preproc.c, compiled with clang -target i686-w64-mingw32 -c lrcenc-preproc.c -O2. (I haven't traced the code to see exactly which inputs produce which difference in behaviour, though.)

Thanks. I found one bug in how the remainder was calculated. I inadvertently copied the LSBs from after a shift instead of from before it. I recommitted that earlier today. Hopefully it hasn't caused any additional failures?

In D135541#3881050, @craig.topper wrote:

Thanks. I found one bug in how the remainder was calculated. I inadvertently copied the LSBs from after a shift instead of from before it. I recommitted that earlier today. Hopefully it hasn't caused any additional failures?

No, this time around it seems to have run successfully everywhere in my tests. Thanks!
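The remainder bug described above (taking the low bits after the shift instead of before it) can be reproduced on plain integers. The two functions below are hypothetical illustrations of that kind of mistake, not the code from either commit, and both assume a nonzero divisor.

```cpp
#include <cassert>
#include <cstdint>

// Number of trailing zero bits in d; assumes d != 0.
static unsigned trailingZeros(uint64_t d) {
  unsigned tz = 0;
  while (((d >> tz) & 1) == 0)
    ++tz;
  return tz;
}

// Intended behaviour: the saved bits are the low bits of the original
// dividend, i.e. exactly the bits that the shift discards.
uint64_t uremMaskBeforeShift(uint64_t x, uint64_t d) {
  unsigned tz = trailingZeros(d);
  uint64_t low = x & ((uint64_t{1} << tz) - 1);
  uint64_t rem = (x >> tz) % (d >> tz);
  return (rem << tz) + low;
}

// Mistaken variant: the mask is applied to the already shifted dividend,
// so the saved bits are the wrong ones.
uint64_t uremMaskAfterShift(uint64_t x, uint64_t d) {
  unsigned tz = trailingZeros(d);
  x >>= tz;
  uint64_t low = x & ((uint64_t{1} << tz) - 1);
  uint64_t rem = x % (d >> tz);
  return (rem << tz) + low;
}
```

With a dividend of 13 and a divisor of 12, the first variant returns 1 and the second returns 3, so even small inputs expose the difference.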

Revision Contents

Path

Size

llvm/

lib/

CodeGen/

SelectionDAG/

TargetLowering.cpp

120 lines

test/

CodeGen/

RISCV/

split-udiv-by-constant.ll

63 lines

split-urem-by-constant.ll

50 lines

X86/

divide-by-constant.ll

58 lines

divmod128.ll

102 lines

Diff 469965

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

//		//
// If (1 << (BitWidth / 2)) % Constant == 1, then the remainder		// If (1 << (BitWidth / 2)) % Constant == 1, then the remainder
// can be computed		// can be computed
// as:		// as:
// Sum += __builtin_uadd_overflow(Lo, High, &Sum);		// Sum += __builtin_uadd_overflow(Lo, High, &Sum);
// Remainder = Sum % Constant		// Remainder = Sum % Constant
// This is based on "Remainder by Summing Digits" from Hacker's Delight.		// This is based on "Remainder by Summing Digits" from Hacker's Delight.
//		//
// For division, we can compute the remainder, subtract it from the dividend,		// For division, we can compute the remainder using the algorithm described
// and then multiply by the multiplicative inverse modulo (1 << (BitWidth / 2)).		// above, subtract it from the dividend to get an exact multiple of Constant.
		// Then multiply that exact multiple by the multiplicative inverse modulo
		// (1 << (BitWidth / 2)) to get the quotient.

		// If Constant is even, we can shift right the dividend and the divisor by the
		// number of trailing zeros in Constant before applying the remainder algorithm.
		// If we're after the quotient, we can subtract this value from the shifted
		// dividend and multiply by the multiplicative inverse of the shifted divisor.
		// If we want the remainder, we shift the value left by the number of trailing
		// zeros and add the bits that were shifted out of the dividend.
bool TargetLowering::expandDIVREMByConstant(SDNode *N,		bool TargetLowering::expandDIVREMByConstant(SDNode *N,
SmallVectorImpl<SDValue> &Result,		SmallVectorImpl<SDValue> &Result,
EVT HiLoVT, SelectionDAG &DAG,		EVT HiLoVT, SelectionDAG &DAG,
SDValue LL, SDValue LH) const {		SDValue LL, SDValue LH) const {
unsigned Opcode = N->getOpcode();		unsigned Opcode = N->getOpcode();
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

// TODO: Support signed division/remainder.		// TODO: Support signed division/remainder.
if (Opcode == ISD::SREM || Opcode == ISD::SDIV || Opcode == ISD::SDIVREM)		if (Opcode == ISD::SREM || Opcode == ISD::SDIV || Opcode == ISD::SDIVREM)
return false;		return false;
assert(		assert(
(Opcode == ISD::UREM || Opcode == ISD::UDIV || Opcode == ISD::UDIVREM) &&		(Opcode == ISD::UREM || Opcode == ISD::UDIV || Opcode == ISD::UDIVREM) &&
"Unexpected opcode");		"Unexpected opcode");

auto *CN = dyn_cast<ConstantSDNode>(N->getOperand(1));		auto *CN = dyn_cast<ConstantSDNode>(N->getOperand(1));
if (!CN)		if (!CN)
return false;		return false;

const APInt &Divisor = CN->getAPIntValue();		APInt Divisor = CN->getAPIntValue();
unsigned BitWidth = Divisor.getBitWidth();		unsigned BitWidth = Divisor.getBitWidth();
unsigned HBitWidth = BitWidth / 2;		unsigned HBitWidth = BitWidth / 2;
assert(VT.getScalarSizeInBits() == BitWidth &&		assert(VT.getScalarSizeInBits() == BitWidth &&
HiLoVT.getScalarSizeInBits() == HBitWidth && "Unexpected VTs");		HiLoVT.getScalarSizeInBits() == HBitWidth && "Unexpected VTs");

// Divisor needs to be less than (1 << HBitWidth).		// Divisor needs to be less than (1 << HBitWidth).
APInt HalfMaxPlus1 = APInt::getOneBitSet(BitWidth, HBitWidth);		APInt HalfMaxPlus1 = APInt::getOneBitSet(BitWidth, HBitWidth);
if (Divisor.uge(HalfMaxPlus1))		if (Divisor.uge(HalfMaxPlus1))
return false;		return false;

// We depend on the UREM by constant optimization in DAGCombiner that requires		// We depend on the UREM by constant optimization in DAGCombiner that requires
// high multiply.		// high multiply.
if (!isOperationLegalOrCustom(ISD::MULHU, HiLoVT) &&		if (!isOperationLegalOrCustom(ISD::MULHU, HiLoVT) &&
!isOperationLegalOrCustom(ISD::UMUL_LOHI, HiLoVT))		!isOperationLegalOrCustom(ISD::UMUL_LOHI, HiLoVT))
return false;		return false;

// Don't expand if optimizing for size.		// Don't expand if optimizing for size.
if (DAG.shouldOptForSize())		if (DAG.shouldOptForSize())
return false;		return false;

// Early out for 0, 1 or even divisors.		// Early out for 0 or 1 divisors.
if (Divisor.ule(1) || Divisor[0] == 0)		if (Divisor.ule(1))
return false;		return false;

		// If the divisor is even, shift it until it becomes odd.
		unsigned TrailingZeros = 0;
		if (!Divisor[0]) {
		TrailingZeros = Divisor.countTrailingZeros();
		Divisor.lshrInPlace(TrailingZeros);
		}

SDLoc dl(N);		SDLoc dl(N);
SDValue Sum;		SDValue Sum;
		SDValue PartialRem;

// If (1 << HBitWidth) % divisor == 1, we can add the two halves together and		// If (1 << HBitWidth) % divisor == 1, we can add the two halves together and
// then add in the carry.		// then add in the carry.
// TODO: If we can't split it in half, we might be able to split into 3 or		// TODO: If we can't split it in half, we might be able to split into 3 or
// more pieces using a smaller bit width.		// more pieces using a smaller bit width.
if (HalfMaxPlus1.urem(Divisor).isOneValue()) {		if (HalfMaxPlus1.urem(Divisor).isOneValue()) {
assert(!LL == !LH && "Expected both input halves or no input halves!");		assert(!LL == !LH && "Expected both input halves or no input halves!");
if (!LL) {		if (!LL) {
LL = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, N->getOperand(0),		LL = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, N->getOperand(0),
DAG.getIntPtrConstant(0, dl));		DAG.getIntPtrConstant(0, dl));
LH = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, N->getOperand(0),		LH = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, N->getOperand(0),
DAG.getIntPtrConstant(1, dl));		DAG.getIntPtrConstant(1, dl));
}		}

		// Shift the input by the number of TrailingZeros in the divisor. The
		// shifted out bits will be added to the remainder later.
		if (TrailingZeros) {
		LL = DAG.getNode(
		ISD::OR, dl, HiLoVT,
		DAG.getNode(ISD::SRL, dl, HiLoVT, LL,
		DAG.getShiftAmountConstant(TrailingZeros, HiLoVT, dl)),
		DAG.getNode(ISD::SHL, dl, HiLoVT, LH,
		DAG.getShiftAmountConstant(HBitWidth - TrailingZeros,
		HiLoVT, dl)));
		LH = DAG.getNode(ISD::SRL, dl, HiLoVT, LH,
		DAG.getShiftAmountConstant(TrailingZeros, HiLoVT, dl));

		// Save the shifted off bits if we need the remainder.
		if (Opcode != ISD::UDIV) {
		APInt Mask = APInt::getLowBitsSet(HBitWidth, TrailingZeros);
		PartialRem = DAG.getNode(ISD::AND, dl, HiLoVT, LL,
		DAG.getConstant(Mask, dl, HiLoVT));
		}
		}

// Use addcarry if we can, otherwise use a compare to detect overflow.		// Use addcarry if we can, otherwise use a compare to detect overflow.
EVT SetCCType =		EVT SetCCType =
getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), HiLoVT);		getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), HiLoVT);
if (isOperationLegalOrCustom(ISD::ADDCARRY, HiLoVT)) {		if (isOperationLegalOrCustom(ISD::ADDCARRY, HiLoVT)) {
SDVTList VTList = DAG.getVTList(HiLoVT, SetCCType);		SDVTList VTList = DAG.getVTList(HiLoVT, SetCCType);
Sum = DAG.getNode(ISD::UADDO, dl, VTList, LL, LH);		Sum = DAG.getNode(ISD::UADDO, dl, VTList, LL, LH);
Sum = DAG.getNode(ISD::ADDCARRY, dl, VTList, Sum,		Sum = DAG.getNode(ISD::ADDCARRY, dl, VTList, Sum,
DAG.getConstant(0, dl, HiLoVT), Sum.getValue(1));		DAG.getConstant(0, dl, HiLoVT), Sum.getValue(1));
// ... (15 lines not shown) ...		// ... (15 lines not shown) ...
// If we didn't find a sum, we can't do the expansion.		// If we didn't find a sum, we can't do the expansion.
if (!Sum)		if (!Sum)
return false;		return false;

// Perform a HiLoVT urem on the Sum using truncated divisor.		// Perform a HiLoVT urem on the Sum using truncated divisor.
SDValue RemL =		SDValue RemL =
DAG.getNode(ISD::UREM, dl, HiLoVT, Sum,		DAG.getNode(ISD::UREM, dl, HiLoVT, Sum,
DAG.getConstant(Divisor.trunc(HBitWidth), dl, HiLoVT));		DAG.getConstant(Divisor.trunc(HBitWidth), dl, HiLoVT));
// High half of the remainder is 0.
SDValue RemH = DAG.getConstant(0, dl, HiLoVT);		SDValue RemH = DAG.getConstant(0, dl, HiLoVT);

// If we only want remainder, we're done.		if (Opcode != ISD::UREM) {
if (Opcode == ISD::UREM) {		// Subtract the remainder from the shifted dividend.
Result.push_back(RemL);		SDValue Dividend = DAG.getNode(ISD::BUILD_PAIR, dl, VT, LL, LH);
Result.push_back(RemH);
return true;
}

// Otherwise, we need to compute the quotient.

// Join the remainder halves.
SDValue Rem = DAG.getNode(ISD::BUILD_PAIR, dl, VT, RemL, RemH);		SDValue Rem = DAG.getNode(ISD::BUILD_PAIR, dl, VT, RemL, RemH);

// Subtract the remainder from the input.		Dividend = DAG.getNode(ISD::SUB, dl, VT, Dividend, Rem);
SDValue In = DAG.getNode(ISD::SUB, dl, VT, N->getOperand(0), Rem);

// Multiply by the multiplicative inverse of the divisor modulo		// Multiply by the multiplicative inverse of the divisor modulo
// (1 << BitWidth).		// (1 << BitWidth).
APInt Mod = APInt::getSignedMinValue(BitWidth + 1);		APInt Mod = APInt::getSignedMinValue(BitWidth + 1);
APInt MulFactor = Divisor.zext(BitWidth + 1);		APInt MulFactor = Divisor.zext(BitWidth + 1);
MulFactor = MulFactor.multiplicativeInverse(Mod);		MulFactor = MulFactor.multiplicativeInverse(Mod);
MulFactor = MulFactor.trunc(BitWidth);		MulFactor = MulFactor.trunc(BitWidth);

SDValue Quotient =		SDValue Quotient = DAG.getNode(ISD::MUL, dl, VT, Dividend,
DAG.getNode(ISD::MUL, dl, VT, In, DAG.getConstant(MulFactor, dl, VT));		DAG.getConstant(MulFactor, dl, VT));

// Split the quotient into low and high parts.		// Split the quotient into low and high parts.
SDValue QuotL = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, Quotient,		SDValue QuotL = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, Quotient,
DAG.getIntPtrConstant(0, dl));		DAG.getIntPtrConstant(0, dl));
SDValue QuotH = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, Quotient,		SDValue QuotH = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, HiLoVT, Quotient,
DAG.getIntPtrConstant(1, dl));		DAG.getIntPtrConstant(1, dl));
Result.push_back(QuotL);		Result.push_back(QuotL);
Result.push_back(QuotH);		Result.push_back(QuotH);
// For DIVREM, also return the remainder parts.		}
if (Opcode == ISD::UDIVREM) {
		if (Opcode != ISD::UDIV) {
		// If we shifted the input, shift the remainder left and add the bits we
		// shifted off the input.
		if (TrailingZeros) {
		APInt Mask = APInt::getLowBitsSet(HBitWidth, TrailingZeros);
		RemL = DAG.getNode(ISD::SHL, dl, HiLoVT, RemL,
		DAG.getShiftAmountConstant(TrailingZeros, HiLoVT, dl));
		RemL = DAG.getNode(ISD::ADD, dl, HiLoVT, RemL, PartialRem);
		}
Result.push_back(RemL);		Result.push_back(RemL);
Result.push_back(RemH);		Result.push_back(DAG.getConstant(0, dl, HiLoVT));
}		}

return true;		return true;
}		}

// Check that (every element of) Z is undef or not an exact multiple of BW.		// Check that (every element of) Z is undef or not an exact multiple of BW.
static bool isNonZeroModBitWidthOrUndef(SDValue Z, unsigned BW) {		static bool isNonZeroModBitWidthOrUndef(SDValue Z, unsigned BW) {
return ISD::matchUnaryPredicate(		return ISD::matchUnaryPredicate(

llvm/test/CodeGen/RISCV/split-udiv-by-constant.ll

	; RV64-NEXT: ret			; RV64-NEXT: ret
	%a = udiv iXLen2 %x, 65537			%a = udiv iXLen2 %x, 65537
	ret iXLen2 %a			ret iXLen2 %a
	}			}

	define iXLen2 @test_udiv_12(iXLen2 %x) nounwind {			define iXLen2 @test_udiv_12(iXLen2 %x) nounwind {
	; RV32-LABEL: test_udiv_12:			; RV32-LABEL: test_udiv_12:
	; RV32: # %bb.0:			; RV32: # %bb.0:
	; RV32-NEXT: addi sp, sp, -16			; RV32-NEXT: slli a2, a1, 30
	; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill			; RV32-NEXT: srli a0, a0, 2
	; RV32-NEXT: li a2, 12			; RV32-NEXT: or a0, a0, a2
	; RV32-NEXT: li a3, 0			; RV32-NEXT: srli a1, a1, 2
	; RV32-NEXT: call __udivdi3@plt			; RV32-NEXT: add a2, a0, a1
	; RV32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload			; RV32-NEXT: sltu a3, a2, a0
	; RV32-NEXT: addi sp, sp, 16			; RV32-NEXT: add a2, a2, a3
				; RV32-NEXT: lui a3, 699051
				; RV32-NEXT: addi a4, a3, -1365
				; RV32-NEXT: mulhu a5, a2, a4
				; RV32-NEXT: srli a6, a5, 1
				; RV32-NEXT: andi a5, a5, -2
				; RV32-NEXT: add a5, a5, a6
				; RV32-NEXT: sub a2, a2, a5
				; RV32-NEXT: sub a5, a0, a2
				; RV32-NEXT: addi a3, a3, -1366
				; RV32-NEXT: mul a3, a5, a3
				; RV32-NEXT: mulhu a6, a5, a4
				; RV32-NEXT: add a3, a6, a3
				; RV32-NEXT: sltu a0, a0, a2
				; RV32-NEXT: sub a0, a1, a0
				; RV32-NEXT: mul a0, a0, a4
				; RV32-NEXT: add a1, a3, a0
				; RV32-NEXT: mul a0, a5, a4
	; RV32-NEXT: ret			; RV32-NEXT: ret
	;			;
	; RV64-LABEL: test_udiv_12:			; RV64-LABEL: test_udiv_12:
	; RV64: # %bb.0:			; RV64: # %bb.0:
	; RV64-NEXT: addi sp, sp, -16			; RV64-NEXT: slli a2, a1, 62
	; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill			; RV64-NEXT: srli a0, a0, 2
	; RV64-NEXT: li a2, 12			; RV64-NEXT: or a0, a0, a2
	; RV64-NEXT: li a3, 0			; RV64-NEXT: srli a1, a1, 2
	; RV64-NEXT: call __udivti3@plt			; RV64-NEXT: lui a2, %hi(.LCPI10_0)
	; RV64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload			; RV64-NEXT: ld a2, %lo(.LCPI10_0)(a2)
	; RV64-NEXT: addi sp, sp, 16			; RV64-NEXT: add a3, a0, a1
				; RV64-NEXT: sltu a4, a3, a0
				; RV64-NEXT: add a3, a3, a4
				; RV64-NEXT: mulhu a4, a3, a2
				; RV64-NEXT: srli a5, a4, 1
				; RV64-NEXT: andi a4, a4, -2
				; RV64-NEXT: lui a6, %hi(.LCPI10_1)
				; RV64-NEXT: ld a6, %lo(.LCPI10_1)(a6)
				; RV64-NEXT: add a4, a4, a5
				; RV64-NEXT: sub a3, a3, a4
				; RV64-NEXT: sub a4, a0, a3
				; RV64-NEXT: mul a5, a4, a6
				; RV64-NEXT: mulhu a6, a4, a2
				; RV64-NEXT: add a5, a6, a5
				; RV64-NEXT: sltu a0, a0, a3
				; RV64-NEXT: sub a0, a1, a0
				; RV64-NEXT: mul a0, a0, a2
				; RV64-NEXT: add a1, a5, a0
				; RV64-NEXT: mul a0, a4, a2
	; RV64-NEXT: ret			; RV64-NEXT: ret
	%a = udiv iXLen2 %x, 12			%a = udiv iXLen2 %x, 12
	ret iXLen2 %a			ret iXLen2 %a
	}			}

llvm/test/CodeGen/RISCV/split-urem-by-constant.ll

	; RV64-NEXT: ret			; RV64-NEXT: ret
	%a = urem iXLen2 %x, 65537			%a = urem iXLen2 %x, 65537
	ret iXLen2 %a			ret iXLen2 %a
	}			}

	define iXLen2 @test_urem_12(iXLen2 %x) nounwind {			define iXLen2 @test_urem_12(iXLen2 %x) nounwind {
	; RV32-LABEL: test_urem_12:			; RV32-LABEL: test_urem_12:
	; RV32: # %bb.0:			; RV32: # %bb.0:
	; RV32-NEXT: addi sp, sp, -16			; RV32-NEXT: slli a2, a1, 30
	; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill			; RV32-NEXT: srli a0, a0, 2
	; RV32-NEXT: li a2, 12			; RV32-NEXT: or a0, a0, a2
	; RV32-NEXT: li a3, 0			; RV32-NEXT: srli a1, a1, 2
	; RV32-NEXT: call __umoddi3@plt			; RV32-NEXT: add a1, a0, a1
	; RV32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload			; RV32-NEXT: sltu a2, a1, a0
	; RV32-NEXT: addi sp, sp, 16			; RV32-NEXT: add a1, a1, a2
				; RV32-NEXT: lui a2, 699051
				; RV32-NEXT: addi a2, a2, -1365
				; RV32-NEXT: mulhu a2, a1, a2
				; RV32-NEXT: srli a3, a2, 1
				; RV32-NEXT: andi a2, a2, -2
				; RV32-NEXT: add a2, a2, a3
				; RV32-NEXT: sub a1, a1, a2
				; RV32-NEXT: slli a1, a1, 2
				; RV32-NEXT: andi a0, a0, 3
				; RV32-NEXT: or a0, a1, a0
				; RV32-NEXT: li a1, 0
	; RV32-NEXT: ret			; RV32-NEXT: ret
	;			;
	; RV64-LABEL: test_urem_12:			; RV64-LABEL: test_urem_12:
	; RV64: # %bb.0:			; RV64: # %bb.0:
	; RV64-NEXT: addi sp, sp, -16			; RV64-NEXT: slli a2, a1, 62
	; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill			; RV64-NEXT: srli a0, a0, 2
	; RV64-NEXT: li a2, 12			; RV64-NEXT: or a0, a0, a2
	; RV64-NEXT: li a3, 0			; RV64-NEXT: srli a1, a1, 2
	; RV64-NEXT: call __umodti3@plt			; RV64-NEXT: lui a2, %hi(.LCPI10_0)
	; RV64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload			; RV64-NEXT: ld a2, %lo(.LCPI10_0)(a2)
	; RV64-NEXT: addi sp, sp, 16			; RV64-NEXT: add a1, a0, a1
				; RV64-NEXT: sltu a3, a1, a0
				; RV64-NEXT: add a1, a1, a3
				; RV64-NEXT: mulhu a2, a1, a2
				; RV64-NEXT: srli a3, a2, 1
				; RV64-NEXT: andi a2, a2, -2
				; RV64-NEXT: add a2, a2, a3
				; RV64-NEXT: sub a1, a1, a2
				; RV64-NEXT: slli a1, a1, 2
				; RV64-NEXT: andi a0, a0, 3
				; RV64-NEXT: or a0, a1, a0
				; RV64-NEXT: li a1, 0
	; RV64-NEXT: ret			; RV64-NEXT: ret
	%a = urem iXLen2 %x, 12			%a = urem iXLen2 %x, 12
	ret iXLen2 %a			ret iXLen2 %a
	}			}

llvm/test/CodeGen/X86/divide-by-constant.ll

	entry:			entry:
	%rem = urem i64 %x, 65537			%rem = urem i64 %x, 65537
	ret i64 %rem			ret i64 %rem
	}			}

	define i64 @urem_i64_12(i64 %x) nounwind {			define i64 @urem_i64_12(i64 %x) nounwind {
	; X32-LABEL: urem_i64_12:			; X32-LABEL: urem_i64_12:
	; X32: # %bb.0: # %entry			; X32: # %bb.0: # %entry
	; X32-NEXT: subl $12, %esp			; X32-NEXT: pushl %esi
	; X32-NEXT: pushl $0			; X32-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X32-NEXT: pushl $12			; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: pushl {{[0-9]+}}(%esp)			; X32-NEXT: shrdl $2, %ecx, %esi
	; X32-NEXT: pushl {{[0-9]+}}(%esp)			; X32-NEXT: shrl $2, %ecx
	; X32-NEXT: calll __umoddi3			; X32-NEXT: addl %esi, %ecx
	; X32-NEXT: addl $28, %esp			; X32-NEXT: adcl $0, %ecx
				; X32-NEXT: movl $-1431655765, %edx # imm = 0xAAAAAAAB
				; X32-NEXT: movl %ecx, %eax
				; X32-NEXT: mull %edx
				; X32-NEXT: shrl %edx
				; X32-NEXT: leal (%edx,%edx,2), %eax
				; X32-NEXT: subl %eax, %ecx
				; X32-NEXT: andl $3, %esi
				; X32-NEXT: leal (%esi,%ecx,4), %eax
				; X32-NEXT: xorl %edx, %edx
				; X32-NEXT: popl %esi
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: urem_i64_12:			; X64-LABEL: urem_i64_12:
	; X64: # %bb.0: # %entry			; X64: # %bb.0: # %entry
	; X64-NEXT: movabsq $-6148914691236517205, %rcx # imm = 0xAAAAAAAAAAAAAAAB			; X64-NEXT: movabsq $-6148914691236517205, %rcx # imm = 0xAAAAAAAAAAAAAAAB
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: shrq %rdx			; X64-NEXT: shrq %rdx
	; ... (358 lines not shown) ...			; ... (358 lines not shown) ...
	entry:			entry:
	%rem = udiv i64 %x, 65537			%rem = udiv i64 %x, 65537
	ret i64 %rem			ret i64 %rem
	}			}

	define i64 @udiv_i64_12(i64 %x) nounwind {			define i64 @udiv_i64_12(i64 %x) nounwind {
	; X32-LABEL: udiv_i64_12:			; X32-LABEL: udiv_i64_12:
	; X32: # %bb.0: # %entry			; X32: # %bb.0: # %entry
	; X32-NEXT: subl $12, %esp			; X32-NEXT: pushl %ebx
	; X32-NEXT: pushl $0			; X32-NEXT: pushl %edi
	; X32-NEXT: pushl $12			; X32-NEXT: pushl %esi
	; X32-NEXT: pushl {{[0-9]+}}(%esp)			; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: pushl {{[0-9]+}}(%esp)			; X32-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X32-NEXT: calll __udivdi3			; X32-NEXT: shrdl $2, %edi, %ecx
	; X32-NEXT: addl $28, %esp			; X32-NEXT: shrl $2, %edi
				; X32-NEXT: movl %ecx, %esi
				; X32-NEXT: addl %edi, %esi
				; X32-NEXT: adcl $0, %esi
				; X32-NEXT: movl $-1431655765, %ebx # imm = 0xAAAAAAAB
				; X32-NEXT: movl %esi, %eax
				; X32-NEXT: mull %ebx
				; X32-NEXT: shrl %edx
				; X32-NEXT: leal (%edx,%edx,2), %eax
				; X32-NEXT: subl %eax, %esi
				; X32-NEXT: subl %esi, %ecx
				; X32-NEXT: sbbl $0, %edi
				; X32-NEXT: movl %ecx, %eax
				; X32-NEXT: mull %ebx
				; X32-NEXT: imull $-1431655766, %ecx, %ecx # imm = 0xAAAAAAAA
				; X32-NEXT: addl %ecx, %edx
				; X32-NEXT: imull $-1431655765, %edi, %ecx # imm = 0xAAAAAAAB
				; X32-NEXT: addl %ecx, %edx
				; X32-NEXT: popl %esi
				; X32-NEXT: popl %edi
				; X32-NEXT: popl %ebx
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: udiv_i64_12:			; X64-LABEL: udiv_i64_12:
	; X64: # %bb.0: # %entry			; X64: # %bb.0: # %entry
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movabsq $-6148914691236517205, %rcx # imm = 0xAAAAAAAAAAAAAAAB			; X64-NEXT: movabsq $-6148914691236517205, %rcx # imm = 0xAAAAAAAAAAAAAAAB
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, %rax			; X64-NEXT: movq %rdx, %rax
	(35 lines not shown)

llvm/test/CodeGen/X86/divmod128.ll

	(419 lines not shown)
	entry:			entry:
	%rem = urem i128 %x, 65537			%rem = urem i128 %x, 65537
	ret i128 %rem			ret i128 %rem
	}			}

	define i128 @urem_i128_12(i128 %x) nounwind {			define i128 @urem_i128_12(i128 %x) nounwind {
	; X86-64-LABEL: urem_i128_12:			; X86-64-LABEL: urem_i128_12:
	; X86-64: # %bb.0: # %entry			; X86-64: # %bb.0: # %entry
	; X86-64-NEXT: pushq %rax			; X86-64-NEXT: shrdq $2, %rsi, %rdi
	; X86-64-NEXT: movl $12, %edx			; X86-64-NEXT: shrq $2, %rsi
	; X86-64-NEXT: xorl %ecx, %ecx			; X86-64-NEXT: addq %rdi, %rsi
	; X86-64-NEXT: callq __umodti3@PLT			; X86-64-NEXT: adcq $0, %rsi
	; X86-64-NEXT: popq %rcx			; X86-64-NEXT: movabsq $-6148914691236517205, %rcx # imm = 0xAAAAAAAAAAAAAAAB
				; X86-64-NEXT: movq %rsi, %rax
				; X86-64-NEXT: mulq %rcx
				; X86-64-NEXT: shrq %rdx
				; X86-64-NEXT: leaq (%rdx,%rdx,2), %rax
				; X86-64-NEXT: subq %rax, %rsi
				; X86-64-NEXT: andl $3, %edi
				; X86-64-NEXT: leaq (%rdi,%rsi,4), %rax
				; X86-64-NEXT: xorl %edx, %edx
	; X86-64-NEXT: retq			; X86-64-NEXT: retq
	;			;
	; WIN64-LABEL: urem_i128_12:			; WIN64-LABEL: urem_i128_12:
	; WIN64: # %bb.0: # %entry			; WIN64: # %bb.0: # %entry
	; WIN64-NEXT: subq $72, %rsp			; WIN64-NEXT: movq %rdx, %r8
	; WIN64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)			; WIN64-NEXT: shrdq $2, %rdx, %rcx
	; WIN64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)			; WIN64-NEXT: shrq $2, %r8
	; WIN64-NEXT: movq $12, {{[0-9]+}}(%rsp)			; WIN64-NEXT: addq %rcx, %r8
	; WIN64-NEXT: movq $0, {{[0-9]+}}(%rsp)			; WIN64-NEXT: adcq $0, %r8
	; WIN64-NEXT: leaq {{[0-9]+}}(%rsp), %rcx			; WIN64-NEXT: movabsq $-6148914691236517205, %rdx # imm = 0xAAAAAAAAAAAAAAAB
	; WIN64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx			; WIN64-NEXT: movq %r8, %rax
	; WIN64-NEXT: callq __umodti3			; WIN64-NEXT: mulq %rdx
	; WIN64-NEXT: movq %xmm0, %rax			; WIN64-NEXT: shrq %rdx
	; WIN64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]			; WIN64-NEXT: leaq (%rdx,%rdx,2), %rax
	; WIN64-NEXT: movq %xmm0, %rdx			; WIN64-NEXT: subq %rax, %r8
	; WIN64-NEXT: addq $72, %rsp			; WIN64-NEXT: andl $3, %ecx
				; WIN64-NEXT: leaq (%rcx,%r8,4), %rax
				; WIN64-NEXT: xorl %edx, %edx
	; WIN64-NEXT: retq			; WIN64-NEXT: retq
	entry:			entry:
	%rem = urem i128 %x, 12			%rem = urem i128 %x, 12
	ret i128 %rem			ret i128 %rem
	}			}
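
Reading aid for the urem-by-12 expansions above: a minimal C++ sketch of what the new sequence appears to compute, modelled at i64 with 32-bit halves (the X32 `urem_i64_12` shape). The i128 sequences look the same with 64-bit halves. The function name `urem12_model` is hypothetical, and the plain `%` on the 32-bit sum stands in for the `mull`/`shrl`/`leal`/`subl` multiply-by-reciprocal sequence. This is an illustration under those assumptions, not the code in TargetLowering.cpp.

```cpp
#include <cstdint>

// Hypothetical model of the expanded urem-by-12 sequence (not LLVM code).
uint64_t urem12_model(uint64_t x) {
    const unsigned tz = 2;   // trailing zeros of the divisor 12
    const uint32_t odd = 3;  // 12 >> tz, the odd part of the divisor

    uint64_t low_bits = x & ((1u << tz) - 1);  // andl $3: bits shifted out
    uint64_t shifted = x >> tz;                // shrdl $2 / shrl $2

    uint32_t lo = static_cast<uint32_t>(shifted);
    uint32_t hi = static_cast<uint32_t>(shifted >> 32);

    // 2^32 is congruent to 1 mod 3, so hi*2^32 + lo is congruent to hi + lo.
    uint32_t sum = lo + hi;     // addl
    sum += (sum < lo) ? 1 : 0;  // adcl $0: fold the carry back in

    uint32_t rem = sum % odd;   // stands in for the multiply-based urem by 3

    // Shift the remainder back up and add the bits that were shifted out:
    // leal (%esi,%ecx,4).
    return (static_cast<uint64_t>(rem) << tz) | low_bits;
}
```

The result can be compared against the native operator, for example `urem12_model(x) == x % 12` for any `uint64_t x`.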

	define i128 @udiv_i128_3(i128 %x) nounwind {			define i128 @udiv_i128_3(i128 %x) nounwind {
	; X86-64-LABEL: udiv_i128_3:			; X86-64-LABEL: udiv_i128_3:
	(425 lines not shown)
	entry:			entry:
	%rem = udiv i128 %x, 65537			%rem = udiv i128 %x, 65537
	ret i128 %rem			ret i128 %rem
	}			}

	define i128 @udiv_i128_12(i128 %x) nounwind {			define i128 @udiv_i128_12(i128 %x) nounwind {
	; X86-64-LABEL: udiv_i128_12:			; X86-64-LABEL: udiv_i128_12:
	; X86-64: # %bb.0: # %entry			; X86-64: # %bb.0: # %entry
	; X86-64-NEXT: pushq %rax			; X86-64-NEXT: shrdq $2, %rsi, %rdi
	; X86-64-NEXT: movl $12, %edx			; X86-64-NEXT: shrq $2, %rsi
	; X86-64-NEXT: xorl %ecx, %ecx			; X86-64-NEXT: movq %rdi, %rcx
	; X86-64-NEXT: callq __udivti3@PLT			; X86-64-NEXT: addq %rsi, %rcx
	; X86-64-NEXT: popq %rcx			; X86-64-NEXT: adcq $0, %rcx
				; X86-64-NEXT: movabsq $-6148914691236517205, %r8 # imm = 0xAAAAAAAAAAAAAAAB
				; X86-64-NEXT: movq %rcx, %rax
				; X86-64-NEXT: mulq %r8
				; X86-64-NEXT: shrq %rdx
				; X86-64-NEXT: leaq (%rdx,%rdx,2), %rax
				; X86-64-NEXT: subq %rax, %rcx
				; X86-64-NEXT: subq %rcx, %rdi
				; X86-64-NEXT: sbbq $0, %rsi
				; X86-64-NEXT: movabsq $-6148914691236517206, %rcx # imm = 0xAAAAAAAAAAAAAAAA
				; X86-64-NEXT: imulq %rdi, %rcx
				; X86-64-NEXT: movq %rdi, %rax
				; X86-64-NEXT: mulq %r8
				; X86-64-NEXT: addq %rcx, %rdx
				; X86-64-NEXT: imulq %rsi, %r8
				; X86-64-NEXT: addq %r8, %rdx
	; X86-64-NEXT: retq			; X86-64-NEXT: retq
	;			;
	; WIN64-LABEL: udiv_i128_12:			; WIN64-LABEL: udiv_i128_12:
	; WIN64: # %bb.0: # %entry			; WIN64: # %bb.0: # %entry
	; WIN64-NEXT: subq $72, %rsp			; WIN64-NEXT: movq %rdx, %r8
	; WIN64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)			; WIN64-NEXT: shrdq $2, %rdx, %rcx
	; WIN64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)			; WIN64-NEXT: shrq $2, %r8
	; WIN64-NEXT: movq $12, {{[0-9]+}}(%rsp)			; WIN64-NEXT: movq %rcx, %r9
	; WIN64-NEXT: movq $0, {{[0-9]+}}(%rsp)			; WIN64-NEXT: addq %r8, %r9
	; WIN64-NEXT: leaq {{[0-9]+}}(%rsp), %rcx			; WIN64-NEXT: adcq $0, %r9
	; WIN64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx			; WIN64-NEXT: movabsq $-6148914691236517205, %r10 # imm = 0xAAAAAAAAAAAAAAAB
	; WIN64-NEXT: callq __udivti3			; WIN64-NEXT: movq %r9, %rax
	; WIN64-NEXT: movq %xmm0, %rax			; WIN64-NEXT: mulq %r10
	; WIN64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]			; WIN64-NEXT: shrq %rdx
	; WIN64-NEXT: movq %xmm0, %rdx			; WIN64-NEXT: leaq (%rdx,%rdx,2), %rax
	; WIN64-NEXT: addq $72, %rsp			; WIN64-NEXT: subq %rax, %r9
				; WIN64-NEXT: subq %r9, %rcx
				; WIN64-NEXT: sbbq $0, %r8
				; WIN64-NEXT: movabsq $-6148914691236517206, %r9 # imm = 0xAAAAAAAAAAAAAAAA
				; WIN64-NEXT: imulq %rcx, %r9
				; WIN64-NEXT: movq %rcx, %rax
				; WIN64-NEXT: mulq %r10
				; WIN64-NEXT: addq %r9, %rdx
				; WIN64-NEXT: imulq %r10, %r8
				; WIN64-NEXT: addq %r8, %rdx
	; WIN64-NEXT: retq			; WIN64-NEXT: retq
	entry:			entry:
	%rem = udiv i128 %x, 12			%rem = udiv i128 %x, 12
	ret i128 %rem			ret i128 %rem
	}			}
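
Reading aid for the udiv-by-12 expansions above: a minimal C++ sketch of the same idea for the quotient, again modelled at i64 with 32-bit halves. The name `udiv12_model` is hypothetical and `%` stands in for the multiply-based remainder. After the shift, the remainder modulo 3 is subtracted so the value becomes an exact multiple of 3, and the quotient then falls out of a wrapping multiply by the inverse of 3 modulo 2^64 (the 0xAAAAAAAAAAAAAAAB constant, split into 0xAAAAAAAA and 0xAAAAAAAB words in the X32 output). This is an illustration under those assumptions, not the code in TargetLowering.cpp.

```cpp
#include <cstdint>

// Hypothetical model of the expanded udiv-by-12 sequence (not LLVM code).
uint64_t udiv12_model(uint64_t x) {
    // x / 12 == (x >> 2) / 3, so the low two bits can simply be dropped.
    uint64_t shifted = x >> 2;  // shrdl $2 / shrl $2

    uint32_t lo = static_cast<uint32_t>(shifted);
    uint32_t hi = static_cast<uint32_t>(shifted >> 32);

    // 2^32 is congruent to 1 mod 3, so the halves can be summed first.
    uint32_t sum = lo + hi;     // addl
    sum += (sum < lo) ? 1 : 0;  // adcl $0: fold the carry back in
    uint32_t rem = sum % 3;     // stands in for the multiply-based urem by 3

    // Subtracting the remainder leaves an exact multiple of 3 (subl / sbbl).
    uint64_t exact = shifted - rem;

    // 3 * 0xAAAAAAAAAAAAAAAB == 1 (mod 2^64), so a wrapping multiply by this
    // constant performs the exact division by 3.
    const uint64_t inv3 = 0xAAAAAAAAAAAAAAABull;
    return exact * inv3;
}
```

As with the remainder sketch, `udiv12_model(x) == x / 12` should hold for any `uint64_t x`.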

	; Make sure we don't inline expand for minsize.			; Make sure we don't inline expand for minsize.
	define i128 @urem_i128_3_minsize(i128 %x) nounwind minsize {			define i128 @urem_i128_3_minsize(i128 %x) nounwind minsize {
	(62 lines not shown)