This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3934	Just curious: why is the alignment 4 for all targets?
3938	Not sure whether the pointer is "i32 *" or "i129 *".
3963	Will the parameter be passed in a register on i386?
3965	Given that the return value is void, why set SExt/ZExt?
llvm/test/CodeGen/X86/udivmodei5.ll
3	Could you generate a test case for i386?
29	Is there any ABI description that shows how to pass an i129 parameter or return an i129 value?

mgehre-amd added inline comments. Feb 23 2022, 12:15 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3934	No particular reason. The interface to the __udivei4 functions is specified as `unsigned int *` right now, so I wanted this to be aligned like an `int`. I guess we should somehow obtain the alignment of an `int` on the target platform? (How?) Alternatively, we can make __udivei4 take a `uint32_t`, so we don't need to guess what an `int` is on this target, and then use `DAG.getDataLayout().getABITypeAlign(Type::getInt32Ty())`?
3938	The pointer here will be i256* (after i129 is expanded to i256). The `__udivei4` argument is `unsigned int[]` to allow for any bitsize.
3963	I got this from the code that does i128 division lowering. It's not really clear to me when to set this flag, and removing it seems to have no effect on the generated assembly.
3965	Good catch, I will remove those.
llvm/test/CodeGen/X86/udivmodei5.ll
3	Thanks, will do
29	My observation is that those get passed around as pointers, and my interpretation is that they are handled like big structs. Also, we only seem to generate power-of-2 sizes after SelectionDAG, so we are passing an i256 here.

Add a test case for i386
Remove SExt, ZExt and setInRegister from CallLoweringInfo

mgehre-amd marked 3 inline comments as not done. Feb 23 2022, 12:18 AM

LuoYuanke added reviewers: craig.topper, RKSimon, pengfei. Feb 23 2022, 12:23 AM

Harbormaster completed remote builds in B151007: Diff 410728. Feb 23 2022, 12:43 AM

LuoYuanke added inline comments. Feb 23 2022, 1:54 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3938	Is it reasonable to have the pointer be "i32 *" so that it matches the prototype of `__udivei4`?
llvm/test/CodeGen/X86/udivmodei5.ll
29	OK, the backend calls TLI->getNumRegistersForCallingConv() to calculate how many virtual registers are needed, and follows the calling convention to allocate registers or memory.

LuoYuanke added inline comments. Feb 23 2022, 3:38 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3962	I notice in the RFC (https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329) that the prototype of the function is `void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b, unsigned int bits);`. I didn't find the code that passes `quo` or the code that loads the value from `quo` after the call.

LuoYuanke added inline comments. Feb 23 2022, 3:40 AM

llvm/test/CodeGen/X86/udivmodei5.ll
99	Is the result returned as value or from a pointer?

mgehre-amd added inline comments. Feb 23 2022, 5:43 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3962	You are right, I did something stupid here. It turns out that I'm declaring here that __udivei4 returns an `i256` (or similar), whereas the first argument is actually the output argument providing an `i256`. In the assembly, both come out the same, so my tests worked. But I will change this to explicitly load the result back from the output parameter.
llvm/test/CodeGen/X86/udivmodei5.ll
99	The result is written to the pointer given by the first argument, i.e. for `void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b, unsigned int bits)`, the result is stored through `quo`.
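To make the output-parameter convention concrete, here is a minimal C++ sketch of a function with the same shape as the RFC prototype. `udivei4_sketch` is a hypothetical name and this is not compiler-rt's implementation; it assumes `bits` is a non-zero multiple of 32, words stored least-significant first (as on a little-endian target), and a non-zero divisor. It uses plain restoring long division, one bit per step, which is simple rather than fast.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper with the shape of the RFC prototype
//   void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b,
//                  unsigned int bits);
// NOT compiler-rt's code. Assumptions: bits is a non-zero multiple of 32,
// words are least-significant first, and the divisor b is non-zero.
void udivei4_sketch(uint32_t *quo, const uint32_t *a, const uint32_t *b,
                    unsigned bits) {
  assert(bits > 0 && bits % 32 == 0);
  const unsigned n = bits / 32;
  std::vector<uint32_t> rem(n, 0);
  for (unsigned i = 0; i < n; ++i)
    quo[i] = 0;

  // Restoring long division: consume one dividend bit per step, MSB first.
  for (int bit = static_cast<int>(bits) - 1; bit >= 0; --bit) {
    // rem = (rem << 1) | next dividend bit; remember the bit shifted out.
    uint32_t carry = (a[bit / 32] >> (bit % 32)) & 1u;
    for (unsigned i = 0; i < n; ++i) {
      uint32_t out = rem[i] >> 31;
      rem[i] = (rem[i] << 1) | carry;
      carry = out;
    }

    // rem >= b if a bit fell off the top, otherwise compare word by word.
    bool geq = carry != 0;
    if (!geq) {
      geq = true; // equal values count as >=
      for (int i = static_cast<int>(n) - 1; i >= 0; --i) {
        if (rem[i] != b[i]) {
          geq = rem[i] > b[i];
          break;
        }
      }
    }

    if (geq) {
      // rem -= b with borrow propagation, then set this quotient bit.
      uint32_t borrow = 0;
      for (unsigned i = 0; i < n; ++i) {
        uint64_t diff = uint64_t(rem[i]) - b[i] - borrow;
        rem[i] = static_cast<uint32_t>(diff);
        borrow = static_cast<uint32_t>(diff >> 63);
      }
      quo[bit / 32] |= 1u << (bit % 32);
    }
  }
}
```

The caller mirrors what the lowering does: place both operands in memory, pass pointers to them plus the bit width, and read the quotient back from the `quo` buffer afterwards. For example, dividing 2^128 by 3 in a 256-bit buffer (an i129 promoted to i256) leaves `0x55555555` in the low four words of `quo`.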

Don't declare the return value of __divei4 to be the large integer. Use the first output parameter.
Don't hard code the alignment of the arguments to be 4

Harbormaster completed remote builds in B151035: Diff 410784. Feb 23 2022, 6:07 AM

Add a test case for aarch64 (showing big endian)

Herald added a project: Restricted Project. Mar 7 2022, 7:08 AM

Harbormaster completed remote builds in B152915: Diff 413457. Mar 7 2022, 8:17 AM

LuoYuanke added inline comments. Mar 9 2022, 1:08 AM

llvm/include/llvm/IR/RuntimeLibcalls.def
50	Why set mul? Is there any test case for it?
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2114	Do we support bit sizes that are not a power of 2 (e.g., MVT::i1)? Should we check whether the bit size exceeds 128?
4350–4351	Why touch MUL? Can't MUL be split by CodeGen?
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4004	Do we need to check whether the type's bit size exceeds 128?
4005	Is expandExtIntRes_DIVREM more readable as the function name?
llvm/test/CodeGen/X86/udivmodei5.ll
10	Add nounwind to avoid generating CFI instructions.

LuoYuanke added inline comments. Mar 9 2022, 1:57 AM

llvm/test/CodeGen/X86/udivmodei5.ll

I'm not sure it is legal to pass i129. It seems the front end would pass i129 by value. However, this is not related to this patch.

[clang]$ cat t.c
_BitInt(129) foo(_BitInt(129) a) {
  return a++;
}
[clang]$
[clang]$ clang -S t.c -emit-llvm -O2
[clang]$ cat t.ll
; ModuleID = 't.c'
source_filename = "t.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn uwtable
define dso_local void @foo(i129* noalias nocapture writeonly sret(i129) align 8 %agg.result, i129* nocapture noundef readonly byval(i129) align 8 %0) local_unnamed_addr #0 {
entry:
  %a = load i129, i129* %0, align 8, !tbaa !3
  store i129 %a, i129* %agg.result, align 8, !tbaa !3
  ret void
}

Is there any update on the patch?

aaron.ballman added a subscriber: aaron.ballman. Mar 11 2022, 4:26 AM

aaron.ballman added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2114	Yes, `_BitInt` can be of any width: https://godbolt.org/z/4zGfrosc7

Rename to ExpandExtIntRes_DIVREM
Add nounwind

llvm/include/llvm/IR/RuntimeLibcalls.def
50	see my other comment
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2114	It seems that SelectionDAG, through some magic, always first expands the integer type to a power of two before it gets here and is expanded into a libcall. So i1 would be expanded to i8. For this reason, only divisions on types wider than i128 currently crash. For types wider than i128, SelectionDAG also expands them to the next power of two, i.e. i129 becomes i256. I didn't want to change this, because it makes implementing the library function easier. But for efficiency, we can try to disable this and pass the original non-power-of-2 type to the libcall in a future PR.
4350–4351	Yes, MUL will be split by codegen and never get here afaik, but I still needed to add a MUL_IEXT constant because this uses the same `ExpandIntLibCall` that is used for division/remainder, and I had to extend ExpandIntLibCall with an extra argument to handle the > 128-bit case. I'm not really sure why this code is here at all, and whether MUL is actually expanded to a libcall on any target. I'm open to suggestions; I don't know a lot about SelectionDAG.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4004	Yes, but there is already an assert in passArguments_DIVREM (now called ExpandExtIntRes_DIVREM).
4005	yes, I'll rename it
llvm/test/CodeGen/X86/udivmodei5.ll
10	Great, thanks! I will add it
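The routing described in these replies (widen to a power of two first, then pick a fixed-width libcall up to i128 and the new `_IEXT` entry beyond that) can be modeled in a few lines of C++. This is a simplified, hypothetical model, not LLVM code: `promotedWidth` and `udivLibcallFor` are illustrative names, the widening rule (next power of two, minimum i8) is an assumption taken from the description of i1 becoming i8 and i129 becoming i256, and the libcall names are the `UDIV_*` entries from RuntimeLibcalls.def in this diff. It only says which entry would be selected if a libcall is used; many targets divide narrow integers natively and never reach this path.

```cpp
#include <cassert>
#include <string>

// Simplified model (assumption): before a division libcall is chosen, an iN
// operand is widened to the next power of two, with i8 as the smallest width
// (i1 -> i8, i129 -> i256).
unsigned promotedWidth(unsigned bits) {
  assert(bits > 0 && bits <= (1u << 24) && "width out of range for sketch");
  unsigned w = 8;
  while (w < bits)
    w *= 2;
  return w;
}

// Name of the unsigned-division libcall the extended ExpandIntLibCall switch
// would pick: fixed-width RTLIB entries up to i128, and UDIV_IEXT
// ("__udivei4") for anything wider, i.e. the new `default:` case.
std::string udivLibcallFor(unsigned bits) {
  switch (promotedWidth(bits)) {
  case 8:
    return "__udivqi3";
  case 16:
    return "__udivhi3";
  case 32:
    return "__udivsi3";
  case 64:
    return "__udivdi3";
  case 128:
    return "__udivti3";
  default:
    return "__udivei4";
  }
}
```

Under this model, `udivLibcallFor(128)` yields `"__udivti3"`, while `udivLibcallFor(129)` falls through to the default case and yields `"__udivei4"`, which is why the `_IEXT` argument had to be threaded through every `ExpandIntLibCall` caller, including MUL.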

mgehre-amd marked 5 inline comments as done. Mar 11 2022, 7:41 AM

mgehre-amd marked 7 inline comments as done. Mar 11 2022, 7:49 AM

mgehre-amd added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3938	I'm not sure whether pointer casts exist in SelectionDAG. I found `DAG.getBitcast`, but from my understanding it only does bitcasts between integer types. (?)

Harbormaster completed remote builds in B153768: Diff 414655. Mar 11 2022, 8:14 AM

LGTM, thanks.

This revision is now accepted and ready to land. Mar 11 2022, 8:42 PM

@LuoYuanke, thank you very much for the review and comments!

Should I wait for somebody else to approve the PR too, or can I go ahead and push?

In D120329#3379223, @mgehre-amd wrote:

@LuoYuanke, thank you very much for the review and comments!

Should I wait for somebody else to approve the PR too, or can I go ahead and push?

Thanks for the patch. Maybe wait 1 or 2 days in case there are comments from other reviewers. If there is no objection, I think you can push the patch.

pengfei added inline comments. Mar 14 2022, 7:35 AM

llvm/test/CodeGen/X86/udivmodei5.ll
6	I think we'd better avoid passing/returning the `i129` type in tests, especially returning it. The comments at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td#L221-L222 say that returning a value in more than two registers is outside the scope of the ABI. What's more, unlike argument passing, returning has only 3 usable registers and cannot use the stack. So it will always crash if we want to return a type larger than `i192`.

aaron.ballman added inline comments. Mar 14 2022, 11:58 AM

llvm/test/CodeGen/X86/udivmodei5.ll
6	"So it will always crash if we want to return a type larger than i192." This seems like a bug we'd need to fix, yes?

pengfei added inline comments. Mar 14 2022, 10:20 PM

llvm/test/CodeGen/X86/udivmodei5.ll
6	Sorry, my mistake. It won't crash: https://godbolt.org/z/f6b66xcfe. It turns out that only the types `i129` to `i192` have dubious behavior.

Can we land this patch? I think we can update the test case afterwards if there is a better idea for parameter passing/returning.

In D120329#3384538, @LuoYuanke wrote:

Can we land this patch? I think we can update the test case afterwards if there is a better idea for parameter passing/returning.

I'm fine with it. In fact, I think we may need to reflect the ABI lowering in the backend. That means the i129 cases will be good tests for future work, though we may not work on it in the near future.

This revision was landed with ongoing or failed builds.Mar 16 2022, 2:36 AM

Closed by commit rG09854f2af3b9: [SelectionDAG] Emit calls to __divei4 and friends for division/remainder of… (authored by mgehre-amd).

This revision was automatically updated to reflect the committed changes.

Matthias Gehre <matthias.gehre@xilinx.com> added a commit: rG09854f2af3b9: [SelectionDAG] Emit calls to __divei4 and friends for division/remainder of….

In D120329#3384553, @pengfei wrote:

In D120329#3384538, @LuoYuanke wrote:

Can we land this patch? I think we can update test case after that if there is better idea for the parameter passing/returning.

I'm fine with it. In fact, I think we may need to reflect the ABI lowering in the backend. That means the i129 cases will be good tests for future work, though we may not work on it in the near future.

FYI, the X86 psABI has defined the representation of _BitInt(N): https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/ceff85b232117f15296da5a4ecc98e25a0547093
I think both the front end and the backend need to change: the FE should always emit the i(((N + 63) / 64) * 64) type, while the BE should treat all such non-power-of-2 i(((N + 63) / 64) * 64) types as legal.
Thoughts?
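The rounding in this proposal is easy to check with a few lines of C++. `psabiStorageBits` and `psabiChunks` are hypothetical helpers, not LLVM APIs, and the sketch assumes N > 64 (the wide `_BitInt` values this thread is concerned with); smaller widths are deliberately not modeled. For N = 129 the formula gives i192, three 64-bit chunks, instead of the i256 produced by the power-of-two promotion described earlier in the thread. Since 192 is not a power of two, the backend would have to accept such types as legal, as the comment says.

```cpp
#include <cassert>

// Width of the integer type proposed above for _BitInt(N): N rounded up to a
// whole number of 64-bit chunks, i.e. i(((N + 63) / 64) * 64).
// Assumption for this sketch: only wide values (N > 64) are modeled.
unsigned psabiStorageBits(unsigned n) {
  assert(n > 64 && "sketch only models wide _BitInt values");
  return ((n + 63) / 64) * 64;
}

// Number of 64-bit chunks in that representation.
unsigned psabiChunks(unsigned n) { return psabiStorageBits(n) / 64; }
```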

Herald added a subscriber: StephenFan. Apr 1 2022, 1:54 AM

mgehre-amd mentioned this in D123363: [SelectionDAG] Update emission of udivmodei5 to latest ABI changes. Apr 7 2022, 11:52 PM

mgehre-amd mentioned this in D130079: Revert "[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers". Jul 19 2022, 5:04 AM

Matthias Gehre <matthias.gehre@xilinx.com> mentioned this in rG6d13b80fcb1a: Revert "[SelectionDAG] Emit calls to __divei4 and friends for…. Aug 26 2022, 2:53 AM

Revision Contents

Path	Size
llvm/include/llvm/IR/RuntimeLibcalls.def	14 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	67 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	96 lines
llvm/test/CodeGen/AArch64/udivmodei5.ll	276 lines
llvm/test/CodeGen/X86/udivmodei5.ll	1110 lines

Diff 415745

llvm/include/llvm/IR/RuntimeLibcalls.def

	[… 41 lines not shown …]
	HANDLE_LIBCALL(SRA_I32, "__ashrsi3")			HANDLE_LIBCALL(SRA_I32, "__ashrsi3")
	HANDLE_LIBCALL(SRA_I64, "__ashrdi3")			HANDLE_LIBCALL(SRA_I64, "__ashrdi3")
	HANDLE_LIBCALL(SRA_I128, "__ashrti3")			HANDLE_LIBCALL(SRA_I128, "__ashrti3")
	HANDLE_LIBCALL(MUL_I8, "__mulqi3")			HANDLE_LIBCALL(MUL_I8, "__mulqi3")
	HANDLE_LIBCALL(MUL_I16, "__mulhi3")			HANDLE_LIBCALL(MUL_I16, "__mulhi3")
	HANDLE_LIBCALL(MUL_I32, "__mulsi3")			HANDLE_LIBCALL(MUL_I32, "__mulsi3")
	HANDLE_LIBCALL(MUL_I64, "__muldi3")			HANDLE_LIBCALL(MUL_I64, "__muldi3")
	HANDLE_LIBCALL(MUL_I128, "__multi3")			HANDLE_LIBCALL(MUL_I128, "__multi3")
				HANDLE_LIBCALL(MUL_IEXT, nullptr)
				LuoYuanke: Why set mul? Is there any test case for it?
				mgehre-amd: see my other comment

	HANDLE_LIBCALL(MULO_I32, "__mulosi4")			HANDLE_LIBCALL(MULO_I32, "__mulosi4")
	HANDLE_LIBCALL(MULO_I64, "__mulodi4")			HANDLE_LIBCALL(MULO_I64, "__mulodi4")
	HANDLE_LIBCALL(MULO_I128, "__muloti4")			HANDLE_LIBCALL(MULO_I128, "__muloti4")
	HANDLE_LIBCALL(SDIV_I8, "__divqi3")			HANDLE_LIBCALL(SDIV_I8, "__divqi3")
	HANDLE_LIBCALL(SDIV_I16, "__divhi3")			HANDLE_LIBCALL(SDIV_I16, "__divhi3")
	HANDLE_LIBCALL(SDIV_I32, "__divsi3")			HANDLE_LIBCALL(SDIV_I32, "__divsi3")
	HANDLE_LIBCALL(SDIV_I64, "__divdi3")			HANDLE_LIBCALL(SDIV_I64, "__divdi3")
	HANDLE_LIBCALL(SDIV_I128, "__divti3")			HANDLE_LIBCALL(SDIV_I128, "__divti3")
				HANDLE_LIBCALL(SDIV_IEXT, "__divei4")

	HANDLE_LIBCALL(UDIV_I8, "__udivqi3")			HANDLE_LIBCALL(UDIV_I8, "__udivqi3")
	HANDLE_LIBCALL(UDIV_I16, "__udivhi3")			HANDLE_LIBCALL(UDIV_I16, "__udivhi3")
	HANDLE_LIBCALL(UDIV_I32, "__udivsi3")			HANDLE_LIBCALL(UDIV_I32, "__udivsi3")
	HANDLE_LIBCALL(UDIV_I64, "__udivdi3")			HANDLE_LIBCALL(UDIV_I64, "__udivdi3")
	HANDLE_LIBCALL(UDIV_I128, "__udivti3")			HANDLE_LIBCALL(UDIV_I128, "__udivti3")
				HANDLE_LIBCALL(UDIV_IEXT, "__udivei4")

	HANDLE_LIBCALL(SREM_I8, "__modqi3")			HANDLE_LIBCALL(SREM_I8, "__modqi3")
	HANDLE_LIBCALL(SREM_I16, "__modhi3")			HANDLE_LIBCALL(SREM_I16, "__modhi3")
	HANDLE_LIBCALL(SREM_I32, "__modsi3")			HANDLE_LIBCALL(SREM_I32, "__modsi3")
	HANDLE_LIBCALL(SREM_I64, "__moddi3")			HANDLE_LIBCALL(SREM_I64, "__moddi3")
	HANDLE_LIBCALL(SREM_I128, "__modti3")			HANDLE_LIBCALL(SREM_I128, "__modti3")
				HANDLE_LIBCALL(SREM_IEXT, "__modei4")

	HANDLE_LIBCALL(UREM_I8, "__umodqi3")			HANDLE_LIBCALL(UREM_I8, "__umodqi3")
	HANDLE_LIBCALL(UREM_I16, "__umodhi3")			HANDLE_LIBCALL(UREM_I16, "__umodhi3")
	HANDLE_LIBCALL(UREM_I32, "__umodsi3")			HANDLE_LIBCALL(UREM_I32, "__umodsi3")
	HANDLE_LIBCALL(UREM_I64, "__umoddi3")			HANDLE_LIBCALL(UREM_I64, "__umoddi3")
	HANDLE_LIBCALL(UREM_I128, "__umodti3")			HANDLE_LIBCALL(UREM_I128, "__umodti3")
				HANDLE_LIBCALL(UREM_IEXT, "__umodei4")

	HANDLE_LIBCALL(SDIVREM_I8, nullptr)			HANDLE_LIBCALL(SDIVREM_I8, nullptr)
	HANDLE_LIBCALL(SDIVREM_I16, nullptr)			HANDLE_LIBCALL(SDIVREM_I16, nullptr)
	HANDLE_LIBCALL(SDIVREM_I32, nullptr)			HANDLE_LIBCALL(SDIVREM_I32, nullptr)
	HANDLE_LIBCALL(SDIVREM_I64, nullptr)			HANDLE_LIBCALL(SDIVREM_I64, nullptr)
	HANDLE_LIBCALL(SDIVREM_I128, nullptr)			HANDLE_LIBCALL(SDIVREM_I128, nullptr)
				HANDLE_LIBCALL(SDIVREM_IEXT, nullptr)

	HANDLE_LIBCALL(UDIVREM_I8, nullptr)			HANDLE_LIBCALL(UDIVREM_I8, nullptr)
	HANDLE_LIBCALL(UDIVREM_I16, nullptr)			HANDLE_LIBCALL(UDIVREM_I16, nullptr)
	HANDLE_LIBCALL(UDIVREM_I32, nullptr)			HANDLE_LIBCALL(UDIVREM_I32, nullptr)
	HANDLE_LIBCALL(UDIVREM_I64, nullptr)			HANDLE_LIBCALL(UDIVREM_I64, nullptr)
	HANDLE_LIBCALL(UDIVREM_I128, nullptr)			HANDLE_LIBCALL(UDIVREM_I128, nullptr)
				HANDLE_LIBCALL(UDIVREM_IEXT, nullptr)

	HANDLE_LIBCALL(NEG_I32, "__negsi2")			HANDLE_LIBCALL(NEG_I32, "__negsi2")
	HANDLE_LIBCALL(NEG_I64, "__negdi2")			HANDLE_LIBCALL(NEG_I64, "__negdi2")
	HANDLE_LIBCALL(CTLZ_I32, "__clzsi2")			HANDLE_LIBCALL(CTLZ_I32, "__clzsi2")
	HANDLE_LIBCALL(CTLZ_I64, "__clzdi2")			HANDLE_LIBCALL(CTLZ_I64, "__clzdi2")
	HANDLE_LIBCALL(CTLZ_I128, "__clzti2")			HANDLE_LIBCALL(CTLZ_I128, "__clzti2")

	// Floating-point			// Floating-point
	HANDLE_LIBCALL(ADD_F32, "__addsf3")			HANDLE_LIBCALL(ADD_F32, "__addsf3")
	[… 502 lines not shown …]

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

[… 135 lines not shown …]	private:

void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall LC,		void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall LC,
SmallVectorImpl<SDValue> &Results);		SmallVectorImpl<SDValue> &Results);
void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall Call_F32,		void ExpandFPLibCall(SDNode *Node, RTLIB::Libcall Call_F32,
RTLIB::Libcall Call_F64, RTLIB::Libcall Call_F80,		RTLIB::Libcall Call_F64, RTLIB::Libcall Call_F80,
RTLIB::Libcall Call_F128,		RTLIB::Libcall Call_F128,
RTLIB::Libcall Call_PPCF128,		RTLIB::Libcall Call_PPCF128,
SmallVectorImpl<SDValue> &Results);		SmallVectorImpl<SDValue> &Results);
SDValue ExpandIntLibCall(SDNode *Node, bool isSigned,		SDValue ExpandIntLibCall(SDNode *Node, bool isSigned, RTLIB::Libcall Call_I8,
RTLIB::Libcall Call_I8,		RTLIB::Libcall Call_I16, RTLIB::Libcall Call_I32,
RTLIB::Libcall Call_I16,		RTLIB::Libcall Call_I64, RTLIB::Libcall Call_I128,
RTLIB::Libcall Call_I32,		RTLIB::Libcall Call_IEXT);
RTLIB::Libcall Call_I64,
RTLIB::Libcall Call_I128);
void ExpandArgFPLibCall(SDNode *Node,		void ExpandArgFPLibCall(SDNode *Node,
RTLIB::Libcall Call_F32, RTLIB::Libcall Call_F64,		RTLIB::Libcall Call_F32, RTLIB::Libcall Call_F64,
RTLIB::Libcall Call_F80, RTLIB::Libcall Call_F128,		RTLIB::Libcall Call_F80, RTLIB::Libcall Call_F128,
RTLIB::Libcall Call_PPCF128,		RTLIB::Libcall Call_PPCF128,
SmallVectorImpl<SDValue> &Results);		SmallVectorImpl<SDValue> &Results);
void ExpandDivRemLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results);		void ExpandDivRemLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results);
void ExpandSinCosLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results);		void ExpandSinCosLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results);

[… 1,942 lines not shown …]	void SelectionDAGLegalize::ExpandFPLibCall(SDNode* Node,
RTLIB::Libcall Call_PPCF128,		RTLIB::Libcall Call_PPCF128,
SmallVectorImpl<SDValue> &Results) {		SmallVectorImpl<SDValue> &Results) {
RTLIB::Libcall LC = RTLIB::getFPLibCall(Node->getSimpleValueType(0),		RTLIB::Libcall LC = RTLIB::getFPLibCall(Node->getSimpleValueType(0),
Call_F32, Call_F64, Call_F80,		Call_F32, Call_F64, Call_F80,
Call_F128, Call_PPCF128);		Call_F128, Call_PPCF128);
ExpandFPLibCall(Node, LC, Results);		ExpandFPLibCall(Node, LC, Results);
}		}

SDValue SelectionDAGLegalize::ExpandIntLibCall(SDNode* Node, bool isSigned,		SDValue SelectionDAGLegalize::ExpandIntLibCall(
RTLIB::Libcall Call_I8,		SDNode *Node, bool isSigned, RTLIB::Libcall Call_I8,
RTLIB::Libcall Call_I16,		RTLIB::Libcall Call_I16, RTLIB::Libcall Call_I32, RTLIB::Libcall Call_I64,
RTLIB::Libcall Call_I32,		RTLIB::Libcall Call_I128, RTLIB::Libcall Call_IEXT) {
RTLIB::Libcall Call_I64,
RTLIB::Libcall Call_I128) {
RTLIB::Libcall LC;		RTLIB::Libcall LC;
switch (Node->getSimpleValueType(0).SimpleTy) {		switch (Node->getSimpleValueType(0).SimpleTy) {
default: llvm_unreachable("Unexpected request for libcall!");
		default:
		LC = Call_IEXT;
		LuoYuanke: Do we support bit sizes that are not a power of 2 (e.g., MVT::i1)? Should we check whether the bit size exceeds 128?
		aaron.ballman: Yes, `_BitInt` can be of any width: https://godbolt.org/z/4zGfrosc7
		mgehre-amd: It seems that SelectionDAG, through some magic, always first expands the integer type to a power of two before it gets here and is expanded into a libcall. So i1 would be expanded to i8. For this reason, only divisions on types wider than i128 currently crash. For types wider than i128, SelectionDAG also expands them to the next power of two, i.e. i129 becomes i256. I didn't want to change this, because it makes implementing the library function easier. But for efficiency, we can try to disable this and pass the original non-power-of-2 type to the libcall in a future PR.
		break;

case MVT::i8: LC = Call_I8; break;		case MVT::i8: LC = Call_I8; break;
case MVT::i16: LC = Call_I16; break;		case MVT::i16: LC = Call_I16; break;
case MVT::i32: LC = Call_I32; break;		case MVT::i32: LC = Call_I32; break;
case MVT::i64: LC = Call_I64; break;		case MVT::i64: LC = Call_I64; break;
case MVT::i128: LC = Call_I128; break;		case MVT::i128: LC = Call_I128; break;
}		}
return ExpandLibCall(LC, Node, isSigned);		return ExpandLibCall(LC, Node, isSigned);
}		}
[… 18 lines not shown …]
void		void
SelectionDAGLegalize::ExpandDivRemLibCall(SDNode *Node,		SelectionDAGLegalize::ExpandDivRemLibCall(SDNode *Node,
SmallVectorImpl<SDValue> &Results) {		SmallVectorImpl<SDValue> &Results) {
unsigned Opcode = Node->getOpcode();		unsigned Opcode = Node->getOpcode();
bool isSigned = Opcode == ISD::SDIVREM;		bool isSigned = Opcode == ISD::SDIVREM;

RTLIB::Libcall LC;		RTLIB::Libcall LC;
switch (Node->getSimpleValueType(0).SimpleTy) {		switch (Node->getSimpleValueType(0).SimpleTy) {
default: llvm_unreachable("Unexpected request for libcall!");
		default:
		LC = isSigned ? RTLIB::SDIVREM_IEXT : RTLIB::UDIVREM_IEXT;
		break;

case MVT::i8: LC= isSigned ? RTLIB::SDIVREM_I8 : RTLIB::UDIVREM_I8; break;		case MVT::i8: LC= isSigned ? RTLIB::SDIVREM_I8 : RTLIB::UDIVREM_I8; break;
case MVT::i16: LC= isSigned ? RTLIB::SDIVREM_I16 : RTLIB::UDIVREM_I16; break;		case MVT::i16: LC= isSigned ? RTLIB::SDIVREM_I16 : RTLIB::UDIVREM_I16; break;
case MVT::i32: LC= isSigned ? RTLIB::SDIVREM_I32 : RTLIB::UDIVREM_I32; break;		case MVT::i32: LC= isSigned ? RTLIB::SDIVREM_I32 : RTLIB::UDIVREM_I32; break;
case MVT::i64: LC= isSigned ? RTLIB::SDIVREM_I64 : RTLIB::UDIVREM_I64; break;		case MVT::i64: LC= isSigned ? RTLIB::SDIVREM_I64 : RTLIB::UDIVREM_I64; break;
case MVT::i128: LC= isSigned ? RTLIB::SDIVREM_I128:RTLIB::UDIVREM_I128; break;		case MVT::i128: LC= isSigned ? RTLIB::SDIVREM_I128:RTLIB::UDIVREM_I128; break;
}		}

// The input chain to this libcall is the entry node of the function.		// The input chain to this libcall is the entry node of the function.
[… 2,154 lines not shown …]	void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
}		}
case ISD::FSUB:		case ISD::FSUB:
case ISD::STRICT_FSUB:		case ISD::STRICT_FSUB:
ExpandFPLibCall(Node, RTLIB::SUB_F32, RTLIB::SUB_F64,		ExpandFPLibCall(Node, RTLIB::SUB_F32, RTLIB::SUB_F64,
RTLIB::SUB_F80, RTLIB::SUB_F128,		RTLIB::SUB_F80, RTLIB::SUB_F128,
RTLIB::SUB_PPCF128, Results);		RTLIB::SUB_PPCF128, Results);
break;		break;
case ISD::SREM:		case ISD::SREM:
Results.push_back(ExpandIntLibCall(Node, true,		Results.push_back(ExpandIntLibCall(
RTLIB::SREM_I8,		Node, true, RTLIB::SREM_I8, RTLIB::SREM_I16, RTLIB::SREM_I32,
RTLIB::SREM_I16, RTLIB::SREM_I32,		RTLIB::SREM_I64, RTLIB::SREM_I128, RTLIB::SREM_IEXT));
RTLIB::SREM_I64, RTLIB::SREM_I128));
break;		break;
case ISD::UREM:		case ISD::UREM:
Results.push_back(ExpandIntLibCall(Node, false,		Results.push_back(ExpandIntLibCall(
RTLIB::UREM_I8,		Node, false, RTLIB::UREM_I8, RTLIB::UREM_I16, RTLIB::UREM_I32,
RTLIB::UREM_I16, RTLIB::UREM_I32,		RTLIB::UREM_I64, RTLIB::UREM_I128, RTLIB::UREM_IEXT));
RTLIB::UREM_I64, RTLIB::UREM_I128));
break;		break;
case ISD::SDIV:		case ISD::SDIV:
Results.push_back(ExpandIntLibCall(Node, true,		Results.push_back(ExpandIntLibCall(
RTLIB::SDIV_I8,		Node, true, RTLIB::SDIV_I8, RTLIB::SDIV_I16, RTLIB::SDIV_I32,
RTLIB::SDIV_I16, RTLIB::SDIV_I32,		RTLIB::SDIV_I64, RTLIB::SDIV_I128, RTLIB::SDIV_IEXT));
RTLIB::SDIV_I64, RTLIB::SDIV_I128));
break;		break;
case ISD::UDIV:		case ISD::UDIV:
Results.push_back(ExpandIntLibCall(Node, false,		Results.push_back(ExpandIntLibCall(
RTLIB::UDIV_I8,		Node, false, RTLIB::UDIV_I8, RTLIB::UDIV_I16, RTLIB::UDIV_I32,
RTLIB::UDIV_I16, RTLIB::UDIV_I32,		RTLIB::UDIV_I64, RTLIB::UDIV_I128, RTLIB::UDIV_IEXT));
RTLIB::UDIV_I64, RTLIB::UDIV_I128));
break;		break;
case ISD::SDIVREM:		case ISD::SDIVREM:
case ISD::UDIVREM:		case ISD::UDIVREM:
// Expand into divrem libcall		// Expand into divrem libcall
ExpandDivRemLibCall(Node, Results);		ExpandDivRemLibCall(Node, Results);
break;		break;
case ISD::MUL:		case ISD::MUL:
Results.push_back(ExpandIntLibCall(Node, false,		Results.push_back(ExpandIntLibCall(
		LuoYuanke: Why touch MUL? Can't MUL be split by CodeGen?
		mgehre-amd (not done): Yes, MUL will be split by codegen and never get here afaik, but I still needed to add a MUL_IEXT constant because this uses the same `ExpandIntLibCall` that is used for division/remainder, and I had to extend ExpandIntLibCall with an extra argument to handle the > 128-bit case. I'm not really sure why this code is here at all, and whether MUL is actually expanded to a libcall on any target. I'm open to suggestions; I don't know a lot about SelectionDAG.
RTLIB::MUL_I8,		Node, false, RTLIB::MUL_I8, RTLIB::MUL_I16, RTLIB::MUL_I32,
RTLIB::MUL_I16, RTLIB::MUL_I32,		RTLIB::MUL_I64, RTLIB::MUL_I128, RTLIB::MUL_IEXT));
RTLIB::MUL_I64, RTLIB::MUL_I128));
break;		break;
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
switch (Node->getSimpleValueType(0).SimpleTy) {		switch (Node->getSimpleValueType(0).SimpleTy) {
default:		default:
llvm_unreachable("LibCall explicitly requested, but not available");		llvm_unreachable("LibCall explicitly requested, but not available");
case MVT::i32:		case MVT::i32:
Results.push_back(ExpandLibCall(RTLIB::CTLZ_I32, Node, false));		Results.push_back(ExpandLibCall(RTLIB::CTLZ_I32, Node, false));
break;		break;
[… 663 lines not shown …]

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

[… 3,908 lines not shown …]	if (HasCarryOp) {
EVT OType = Node->getValueType(1);		EVT OType = Node->getValueType(1);
Ovf = DAG.getSetCC(dl, OType, Ovf, DAG.getConstant(0, dl, VT), ISD::SETLT);		Ovf = DAG.getSetCC(dl, OType, Ovf, DAG.getConstant(0, dl, VT), ISD::SETLT);
}		}

// Use the calculated overflow everywhere.		// Use the calculated overflow everywhere.
ReplaceValueWith(SDValue(Node, 1), Ovf);		ReplaceValueWith(SDValue(Node, 1), Ovf);
}		}

		// Emit a call to __udivei4 and friends which require
		// the arguments be based on the stack
		// and extra argument that contains the number of bits of the operands.
		// Returns the result of the call operation.
		static SDValue ExpandExtIntRes_DIVREM(const TargetLowering &TLI,
		const RTLIB::Libcall &LC,
		SelectionDAG &DAG, SDNode *N,
		const SDLoc &DL, const EVT &VT) {

		SDValue InChain = DAG.getEntryNode();

		TargetLowering::ArgListTy Args;
		TargetLowering::ArgListEntry Entry;

		// The signature of __udivei4 is
		// void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b,
		// unsigned int bits)
		EVT ArgVT = N->op_begin()->getValueType();
		LuoYuanke: Just curious: why is the alignment 4 for all targets?
		mgehre-amd: No particular reason. The interface to the __udivei4 functions is specified as `unsigned int *` right now, so I wanted this to be aligned like an `int`. I guess we should somehow obtain the alignment of an `int` on the target platform? (How?) Alternatively, we can make __udivei4 take a `uint32_t`, so we don't need to guess what an `int` is on this target, and then use `DAG.getDataLayout().getABITypeAlign(Type::getInt32Ty())`?
		assert(ArgVT.isInteger() && ArgVT.getSizeInBits() > 128 &&
		"Unexpected argument type for lowering");
		Type *ArgTy = ArgVT.getTypeForEVT(*DAG.getContext());

		LuoYuanke: Not sure whether the pointer is "i32 *" or "i129 *".
		mgehre-amd: The pointer here will be i256* (after i129 is expanded to i256). The `__udivei4` argument is `unsigned int[]` to allow for any bitsize.
		LuoYuanke: Is it reasonable to have the pointer be "i32 *" so that it matches the prototype of `__udivei4`?
		mgehre-amd: I'm not sure whether pointer casts exist in SelectionDAG. I found `DAG.getBitcast`, but from my understanding it only does bitcasts between integer types. (?)
		SDValue Output = DAG.CreateStackTemporary(ArgVT);
		Entry.Node = Output;
		Entry.Ty = ArgTy->getPointerTo();
		Entry.IsSExt = false;
		Entry.IsZExt = false;
		Args.push_back(Entry);

		for (const llvm::SDUse &Op : N->ops()) {
		SDValue StackPtr = DAG.CreateStackTemporary(ArgVT);
		InChain = DAG.getStore(InChain, DL, Op, StackPtr, MachinePointerInfo());
		Entry.Node = StackPtr;
		Entry.Ty = ArgTy->getPointerTo();
		Entry.IsSExt = false;
		Entry.IsZExt = false;
		Args.push_back(Entry);
		}

		int Bits = N->getOperand(0)
		.getValueType()
		.getTypeForEVT(*DAG.getContext())
		->getIntegerBitWidth();
		Entry.Node = DAG.getConstant(Bits, DL, TLI.getPointerTy(DAG.getDataLayout()));
		Entry.Ty = Type::getInt32Ty(*DAG.getContext());
		Entry.IsSExt = false;
		LuoYuanke: I notice in the RFC (https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329) that the prototype of the function is `void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b, unsigned int bits);`. I didn't find the code that passes `quo` or the code that loads the value from `quo` after the call.
		mgehre-amd: You are right, I did something stupid here. It turns out that I'm declaring here that __udivei4 returns an `i256` (or similar), whereas the first argument is actually the output argument providing an `i256`. In the assembly, both come out the same, so my tests worked. But I will change this to explicitly load the result back from the output parameter.
		Entry.IsZExt = true;
		LuoYuanke: Will the parameter be passed in a register on i386?
		mgehre-amd: I got this from the code that does i128 division lowering. It's not really clear to me when to set this flag, and removing it seems to have no effect on the generated assembly.
		Args.push_back(Entry);

		LuoYuanke: Given that the return value is void, why set SExt/ZExt?
		mgehre-amd: Good catch, I will remove those.
  SDValue Callee = DAG.getExternalSymbol(TLI.getLibcallName(LC),
                                         TLI.getPointerTy(DAG.getDataLayout()));

  TargetLowering::CallLoweringInfo CLI(DAG);
  CLI.setDebugLoc(DL)
      .setChain(InChain)
      .setLibCallee(TLI.getLibcallCallingConv(LC),
                    Type::getVoidTy(*DAG.getContext()), Callee, std::move(Args))
      .setDiscardResult();

  SDValue Chain = TLI.LowerCallTo(CLI).second;

  return DAG.getLoad(ArgVT, DL, Chain, Output, MachinePointerInfo());
}
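The thread above settles that the quotient comes back through the first argument rather than as a return value, and the final `DAG.getLoad(ArgVT, DL, Chain, Output, ...)` reads it from there. As a minimal sketch of the callee side of that contract, here is a hypothetical C++ model (`udivei4_model` is an illustrative name, not compiler-rt's implementation). It assumes each operand is an array of `bits / 32` 32-bit words stored least-significant word first, which matches the little-endian AArch64 test below, where the divisor 3 is stored at the lowest address of its buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the word-array contract discussed above (not
// compiler-rt's code): quo, a and b each point to bits/32 32-bit words,
// least-significant word first. The quotient is written through quo and
// nothing is returned. Uses plain restoring (shift-subtract) division;
// division by zero is not handled.
void udivei4_model(uint32_t *quo, const uint32_t *a, const uint32_t *b,
                   unsigned bits) {
  assert(bits % 32 == 0 && "bits must be a multiple of 32");
  const unsigned words = bits / 32;
  std::vector<uint32_t> rem(words, 0);
  for (unsigned w = 0; w < words; ++w)
    quo[w] = 0;

  for (int i = static_cast<int>(bits) - 1; i >= 0; --i) {
    // rem = (rem << 1) | bit i of a. No bit is lost off the top because rem
    // is at most the value of the bits of a consumed so far.
    uint32_t carry = (a[i / 32] >> (i % 32)) & 1u;
    for (unsigned w = 0; w < words; ++w) {
      uint32_t out = rem[w] >> 31;
      rem[w] = (rem[w] << 1) | carry;
      carry = out;
    }

    // Compare rem with b, most significant word first.
    bool ge = true; // equal counts as >=
    for (int w = static_cast<int>(words) - 1; w >= 0; --w) {
      if (rem[w] != b[w]) {
        ge = rem[w] > b[w];
        break;
      }
    }

    if (ge) {
      // rem -= b with borrow propagation, then set quotient bit i.
      uint32_t borrow = 0;
      for (unsigned w = 0; w < words; ++w) {
        uint64_t diff = static_cast<uint64_t>(rem[w]) - b[w] - borrow;
        rem[w] = static_cast<uint32_t>(diff);
        borrow = static_cast<uint32_t>(diff >> 63);
      }
      quo[i / 32] |= 1u << (i % 32);
    }
  }
}
```

For example, with `bits = 256`, a = 10 and b = 3 leave `quo[0] == 3` and the upper words zero. A production routine would likely use a faster algorithm; bit-at-a-time restoring division is used here only because it is short and easy to check.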

void DAGTypeLegalizer::ExpandIntRes_SDIV(SDNode *N,
                                         SDValue &Lo, SDValue &Hi) {
  EVT VT = N->getValueType(0);
  SDLoc dl(N);
  SDValue Ops[2] = { N->getOperand(0), N->getOperand(1) };

  if (TLI.getOperationAction(ISD::SDIVREM, VT) == TargetLowering::Custom) {
    SDValue Res = DAG.getNode(ISD::SDIVREM, dl, DAG.getVTList(VT, VT), Ops);
    SplitInteger(Res.getValue(0), Lo, Hi);
    return;
  }

  RTLIB::Libcall LC = RTLIB::UNKNOWN_LIBCALL;
  if (VT == MVT::i16)
    LC = RTLIB::SDIV_I16;
  else if (VT == MVT::i32)
    LC = RTLIB::SDIV_I32;
  else if (VT == MVT::i64)
    LC = RTLIB::SDIV_I64;
  else if (VT == MVT::i128)
    LC = RTLIB::SDIV_I128;
  else {
    SDValue Result =
LuoYuanke [Done]: Do we need to check whether the type's bit size exceeds 128?
mgehre-amd (author) [Done]: Yes, but there is already an assert in passArguments_DIVREM (now called ExpandExtIntRes_DIVREM).
        ExpandExtIntRes_DIVREM(TLI, RTLIB::SDIV_IEXT, DAG, N, dl, VT);
LuoYuanke [Done]: Would expandExtIntRes_DIVREM be more readable as the function name?
mgehre-amd (author) [Done]: Yes, I'll rename it.
    SplitInteger(Result, Lo, Hi);
    return;
  }

  assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported SDIV!");

  TargetLowering::MakeLibCallOptions CallOptions;
  CallOptions.setSExt(true);
  SplitInteger(TLI.makeLibCall(DAG, LC, VT, Ops, CallOptions, dl).first, Lo, Hi);
}
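LuoYuanke's question about widths above 128 bits comes down to which libcall each width maps to. Below is a hypothetical C++ mirror of the selection logic in `ExpandIntRes_SDIV` above; the `pickSDivLibcall` name and its string results are illustrative only, since the real code works with `RTLIB::Libcall` values and `MVT`s. The `bits > 128` assert is an assumption standing in for the check the author says lives in `ExpandExtIntRes_DIVREM`, whose exact form is not shown in this diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the libcall choice in ExpandIntRes_SDIV above.
// Fixed widths keep their existing RTLIB entries; any other expanded width
// takes the SDIV_IEXT path (which the tests show lowering to __divei4).
std::string pickSDivLibcall(unsigned bits) {
  switch (bits) {
  case 16:
    return "SDIV_I16";
  case 32:
    return "SDIV_I32";
  case 64:
    return "SDIV_I64";
  case 128:
    return "SDIV_I128";
  default:
    // Assumed stand-in for the width check in ExpandExtIntRes_DIVREM.
    assert(bits > 128 && "unsupported width for the IEXT libcall");
    return "SDIV_IEXT";
  }
}
```

The same shape repeats for SREM, UDIV and UREM below, differing only in the RTLIB entries and in whether sign extension is requested.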

void DAGTypeLegalizer::ExpandIntRes_Shift(SDNode *N,
// ... 175 lines not shown; the next hunk is inside DAGTypeLegalizer::ExpandIntRes_SREM ...
  if (VT == MVT::i16)
    LC = RTLIB::SREM_I16;
  else if (VT == MVT::i32)
    LC = RTLIB::SREM_I32;
  else if (VT == MVT::i64)
    LC = RTLIB::SREM_I64;
  else if (VT == MVT::i128)
    LC = RTLIB::SREM_I128;
  else {
    SDValue Result =
        ExpandExtIntRes_DIVREM(TLI, RTLIB::SREM_IEXT, DAG, N, dl, VT);
    SplitInteger(Result, Lo, Hi);
    return;
  }

  assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported SREM!");

  TargetLowering::MakeLibCallOptions CallOptions;
  CallOptions.setSExt(true);
  SplitInteger(TLI.makeLibCall(DAG, LC, VT, Ops, CallOptions, dl).first, Lo, Hi);
}

void DAGTypeLegalizer::ExpandIntRes_TRUNCATE(SDNode *N,
// ... 159 lines not shown; the next hunk is inside DAGTypeLegalizer::ExpandIntRes_UDIV ...
  if (VT == MVT::i16)
    LC = RTLIB::UDIV_I16;
  else if (VT == MVT::i32)
    LC = RTLIB::UDIV_I32;
  else if (VT == MVT::i64)
    LC = RTLIB::UDIV_I64;
  else if (VT == MVT::i128)
    LC = RTLIB::UDIV_I128;
  else {
    SDValue Result =
        ExpandExtIntRes_DIVREM(TLI, RTLIB::UDIV_IEXT, DAG, N, dl, VT);
    SplitInteger(Result, Lo, Hi);
    return;
  }

  assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported UDIV!");

  TargetLowering::MakeLibCallOptions CallOptions;
  SplitInteger(TLI.makeLibCall(DAG, LC, VT, Ops, CallOptions, dl).first, Lo, Hi);
}

void DAGTypeLegalizer::ExpandIntRes_UREM(SDNode *N,
                                         SDValue &Lo, SDValue &Hi) {
  // ... 11 lines not shown ...
  if (VT == MVT::i16)
    LC = RTLIB::UREM_I16;
  else if (VT == MVT::i32)
    LC = RTLIB::UREM_I32;
  else if (VT == MVT::i64)
    LC = RTLIB::UREM_I64;
  else if (VT == MVT::i128)
    LC = RTLIB::UREM_I128;
  else {
    SDValue Result =
        ExpandExtIntRes_DIVREM(TLI, RTLIB::UREM_IEXT, DAG, N, dl, VT);
    SplitInteger(Result, Lo, Hi);
    return;
  }

  assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported UREM!");

  TargetLowering::MakeLibCallOptions CallOptions;
  SplitInteger(TLI.makeLibCall(DAG, LC, VT, Ops, CallOptions, dl).first, Lo, Hi);
}

void DAGTypeLegalizer::ExpandIntRes_ZERO_EXTEND(SDNode *N,
                                                SDValue &Lo, SDValue &Hi) {
// ... 1,000 lines not shown ...

llvm/test/CodeGen/AArch64/udivmodei5.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnuabi < %s | FileCheck %s
				; RUN: llc -mtriple=aarch64_be-linux-gnuabi < %s | FileCheck %s --check-prefix=CHECK-BE

				define void @udiv129(i129* %ptr, i129* %out) nounwind {
				; CHECK-LABEL: udiv129:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #112
				; CHECK-NEXT: ldp x10, x11, [x0]
				; CHECK-NEXT: stp x30, x19, [sp, #96] // 16-byte Folded Spill
				; CHECK-NEXT: mov x19, x1
				; CHECK-NEXT: mov w8, #3
				; CHECK-NEXT: ldrb w9, [x0, #16]
				; CHECK-NEXT: add x0, sp, #64
				; CHECK-NEXT: add x1, sp, #32
				; CHECK-NEXT: mov x2, sp
				; CHECK-NEXT: mov w3, #256
				; CHECK-NEXT: stp x9, xzr, [sp, #48]
				; CHECK-NEXT: stp xzr, xzr, [sp, #8]
				; CHECK-NEXT: stp xzr, x10, [sp, #24]
				; CHECK-NEXT: str x11, [sp, #40]
				; CHECK-NEXT: str x8, [sp]
				; CHECK-NEXT: bl __udivei4
				; CHECK-NEXT: ldr w8, [sp, #80]
				; CHECK-NEXT: ldp x9, x10, [sp, #64]
				; CHECK-NEXT: and w8, w8, #0x1
				; CHECK-NEXT: stp x9, x10, [x19]
				; CHECK-NEXT: strb w8, [x19, #16]
				; CHECK-NEXT: ldp x30, x19, [sp, #96] // 16-byte Folded Reload
				; CHECK-NEXT: add sp, sp, #112
				; CHECK-NEXT: ret
				;
				; CHECK-BE-LABEL: udiv129:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: sub sp, sp, #112
				; CHECK-BE-NEXT: ldp x11, x10, [x0]
				; CHECK-BE-NEXT: mov w8, #3
				; CHECK-BE-NEXT: stp x30, x19, [sp, #96] // 16-byte Folded Spill
				; CHECK-BE-NEXT: ldrb w9, [x0, #16]
				; CHECK-BE-NEXT: mov x19, x1
				; CHECK-BE-NEXT: add x0, sp, #64
				; CHECK-BE-NEXT: add x1, sp, #32
				; CHECK-BE-NEXT: stp x8, xzr, [sp, #24]
				; CHECK-BE-NEXT: mov x2, sp
				; CHECK-BE-NEXT: extr x8, x11, x10, #56
				; CHECK-BE-NEXT: lsr x11, x11, #56
				; CHECK-BE-NEXT: bfi x9, x10, #8, #56
				; CHECK-BE-NEXT: mov w3, #256
				; CHECK-BE-NEXT: stp xzr, xzr, [sp, #8]
				; CHECK-BE-NEXT: str xzr, [sp]
				; CHECK-BE-NEXT: stp x11, x8, [sp, #40]
				; CHECK-BE-NEXT: str x9, [sp, #56]
				; CHECK-BE-NEXT: bl __udivei4
				; CHECK-BE-NEXT: ldp x9, x8, [sp, #72]
				; CHECK-BE-NEXT: ldr x10, [sp, #88]
				; CHECK-BE-NEXT: extr x9, x9, x8, #8
				; CHECK-BE-NEXT: extr x8, x8, x10, #8
				; CHECK-BE-NEXT: strb w10, [x19, #16]
				; CHECK-BE-NEXT: and x9, x9, #0x1ffffffffffffff
				; CHECK-BE-NEXT: stp x9, x8, [x19]
				; CHECK-BE-NEXT: ldp x30, x19, [sp, #96] // 16-byte Folded Reload
				; CHECK-BE-NEXT: add sp, sp, #112
				; CHECK-BE-NEXT: ret
				%a = load i129, i129* %ptr
				%res = udiv i129 %a, 3
				store i129 %res, i129* %out
				ret void
				}

				define i129 @urem129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: urem129:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #112
				; CHECK-NEXT: stp x0, x1, [sp, #32]
				; CHECK-NEXT: and x8, x2, #0x1
				; CHECK-NEXT: and x9, x6, #0x1
				; CHECK-NEXT: add x0, sp, #64
				; CHECK-NEXT: add x1, sp, #32
				; CHECK-NEXT: mov x2, sp
				; CHECK-NEXT: mov w3, #256
				; CHECK-NEXT: str x30, [sp, #96] // 8-byte Folded Spill
				; CHECK-NEXT: stp x4, x5, [sp]
				; CHECK-NEXT: stp x8, xzr, [sp, #48]
				; CHECK-NEXT: stp x9, xzr, [sp, #16]
				; CHECK-NEXT: bl __umodei4
				; CHECK-NEXT: ldp x1, x8, [sp, #72]
				; CHECK-NEXT: ldr x0, [sp, #64]
				; CHECK-NEXT: ldr x30, [sp, #96] // 8-byte Folded Reload
				; CHECK-NEXT: and x2, x8, #0x1
				; CHECK-NEXT: add sp, sp, #112
				; CHECK-NEXT: ret
				;
				; CHECK-BE-LABEL: urem129:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: sub sp, sp, #112
				; CHECK-BE-NEXT: stp x1, x2, [sp, #48]
				; CHECK-BE-NEXT: and x8, x0, #0x1
				; CHECK-BE-NEXT: and x9, x4, #0x1
				; CHECK-BE-NEXT: add x0, sp, #64
				; CHECK-BE-NEXT: add x1, sp, #32
				; CHECK-BE-NEXT: mov x2, sp
				; CHECK-BE-NEXT: mov w3, #256
				; CHECK-BE-NEXT: str x30, [sp, #96] // 8-byte Folded Spill
				; CHECK-BE-NEXT: stp x6, xzr, [sp, #24]
				; CHECK-BE-NEXT: stp x9, x5, [sp, #8]
				; CHECK-BE-NEXT: str xzr, [sp]
				; CHECK-BE-NEXT: str x8, [sp, #40]
				; CHECK-BE-NEXT: bl __umodei4
				; CHECK-BE-NEXT: ldp x8, x1, [sp, #72]
				; CHECK-BE-NEXT: ldp x2, x30, [sp, #88] // 8-byte Folded Reload
				; CHECK-BE-NEXT: and x0, x8, #0x1
				; CHECK-BE-NEXT: add sp, sp, #112
				; CHECK-BE-NEXT: ret
				%res = urem i129 %a, %b
				ret i129 %res
				}

				define i129 @sdiv129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: sdiv129:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #112
				; CHECK-NEXT: sbfx x8, x2, #0, #1
				; CHECK-NEXT: stp x0, x1, [sp, #32]
				; CHECK-NEXT: sbfx x9, x6, #0, #1
				; CHECK-NEXT: add x0, sp, #64
				; CHECK-NEXT: add x1, sp, #32
				; CHECK-NEXT: mov x2, sp
				; CHECK-NEXT: mov w3, #256
				; CHECK-NEXT: str x30, [sp, #96] // 8-byte Folded Spill
				; CHECK-NEXT: stp x4, x5, [sp]
				; CHECK-NEXT: stp x8, x8, [sp, #48]
				; CHECK-NEXT: stp x9, x9, [sp, #16]
				; CHECK-NEXT: bl __divei4
				; CHECK-NEXT: ldp x1, x8, [sp, #72]
				; CHECK-NEXT: ldr x0, [sp, #64]
				; CHECK-NEXT: ldr x30, [sp, #96] // 8-byte Folded Reload
				; CHECK-NEXT: and x2, x8, #0x1
				; CHECK-NEXT: add sp, sp, #112
				; CHECK-NEXT: ret
				;
				; CHECK-BE-LABEL: sdiv129:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: sub sp, sp, #112
				; CHECK-BE-NEXT: sbfx x8, x0, #0, #1
				; CHECK-BE-NEXT: stp x1, x2, [sp, #48]
				; CHECK-BE-NEXT: sbfx x9, x4, #0, #1
				; CHECK-BE-NEXT: add x0, sp, #64
				; CHECK-BE-NEXT: add x1, sp, #32
				; CHECK-BE-NEXT: mov x2, sp
				; CHECK-BE-NEXT: mov w3, #256
				; CHECK-BE-NEXT: str x30, [sp, #96] // 8-byte Folded Spill
				; CHECK-BE-NEXT: stp x5, x6, [sp, #16]
				; CHECK-BE-NEXT: stp x8, x8, [sp, #32]
				; CHECK-BE-NEXT: stp x9, x9, [sp]
				; CHECK-BE-NEXT: bl __divei4
				; CHECK-BE-NEXT: ldp x8, x1, [sp, #72]
				; CHECK-BE-NEXT: ldp x2, x30, [sp, #88] // 8-byte Folded Reload
				; CHECK-BE-NEXT: and x0, x8, #0x1
				; CHECK-BE-NEXT: add sp, sp, #112
				; CHECK-BE-NEXT: ret
				%res = sdiv i129 %a, %b
				ret i129 %res
				}

				define i129 @srem129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: srem129:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #112
				; CHECK-NEXT: sbfx x8, x2, #0, #1
				; CHECK-NEXT: stp x0, x1, [sp, #32]
				; CHECK-NEXT: sbfx x9, x6, #0, #1
				; CHECK-NEXT: add x0, sp, #64
				; CHECK-NEXT: add x1, sp, #32
				; CHECK-NEXT: mov x2, sp
				; CHECK-NEXT: mov w3, #256
				; CHECK-NEXT: str x30, [sp, #96] // 8-byte Folded Spill
				; CHECK-NEXT: stp x4, x5, [sp]
				; CHECK-NEXT: stp x8, x8, [sp, #48]
				; CHECK-NEXT: stp x9, x9, [sp, #16]
				; CHECK-NEXT: bl __modei4
				; CHECK-NEXT: ldp x1, x8, [sp, #72]
				; CHECK-NEXT: ldr x0, [sp, #64]
				; CHECK-NEXT: ldr x30, [sp, #96] // 8-byte Folded Reload
				; CHECK-NEXT: and x2, x8, #0x1
				; CHECK-NEXT: add sp, sp, #112
				; CHECK-NEXT: ret
				;
				; CHECK-BE-LABEL: srem129:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: sub sp, sp, #112
				; CHECK-BE-NEXT: sbfx x8, x0, #0, #1
				; CHECK-BE-NEXT: stp x1, x2, [sp, #48]
				; CHECK-BE-NEXT: sbfx x9, x4, #0, #1
				; CHECK-BE-NEXT: add x0, sp, #64
				; CHECK-BE-NEXT: add x1, sp, #32
				; CHECK-BE-NEXT: mov x2, sp
				; CHECK-BE-NEXT: mov w3, #256
				; CHECK-BE-NEXT: str x30, [sp, #96] // 8-byte Folded Spill
				; CHECK-BE-NEXT: stp x5, x6, [sp, #16]
				; CHECK-BE-NEXT: stp x8, x8, [sp, #32]
				; CHECK-BE-NEXT: stp x9, x9, [sp]
				; CHECK-BE-NEXT: bl __modei4
				; CHECK-BE-NEXT: ldp x8, x1, [sp, #72]
				; CHECK-BE-NEXT: ldp x2, x30, [sp, #88] // 8-byte Folded Reload
				; CHECK-BE-NEXT: and x0, x8, #0x1
				; CHECK-BE-NEXT: add sp, sp, #112
				; CHECK-BE-NEXT: ret
				%res = srem i129 %a, %b
				ret i129 %res
				}

				; Some higher sizes
				define i257 @sdiv257(i257 %a, i257 %b) nounwind {
				; CHECK-LABEL: sdiv257:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #208
				; CHECK-NEXT: ldp x8, x9, [sp, #208]
				; CHECK-NEXT: stp x2, x3, [sp, #80]
				; CHECK-NEXT: mov x2, sp
				; CHECK-NEXT: stp x0, x1, [sp, #64]
				; CHECK-NEXT: add x0, sp, #128
				; CHECK-NEXT: add x1, sp, #64
				; CHECK-NEXT: mov w3, #512
				; CHECK-NEXT: str x30, [sp, #192] // 8-byte Folded Spill
				; CHECK-NEXT: stp x8, x9, [sp, #16]
				; CHECK-NEXT: ldr x9, [sp, #224]
				; CHECK-NEXT: sbfx x8, x4, #0, #1
				; CHECK-NEXT: stp x6, x7, [sp]
				; CHECK-NEXT: sbfx x9, x9, #0, #1
				; CHECK-NEXT: stp x8, x8, [sp, #112]
				; CHECK-NEXT: stp x8, x8, [sp, #96]
				; CHECK-NEXT: stp x9, x9, [sp, #48]
				; CHECK-NEXT: stp x9, x9, [sp, #32]
				; CHECK-NEXT: bl __divei4
				; CHECK-NEXT: ldp x3, x8, [sp, #152]
				; CHECK-NEXT: ldp x0, x1, [sp, #128]
				; CHECK-NEXT: ldr x2, [sp, #144]
				; CHECK-NEXT: ldr x30, [sp, #192] // 8-byte Folded Reload
				; CHECK-NEXT: and x4, x8, #0x1
				; CHECK-NEXT: add sp, sp, #208
				; CHECK-NEXT: ret
				;
				; CHECK-BE-LABEL: sdiv257:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: sub sp, sp, #208
				; CHECK-BE-NEXT: add x8, sp, #208
				; CHECK-BE-NEXT: str x30, [sp, #192] // 8-byte Folded Spill
				; CHECK-BE-NEXT: sbfx x9, x0, #0, #1
				; CHECK-BE-NEXT: add x0, sp, #128
				; CHECK-BE-NEXT: ld1 { v0.2d }, [x8]
				; CHECK-BE-NEXT: mov x8, sp
				; CHECK-BE-NEXT: add x8, x8, #40
				; CHECK-BE-NEXT: st1 { v0.2d }, [x8]
				; CHECK-BE-NEXT: ldr x8, [sp, #224]
				; CHECK-BE-NEXT: stp x3, x4, [sp, #112]
				; CHECK-BE-NEXT: mov w3, #512
				; CHECK-BE-NEXT: stp x1, x2, [sp, #96]
				; CHECK-BE-NEXT: add x1, sp, #64
				; CHECK-BE-NEXT: stp x8, x9, [sp, #56]
				; CHECK-BE-NEXT: sbfx x8, x6, #0, #1
				; CHECK-BE-NEXT: mov x2, sp
				; CHECK-BE-NEXT: stp x9, x9, [sp, #80]
				; CHECK-BE-NEXT: str x9, [sp, #72]
				; CHECK-BE-NEXT: stp x8, x8, [sp, #8]
				; CHECK-BE-NEXT: stp x8, x7, [sp, #24]
				; CHECK-BE-NEXT: str x8, [sp]
				; CHECK-BE-NEXT: bl __divei4
				; CHECK-BE-NEXT: ldp x8, x1, [sp, #152]
				; CHECK-BE-NEXT: ldp x2, x3, [sp, #168]
				; CHECK-BE-NEXT: ldp x4, x30, [sp, #184] // 8-byte Folded Reload
				; CHECK-BE-NEXT: and x0, x8, #0x1
				; CHECK-BE-NEXT: add sp, sp, #208
				; CHECK-BE-NEXT: ret
				%res = sdiv i257 %a, %b
				ret i257 %res
				}
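The author's observation that only power-of-2 widths reach the libcall can be checked against the AArch64 checks above: `mov w3, #256` for i129 and `mov w3, #512` for i257. A minimal C++ sketch of that arithmetic, assuming simple rounding up to the next power of two (`libcallBits` and `operandBufferBytes` are hypothetical helpers, not LLVM APIs):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the smallest power of two >= bits. This matches the
// widths the tests pass to the libcall (256 for i129, 512 for i257).
uint64_t libcallBits(uint64_t bits) {
  assert(bits > 0 && bits <= (uint64_t(1) << 63));
  uint64_t p = 1;
  while (p < bits)
    p <<= 1;
  return p;
}

// Bytes for the three stack buffers (result, a, b) at that width.
uint64_t operandBufferBytes(uint64_t bits) {
  return 3 * (libcallBits(bits) / 8);
}
```

With three 32-byte buffers (i129) or three 64-byte buffers (i257), the buffers span 96 and 192 bytes. In the checks above, the spill slots sit at `[sp, #96]` and `[sp, #192]` respectively, right after the buffers at `sp`, `sp+32`, `sp+64` (i129) and `sp`, `sp+64`, `sp+128` (i257), which is consistent with this arithmetic.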

llvm/test/CodeGen/X86/udivmodei5.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s --check-prefix=X86
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64
LuoYuanke [Done]: Generate a case for i386?
mgehre-amd (author) [Done]: Thanks, will do.

				define i129 @udiv129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: udiv129:
LuoYuanke [Done]: I'm not sure it is legal to pass i129. It seems the front end would pass i129 by value. However, it is not related to this patch.
  [clang]$ cat t.c
  _BitInt(129) foo(_BitInt(129) a) { return a++; }
  [clang]$
  [clang]$ clang -S t.c -emit-llvm -O2
  [clang]$ cat t.ll
  ; ModuleID = 't.c'
  source_filename = "t.c"
  target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"
  ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn uwtable
  define dso_local void @foo(i129* noalias nocapture writeonly sret(i129) align 8 %agg.result, i129* nocapture noundef readonly byval(i129) align 8 %0) local_unnamed_addr #0 {
  entry:
    %a = load i129, i129* %0, align 8, !tbaa !3
    store i129 %a, i129* %agg.result, align 8, !tbaa !3
    ret void
  }
pengfei [Not Done]: I think we had better avoid passing/returning the `i129` type in tests, especially returning it. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td#L221-L222 The comment there says that returning values in more than two registers is outside the scope of the ABI. Moreover, unlike argument passing, a return value has only 3 usable registers and cannot use the stack. So it will always crash if we want to return a type larger than `i192`.
aaron.ballman [Not Done]:
> So it will always crash if we want to return a type larger than i192.
This seems like a bug we'd need to fix, yes?
pengfei [Not Done]: Sorry, my mistake. It won't crash: https://godbolt.org/z/f6b66xcfe. It turns out only types `i129` through `i192` have dubious behavior.
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: pushl %ebx
LuoYuanke [Done]: Add nounwind to avoid generating CFI instructions.
mgehre-amd (author) [Done]: Great, thanks! I will add it.
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $104, %esp
				; X86-NEXT: movl 24(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 20(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 16(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 12(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 40(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 44(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 32(%ebp), %eax
				; X86-NEXT: movl %eax, (%esp)
				; X86-NEXT: movl 36(%ebp), %eax
LuoYuanke [Done]: Is there any ABI description that shows how to pass an i129 parameter or return an i129 value?
mgehre-amd (author) [Done]: My observation is that they get passed around as pointers, and my interpretation is that they are handled like big structs. Also, we only seem to generate power-of-2 sizes after SelectionDAG, so we are passing an i256 here.
LuoYuanke [Done]: OK, the backend calls TLI->getNumRegistersForCallingConv() to calculate how many virtual registers are needed, and follows the calling convention to allocate registers or memory.
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 28(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 48(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 8(%ebp), %esi
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %esp, %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __udivei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, 12(%esi)
				; X86-NEXT: movl %edi, 8(%esi)
				; X86-NEXT: movl %edx, 4(%esi)
				; X86-NEXT: movl %ecx, (%esi)
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movb %al, 16(%esi)
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: leal -12(%ebp), %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				;
				; X64-LABEL: udiv129:
				; X64: # %bb.0:
				; X64-NEXT: subq $104, %rsp
				; X64-NEXT: andl $1, %r9d
				; X64-NEXT: andl $1, %edx
				; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $256, %ecx # imm = 0x100
				; X64-NEXT: callq __udivei4@PLT
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: andl $1, %ecx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: addq $104, %rsp
				; X64-NEXT: retq
				%res = udiv i129 %a, %b
				ret i129 %res
				}

LuoYuanke [Done]: Is the result returned as a value or through a pointer?
mgehre-amd (author) [Done]: The result is written to the pointer given by the first argument, i.e. for `void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b, unsigned int bits)`, the result is stored to `quo`.
				define i129 @urem129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: urem129:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $104, %esp
				; X86-NEXT: movl 24(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 20(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 16(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 12(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 40(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 44(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 32(%ebp), %eax
				; X86-NEXT: movl %eax, (%esp)
				; X86-NEXT: movl 36(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 28(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 48(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 8(%ebp), %esi
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %esp, %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __umodei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, 12(%esi)
				; X86-NEXT: movl %edi, 8(%esi)
				; X86-NEXT: movl %edx, 4(%esi)
				; X86-NEXT: movl %ecx, (%esi)
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movb %al, 16(%esi)
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: leal -12(%ebp), %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				;
				; X64-LABEL: urem129:
				; X64: # %bb.0:
				; X64-NEXT: subq $104, %rsp
				; X64-NEXT: andl $1, %r9d
				; X64-NEXT: andl $1, %edx
				; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $256, %ecx # imm = 0x100
				; X64-NEXT: callq __umodei4@PLT
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: andl $1, %ecx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: addq $104, %rsp
				; X64-NEXT: retq
				%res = urem i129 %a, %b
				ret i129 %res
				}

				define i129 @sdiv129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: sdiv129:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $104, %esp
				; X86-NEXT: movl 24(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 20(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 16(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 12(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 40(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 44(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 32(%ebp), %eax
				; X86-NEXT: movl %eax, (%esp)
				; X86-NEXT: movl 36(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 28(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: negl %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 48(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: negl %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 8(%ebp), %esi
				; X86-NEXT: movl %esp, %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __divei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, 12(%esi)
				; X86-NEXT: movl %edi, 8(%esi)
				; X86-NEXT: movl %edx, 4(%esi)
				; X86-NEXT: movl %ecx, (%esi)
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movb %al, 16(%esi)
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: leal -12(%ebp), %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				;
				; X64-LABEL: sdiv129:
				; X64: # %bb.0:
				; X64-NEXT: subq $104, %rsp
				; X64-NEXT: andl $1, %r9d
				; X64-NEXT: negq %r9
				; X64-NEXT: andl $1, %edx
				; X64-NEXT: negq %rdx
				; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $256, %ecx # imm = 0x100
				; X64-NEXT: callq __divei4@PLT
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: andl $1, %ecx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: addq $104, %rsp
				; X64-NEXT: retq
				%res = sdiv i129 %a, %b
				ret i129 %res
				}
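Comparing udiv129 and sdiv129 above: for the unsigned operations the X86 code masks bit 128 with `andl $1` and stores zeros above it, while for the signed operations it follows `andl $1` with `negl` and stores the result into four words per operand. A minimal, hypothetical C++ sketch of that widening and of the matching truncation of the result, assuming the least-significant-word-first layout seen in these checks (`widenI129` and `topBitOfResult` are illustrative names, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of widening an i129 operand into the 256-bit buffer of
// eight 32-bit words. low holds bits 0..127; top holds bit 128 in bit 0.
//  - udiv/urem: keep bit 128 ("andl $1") and zero words 5..7.
//  - sdiv/srem: turn bit 128 into 0 or 0xFFFFFFFF ("andl $1; negl") and
//    store that word in words 4..7 (sign extension).
std::vector<uint32_t> widenI129(const uint32_t low[4], uint32_t top,
                                bool isSigned) {
  std::vector<uint32_t> words(low, low + 4);
  const uint32_t bit128 = top & 1u;
  const uint32_t fill = isSigned ? 0u - bit128 : 0u;
  words.push_back(isSigned ? fill : bit128);
  for (int w = 5; w < 8; ++w)
    words.push_back(fill);
  return words;
}

// Truncating the 256-bit result back to i129 keeps words 0..3 plus bit 0 of
// word 4, matching the "andl $1" before "movb %al, 16(%esi)" above.
uint32_t topBitOfResult(const std::vector<uint32_t> &words) {
  assert(words.size() == 8);
  return words[4] & 1u;
}
```

This is also why the unsigned checks contain six `movl $0` stores (three zero words for each operand) while the signed checks instead repeat the negated word.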

				define i129 @srem129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: srem129:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $104, %esp
				; X86-NEXT: movl 24(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 20(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 16(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 12(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 40(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 44(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 32(%ebp), %eax
				; X86-NEXT: movl %eax, (%esp)
				; X86-NEXT: movl 36(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 28(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: negl %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 48(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: negl %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 8(%ebp), %esi
				; X86-NEXT: movl %esp, %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __modei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, 12(%esi)
				; X86-NEXT: movl %edi, 8(%esi)
				; X86-NEXT: movl %edx, 4(%esi)
				; X86-NEXT: movl %ecx, (%esi)
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movb %al, 16(%esi)
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: leal -12(%ebp), %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				;
				; X64-LABEL: srem129:
				; X64: # %bb.0:
				; X64-NEXT: subq $104, %rsp
				; X64-NEXT: andl $1, %r9d
				; X64-NEXT: negq %r9
				; X64-NEXT: andl $1, %edx
				; X64-NEXT: negq %rdx
				; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $256, %ecx # imm = 0x100
				; X64-NEXT: callq __modei4@PLT
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: andl $1, %ecx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: addq $104, %rsp
				; X64-NEXT: retq
				%res = srem i129 %a, %b
				ret i129 %res
				}

				; Some larger sizes
				define i257 @sdiv257(i257 %a, i257 %b) nounwind {
				; X86-LABEL: sdiv257:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $216, %esp
				; X86-NEXT: movl 40(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 36(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 32(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 28(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 24(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 20(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 16(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 12(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 72(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 76(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 64(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 68(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 56(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 60(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 48(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 52(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 44(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: negl %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 80(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: negl %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 8(%ebp), %esi
				; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $512 # imm = 0x200
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __divei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl %ecx, (%esp) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, 28(%esi)
				; X86-NEXT: movl %ecx, 24(%esi)
				; X86-NEXT: movl %edx, 20(%esi)
				; X86-NEXT: movl %edi, 16(%esi)
				; X86-NEXT: movl (%esp), %ecx # 4-byte Reload
				; X86-NEXT: movl %ecx, 12(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
				; X86-NEXT: movl %ecx, 8(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
				; X86-NEXT: movl %ecx, 4(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
				; X86-NEXT: movl %ecx, (%esi)
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movb %al, 32(%esi)
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: leal -12(%ebp), %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				;
				; X64-LABEL: sdiv257:
				; X64: # %bb.0:
				; X64-NEXT: pushq %r14
				; X64-NEXT: pushq %rbx
				; X64-NEXT: subq $200, %rsp
				; X64-NEXT: movq %rdi, %rbx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: andl $1, %eax
				; X64-NEXT: negq %rax
				; X64-NEXT: andl $1, %r9d
				; X64-NEXT: negq %r9
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r11
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r10
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r14
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r14, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r11, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r10, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $512, %ecx # imm = 0x200
				; X64-NEXT: callq __divei4@PLT
				; X64-NEXT: movl {{[0-9]+}}(%rsp), %eax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: movq %rdi, 24(%rbx)
				; X64-NEXT: movq %rsi, 16(%rbx)
				; X64-NEXT: movq %rdx, 8(%rbx)
				; X64-NEXT: movq %rcx, (%rbx)
				; X64-NEXT: andl $1, %eax
				; X64-NEXT: movb %al, 32(%rbx)
				; X64-NEXT: movq %rbx, %rax
				; X64-NEXT: addq $200, %rsp
				; X64-NEXT: popq %rbx
				; X64-NEXT: popq %r14
				; X64-NEXT: retq
				%res = sdiv i257 %a, %b
				ret i257 %res
				}

				define i1001 @srem1001(i1001 %a, i1001 %b) nounwind {
				; X86-LABEL: srem1001:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $496, %esp # imm = 0x1F0
				; X86-NEXT: movl 132(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 128(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 124(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 120(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 116(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 112(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 108(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 104(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 100(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 96(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 92(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 88(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 84(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 80(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 76(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 72(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 68(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 64(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 60(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 56(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 52(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 48(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 44(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 40(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 36(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 32(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 28(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 24(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 20(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 16(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 12(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 136(%ebp), %eax
				; X86-NEXT: shll $23, %eax
				; X86-NEXT: sarl $23, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 264(%ebp), %eax
				; X86-NEXT: shll $23, %eax
				; X86-NEXT: sarl $23, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 260(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 256(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 252(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 248(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 244(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 240(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 236(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 232(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 228(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 224(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 220(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 216(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 212(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 208(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 204(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 200(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 196(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 192(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 188(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 184(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 180(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 176(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 172(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 168(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 164(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 160(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 156(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 152(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 148(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 144(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 140(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 8(%ebp), %esi
				; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $1024 # imm = 0x400
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __modei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, 120(%esi)
				; X86-NEXT: movl %eax, 116(%esi)
				; X86-NEXT: movl %ecx, 112(%esi)
				; X86-NEXT: movl %edx, 108(%esi)
				; X86-NEXT: movl %edi, 104(%esi)
				; X86-NEXT: movl (%esp), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 100(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 96(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 92(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 88(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 84(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 80(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 76(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 72(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 68(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 64(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 60(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 56(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 52(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 48(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 44(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 40(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 36(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 32(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 28(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 24(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 20(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 16(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 12(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 8(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, 4(%esi)
				; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-NEXT: movl %eax, (%esi)
				; X86-NEXT: movl $511, %eax # imm = 0x1FF
				; X86-NEXT: andl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movw %ax, 124(%esi)
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: leal -12(%ebp), %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				;
				; X64-LABEL: srem1001:
				; X64: # %bb.0:
				; X64-NEXT: pushq %rbp
				; X64-NEXT: pushq %r15
				; X64-NEXT: pushq %r14
				; X64-NEXT: pushq %r13
				; X64-NEXT: pushq %r12
				; X64-NEXT: pushq %rbx
				; X64-NEXT: subq $408, %rsp # imm = 0x198
				; X64-NEXT: movq %rdi, %rbx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: shlq $23, %rax
				; X64-NEXT: sarq $23, %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: shlq $23, %rax
				; X64-NEXT: sarq $23, %rax
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $1024, %ecx # imm = 0x400
				; X64-NEXT: callq __modei4@PLT
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r10
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r11
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r14
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r15
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r12
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r13
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r8
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rbp
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %r9
				; X64-NEXT: movq %r9, 112(%rbx)
				; X64-NEXT: movq %rcx, 104(%rbx)
				; X64-NEXT: movq %rbp, 96(%rbx)
				; X64-NEXT: movq %rdi, 88(%rbx)
				; X64-NEXT: movq %rsi, 80(%rbx)
				; X64-NEXT: movq %rdx, 72(%rbx)
				; X64-NEXT: movq %r8, 64(%rbx)
				; X64-NEXT: movq %r13, 56(%rbx)
				; X64-NEXT: movq %r12, 48(%rbx)
				; X64-NEXT: movq %r15, 40(%rbx)
				; X64-NEXT: movq %r14, 32(%rbx)
				; X64-NEXT: movq %r11, 24(%rbx)
				; X64-NEXT: movq %r10, 16(%rbx)
				; X64-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx # 8-byte Reload
				; X64-NEXT: movq %rcx, 8(%rbx)
				; X64-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx # 8-byte Reload
				; X64-NEXT: movq %rcx, (%rbx)
				; X64-NEXT: movl %eax, 120(%rbx)
				; X64-NEXT: shrq $32, %rax
				; X64-NEXT: andl $511, %eax # imm = 0x1FF
				; X64-NEXT: movw %ax, 124(%rbx)
				; X64-NEXT: movq %rbx, %rax
				; X64-NEXT: addq $408, %rsp # imm = 0x198
				; X64-NEXT: popq %rbx
				; X64-NEXT: popq %r12
				; X64-NEXT: popq %r13
				; X64-NEXT: popq %r14
				; X64-NEXT: popq %r15
				; X64-NEXT: popq %rbp
				; X64-NEXT: retq
				%res = srem i1001 %a, %b
				ret i1001 %res
				}

				define i129 @chain129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: chain129:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: pushl %ebx
				; X86-NEXT: pushl %edi
				; X86-NEXT: pushl %esi
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $200, %esp
				; X86-NEXT: movl 24(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 20(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 16(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 12(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 40(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 44(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 32(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 36(%ebp), %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 28(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 48(%ebp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl 8(%ebp), %esi
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __udivei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: negl %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %edi, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %edx, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %ecx, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl $17, (%esp)
				; X86-NEXT: movl $0, {{[0-9]+}}(%esp)
				; X86-NEXT: movl %esp, %eax
				; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl %eax
				; X86-NEXT: pushl %ecx
				; X86-NEXT: pushl %edx
				; X86-NEXT: calll __divei4
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; X86-NEXT: movl %ebx, 12(%esi)
				; X86-NEXT: movl %edi, 8(%esi)
				; X86-NEXT: movl %edx, 4(%esi)
				; X86-NEXT: movl %ecx, (%esi)
				; X86-NEXT: andl $1, %eax
				; X86-NEXT: movb %al, 16(%esi)
				; X86-NEXT: movl %esi, %eax
				; X86-NEXT: leal -12(%ebp), %esp
				; X86-NEXT: popl %esi
				; X86-NEXT: popl %edi
				; X86-NEXT: popl %ebx
				; X86-NEXT: popl %ebp
				; X86-NEXT: retl $4
				;
				; X64-LABEL: chain129:
				; X64: # %bb.0:
				; X64-NEXT: subq $200, %rsp
				; X64-NEXT: andl $1, %r9d
				; X64-NEXT: andl $1, %edx
				; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $256, %ecx # imm = 0x100
				; X64-NEXT: callq __udivei4@PLT
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: andl $1, %eax
				; X64-NEXT: negq %rax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $17, {{[0-9]+}}(%rsp)
				; X64-NEXT: movq $0, {{[0-9]+}}(%rsp)
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rsi
				; X64-NEXT: leaq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: movl $256, %ecx # imm = 0x100
				; X64-NEXT: callq __divei4@PLT
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: andl $1, %ecx
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
				; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
				; X64-NEXT: addq $200, %rsp
				; X64-NEXT: retq
				%res = udiv i129 %a, %b
				%res2 = sdiv i129 %res, 17
				ret i129 %res2
				}

[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers
Closed, Public

Revision Contents

Diff 415745

llvm/include/llvm/IR/RuntimeLibcalls.def

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

llvm/test/CodeGen/AArch64/udivmodei5.ll

llvm/test/CodeGen/X86/udivmodei5.ll
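
The revision title above names `__divei4` and friends. In the code checked by the tests, each call receives a pointer to the result buffer, pointers to the two operands, and the operand width in bits, padded to the next power of two (`$256` for i129, `$512` for i257, `$1024` for i1001). The signed calls receive operands that are already sign-extended to that width (the `andl $1`/`negl` sequences for i129 and i257, and `shl`/`sar` by 23 for i1001). The following C sketch shows what a routine with that interface has to compute. It is not the compiler-rt implementation. The helper names (`udivmod_words`, `sdivmod_words`, `padded_bits`, `sext_shift`) are hypothetical. The sketch also assumes 32-bit words stored least significant first, as on the little-endian x86 targets tested here, and a width that is a multiple of 32 and at most 1024 bits. It computes the quotient and the remainder together. The real entry points appear to write only one result, through their first pointer.

```c
#include <stdint.h>
#include <string.h>

/* Assumed limit for this sketch: the widest call in the tests passes 1024 bits. */
enum { MAX_BITS = 1024, MAX_WORDS = MAX_BITS / 32 };

/* Width the operands appear to be padded to in the tests:
   the next power of two (129 -> 256, 257 -> 512, 1001 -> 1024). */
static unsigned padded_bits(unsigned n) {
  unsigned p = 1;
  while (p < n) p <<= 1;
  return p;
}

/* Shift count (shl, then sar) that sign-extends the partial top word of an
   n-bit value held in word_bits-sized words, e.g. 23 for i1001. */
static unsigned sext_shift(unsigned n, unsigned word_bits) {
  return (word_bits - n % word_bits) % word_bits;
}

/* Compare two n-word unsigned numbers stored least significant word first. */
static int cmp_words(const uint32_t *x, const uint32_t *y, unsigned n) {
  for (unsigned i = n; i-- > 0;)
    if (x[i] != y[i]) return x[i] < y[i] ? -1 : 1;
  return 0;
}

/* x -= y over n words; the caller guarantees x >= y. */
static void sub_words(uint32_t *x, const uint32_t *y, unsigned n) {
  uint32_t borrow = 0;
  for (unsigned i = 0; i < n; i++) {
    uint64_t d = (uint64_t)x[i] - y[i] - borrow;
    x[i] = (uint32_t)d;
    borrow = (uint32_t)(d >> 63); /* 1 if this word wrapped */
  }
}

/* x = (x << 1) | bit over n words. */
static void shl1_words(uint32_t *x, unsigned n, uint32_t bit) {
  for (unsigned i = 0; i < n; i++) {
    uint32_t out = x[i] >> 31;
    x[i] = (x[i] << 1) | bit;
    bit = out;
  }
}

/* Two's-complement negation over n words. */
static void neg_words(uint32_t *x, unsigned n) {
  uint32_t carry = 1;
  for (unsigned i = 0; i < n; i++) {
    x[i] = ~x[i] + carry;
    carry = carry && x[i] == 0; /* carry propagates only past all-ones words */
  }
}

/* Restoring (shift-subtract) long division: quo = a / b, rem = a % b.
   Division by zero is not diagnosed. */
static void udivmod_words(uint32_t *quo, uint32_t *rem, const uint32_t *a,
                          const uint32_t *b, unsigned bits) {
  unsigned n = bits / 32;
  memset(quo, 0, n * sizeof *quo);
  memset(rem, 0, n * sizeof *rem);
  for (unsigned i = bits; i-- > 0;) {
    /* rem never exceeds the value of the bits of `a` consumed so far,
       so it is below 2^(bits-1) before the last shift and cannot overflow. */
    shl1_words(rem, n, (a[i / 32] >> (i % 32)) & 1);
    if (cmp_words(rem, b, n) >= 0) {
      sub_words(rem, b, n);
      quo[i / 32] |= UINT32_C(1) << (i % 32);
    }
  }
}

/* Signed variant: divide magnitudes, then fix signs so the quotient truncates
   toward zero and the remainder takes the dividend's sign, as sdiv/srem do. */
static void sdivmod_words(uint32_t *quo, uint32_t *rem, const uint32_t *a,
                          const uint32_t *b, unsigned bits) {
  unsigned n = bits / 32;
  uint32_t ua[MAX_WORDS], ub[MAX_WORDS];
  int a_neg = (int)(a[n - 1] >> 31), b_neg = (int)(b[n - 1] >> 31);
  memcpy(ua, a, n * sizeof *ua);
  memcpy(ub, b, n * sizeof *ub);
  if (a_neg) neg_words(ua, n);
  if (b_neg) neg_words(ub, n);
  udivmod_words(quo, rem, ua, ub, bits);
  if (a_neg != b_neg) neg_words(quo, n);
  if (a_neg) neg_words(rem, n);
}
```

Dividing magnitudes and then restoring the signs is one common way to build signed division on top of an unsigned core. The bit-at-a-time loop is chosen here for brevity. A production runtime would more likely use a word-at-a-time method such as Knuth's Algorithm D.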
