Page MenuHomePhabricator

[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers
ClosedPublic

Authored by mgehre-amd on Feb 22 2022, 7:39 AM.

Diff Detail

Event Timeline

mgehre-amd created this revision.Feb 22 2022, 7:39 AM
mgehre-amd requested review of this revision.Feb 22 2022, 7:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 7:39 AM
LuoYuanke added inline comments.Feb 22 2022, 11:23 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3934

Just be curious. Why the alignment is 4 for all target?

3938

Not sure if the pointer is "i32 *" or "i129 *".

3963

Will the parameter passed in register for i386?

3965

Given the return value is void, why set the S/ZExt?

llvm/test/CodeGen/X86/udivmodei5.ll
3

Generate case for i386?

29

Is there any ABI description that shows how to pass i129 parameter or return i129 value?

mgehre-amd added inline comments.Feb 23 2022, 12:15 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3934

No particular reason. The interface to the __udivei4 functions is specified as unsigned int* right now, so I wanted this to be aligned like an int.
I guess we should somehow obtain the alignment of an int on the target platform? (How?)

Alternatively, we can make __udivei4 take a uint32_t*, so we don't need to guess what an int is in this target,
and then use DAG.getDataLayout().getABITypeAlign(Type::getInt32Ty())?

3938

The pointer here will be i256* (after i129 is expanded to i256).
The __udivei4 argument is unsigned int[] to allow for any bitsize.

3963

I got this from the code hat does i128 division lowering. Its not really clear to me when to set this flag, and it seems removing it has no effect on the generated assembly.

3965

Good catch, I will remove those.

llvm/test/CodeGen/X86/udivmodei5.ll
3

Thanks, will do

29

My observation is that those get passed around as pointers, and my interpretation is that they are handled like big structs. Also we only seem to generated power-of-2 sizes after SelectionDAG, so we are passing a i256 here.

Add test case of i386
Remove SExt, ZExt and setInRegister from CallLoweringInfo

mgehre-amd marked 3 inline comments as not done.Feb 23 2022, 12:18 AM
LuoYuanke added inline comments.Feb 23 2022, 1:54 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3938

Is it reasonable to have the pointer be "i32 *" so that is align the prototype of __udivei4?

llvm/test/CodeGen/X86/udivmodei5.ll
29

OK, backend would call TLI->getNumRegistersForCallingConv() to calculate how many virtual register are needed and follow the calling convention to allocate register or memory.

LuoYuanke added inline comments.Feb 23 2022, 3:38 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3962

I notice in RFC (https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329) the prototype of the function is void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b, unsigned int bits);. I didn't find the code to pass quo and the code to load the value from quo after the call.

LuoYuanke added inline comments.Feb 23 2022, 3:40 AM
llvm/test/CodeGen/X86/udivmodei5.ll
99

Is the result returned as value or from a pointer?

mgehre-amd added inline comments.Feb 23 2022, 5:43 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3962

You are right, I did something stupid here. It turns out that I'm telling here that __udivei4 returns a i256 (or similar) where instead the first argument is the output argument providing a i256.
In the assembly, both come out the same, so my tests worked. But I will change this here to clearly load back the result from the output parameter.

llvm/test/CodeGen/X86/udivmodei5.ll
99

The result is written to the pointer given by the first argument. I.e. for void __udivei4(unsigned int *quo, unsigned int *a, unsigned int *b, unsigned int bits), the result is stored to quo.

  • Don't declare the return value of __divei4 to be the large integer. Use the first output parameter.
  • Don't hard code the alignment of the arguments to be 4
mgehre-amd updated this revision to Diff 413457.Mar 7 2022, 7:08 AM

Add a test case for aarch64 (showing big endian)

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 7:08 AM
LuoYuanke added inline comments.Mar 9 2022, 1:08 AM
llvm/include/llvm/IR/RuntimeLibcalls.def
50

Why set mul? Is there any test case for it?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2114

Do we support bitsize that is no power of 2 (e.g., MVT::i1)? Should we check if the bit size exceed 128?

4350–4351

Why touch MUL? Can't MUL be splitted by Codegen?

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4004

Do we need to check type bit size exceed 128?

4005

Is expandExtIntRes_DIVREM more readable as the function name?

llvm/test/CodeGen/X86/udivmodei5.ll
10

Add nounwind to avoid generating cfi instruction.

LuoYuanke added inline comments.Mar 9 2022, 1:57 AM
llvm/test/CodeGen/X86/udivmodei5.ll
6

I'm not sure it is legal to pass i129. It seems front-end would pass i129 by value. However it is not related to this patch.

[clang]$ cat t.c
_BitInt(129) foo(_BitInt(129) a) {
  return a++;
}
[clang]$
[clang]$ clang -S t.c -emit-llvm -O2
[clang]$ cat t.ll
; ModuleID = 't.c'
source_filename = "t.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn uwtable
define dso_local void @foo(i129* noalias nocapture writeonly sret(i129) align 8 %agg.result, i129* nocapture noundef readonly byval(i129) align 8 %0) local_unnamed_addr #0 {
entry:
  %a = load i129, i129* %0, align 8, !tbaa !3
  store i129 %a, i129* %agg.result, align 8, !tbaa !3
  ret void
}

Is there any update on the patch?

aaron.ballman added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2114

Yes, _BitInt can be of any width: https://godbolt.org/z/4zGfrosc7

mgehre-amd marked 6 inline comments as done.

Rename to ExpandExtIntRes_DIVREM
Add nounwind

llvm/include/llvm/IR/RuntimeLibcalls.def
50

see my other comment

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2114

It seems that SelectionDAG through some magic always first expands the integer type to a power of two before coming here and expanding into a lib call.

So i1 would be expanded into i8. For this reason, only divisions on types bigger than i128 currently crash.

For types bigger than i128, SelectionDAG also expands them to the next power of two, i.e. i129 becomes i256.
I didn't want to change this, because it makes implementing the lib function easier.

But for efficiency, we can try to disable this and pass the original non-power-of-2 type to the libcall in a future PR.

4350–4351

Yes, MUL will be splitted by codegen and never get here afaik, but I still needed to add a MUL_IEXT constant
because this uses the same ExpandIntLibCall which is used for the division/remainder.
And I had to extend ExpandIntLibCall by an extra argument to handle the > 128 bit case.

I'm not really sure why this code is here at all, and whether MUL is actually expanded to a libcall on any target.

I'm open for suggestions, I don't know a lot about SelectionDAG.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4004

Yes, but there is already an assert in passArguments_DIVREM (now called ExpandExtIntRes_DIVREM).

4005

yes, I'll rename it

llvm/test/CodeGen/X86/udivmodei5.ll
10

Great, thanks! I will add it

mgehre-amd marked 5 inline comments as done.Mar 11 2022, 7:41 AM
mgehre-amd marked 7 inline comments as done.Mar 11 2022, 7:49 AM
mgehre-amd added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
3938

I'm not sure whether pointer casts exists in SelectionDAG. I found DAG.getBitcast, but from my understanding this only does bitcasts between integer types. (?)

LuoYuanke accepted this revision.Mar 11 2022, 8:42 PM

LGTM, thanks.

This revision is now accepted and ready to land.Mar 11 2022, 8:42 PM
mgehre-amd marked an inline comment as done.Mar 14 2022, 6:01 AM

@LuoYuanke, thank you very much for the review and comments!

Should I wait for somebody else to approve the PR too, or can I go ahead and push?

@LuoYuanke, thank you very much for the review and comments!

Should I wait for somebody else to approve the PR too, or can I go ahead and push?

Thanks for the patch. Maybe wait 1 or 2 days in case there are comments from other reviewers. If there is no objection I think you can push to patch.

pengfei added inline comments.Mar 14 2022, 7:35 AM
llvm/test/CodeGen/X86/udivmodei5.ll
6

I think we'd better to avoid passing / returning i129 type in tests, especially the returning.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td#L221-L222
The comments said returning more than two registers is out of the ABI scope. What's more, unlike the passing arguments, returning just has 3 usable registers and cannot use stack. So it will always crash if we want to return a type larger than i192.

aaron.ballman added inline comments.Mar 14 2022, 11:58 AM
llvm/test/CodeGen/X86/udivmodei5.ll
6

So it will always crash if we want to return a type larger than i192.

This seems like a bug we'd need to fix, yes?

pengfei added inline comments.Mar 14 2022, 10:20 PM
llvm/test/CodeGen/X86/udivmodei5.ll
6

Sorry, my mistake. It won't crash https://godbolt.org/z/f6b66xcfe. It turns out only type i129 ~ i192 has dubious behavior.

Can we land this patch? I think we can update test case after that if there is better idea for the parameter passing/returning.

pengfei accepted this revision.Mar 15 2022, 6:41 PM

Can we land this patch? I think we can update test case after that if there is better idea for the parameter passing/returning.

I'm fine with it. If fact, I think we may need to reflect the ABI lowering in the backend. That says the i129 will be good tests for the future work, though we may not work on it very recently.

Can we land this patch? I think we can update test case after that if there is better idea for the parameter passing/returning.

I'm fine with it. If fact, I think we may need to reflect the ABI lowering in the backend. That says the i129 will be good tests for the future work, though we may not work on it very recently.

FYI. X86 psABI has defined the representation of _BitInt(N) https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/ceff85b232117f15296da5a4ecc98e25a0547093
I can see both front end and backend need to change. FE should always emit i(((N + 63) / 64) * 64) type, while BE should take all non pow of 2 i(((N + 63) / 64) * 64) type as legal.
Thought?