This is an archive of the discontinued LLVM Phabricator instance.

[X86] Unbind the ebx with GOT address in regcall calling convention
ClosedPublic

Authored by xiangzhangllvm on Nov 7 2020, 5:45 PM.

Details

Summary

No register can be allocated for indirect call when it use regcall calling convention and passed 5/5+ args.

e.g.

call vreg (ag1, ag2, ag3, ag4, ag5, ...) --> 5 regs (EAX, ECX, EDX, ESI, EDI) used for pass args, 1 reg (EBX )used for hold GOT point,
so no regs can be allocated to vreg.

The Intel386 architecture provides 8 general purpose 32-bit registers. RA mostly
use 6 of them (EAX, EBX, ECX, EDX, ESI, EDI).
5 of this regs can be used to pass function arguments (EAX, ECX, EDX, ESI, EDI).
EBX used to hold the GOT pointer when making function calls via the PLT.
ESP and EBP usually be "reserved" in register allocation.

Reproduce case:

//clang -target i386-unknown-linux-gnu -S -fpic t.c

#define REGCALL __attribute__((regcall))
int REGCALL func (int i1, int i2, int i3, int i4, int i5);
int (REGCALL *fptr) (int, int, int, int, int) = func;
int test() {
    return fptr(1,2,3,4,5);
}

Diff Detail

Event Timeline

xiangzhangllvm created this revision.Nov 7 2020, 5:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2020, 5:45 PM
xiangzhangllvm requested review of this revision.Nov 7 2020, 5:45 PM
LuoYuanke added inline comments.Nov 7 2020, 6:00 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
4123

Would you add comments to explain why EBX is avoid in regcall?

4136

Would you add a test case for tail call? Is there any conflict to ECX?

xiangzhangllvm added inline comments.Nov 7 2020, 6:42 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
4123

I didn't see the rule in regcall calling convertion, let me check how I descript here better.

4136

I check here local before, For X86, tail call will not work with regcall, For X86_64 it will work, but X86_64 don't fix ebx with GOT point.
please check the following case, it will not generate jump for 2nd command.

1 ; llc -mtriple=x86_64-unknown-linux-gnu -relocation-model=pic
2 ; llc -mtriple=i386-unknown-linux-gnu -relocation-model=pic  <-tailcallopt>
3
4 declare x86_regcallcc void @regcall_not_lazy(i32 %a0, i32 %b0)
5
6 define void @tail_call_regcall() nounwind {
7   tail call x86_regcallcc void @regcall_not_lazy(i32 0, i32 1)
8   ret void
9 }
MaskRay added inline comments.Nov 7 2020, 7:12 PM
llvm/test/CodeGen/X86/x86-regcall-got.ll
55

The attributes can be reduced

63

Unneeded metadatas should be stripped.

LuoYuanke added inline comments.Nov 7 2020, 10:23 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
4120

To be conservative, do you need to check if the argument number exceed 4 (>=5)?

4136

It seems compiler generate jmp instruction only when the argument number is less or equal to 2 and without pic relocation model.

; llc -mtriple=x86_64-unknown-linux-gnu
; llc -mtriple=i386-unknown-linux-gnu

@foo6 = external global void (i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5)*

define void @tail_call_regcall6(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, ...) nounwind {
  %t0 = alloca i32, align 128
  %t1 = load void (i32, i32, i32, i32, i32, i32)*, void (i32, i32, i32, i32, i32, i32)** @foo6, align 4
  tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5) nounwind
  ret void
}

@foo5 = external global void (i32 %0, i32 %1, i32 %2, i32 %3, i32 %4)*

define void @tail_call_regcall5(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) nounwind {
  %t1 = load void (i32, i32, i32, i32, i32)*, void (i32, i32, i32, i32, i32)** @foo5, align 4
  ; tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2, i32 3, i32 4) nounwind
  tail call x86_regcallcc void %t1(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) nounwind
  ret void
}

@foo4 = external global void (i32 %0, i32 %1, i32 %2, i32 %3)*

define void @tail_call_regcall4(i32 %a, i32 %b, i32 %c, i32 %d) nounwind {
  %t1 = load void (i32, i32, i32, i32)*, void (i32, i32, i32, i32)** @foo4, align 4
  ; tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2, i32 3, i32 4) nounwind
  tail call x86_regcallcc void %t1(i32 %a, i32 %b, i32 %c, i32 %d) nounwind
  ret void
}

@foo3 = external global void (i32 %0, i32 %1, i32 %2)*

define void @tail_call_regcall3(i32 %a, i32 %b) nounwind {
  %t1 = load void (i32, i32, i32)*, void (i32, i32, i32)** @foo3, align 4
  tail call x86_regcallcc void %t1(i32 0, i32 1, i32 2) nounwind
  ret void
}

@foo2 = external global void (i32 %0, i32 %1)*

define void @tail_call_regcall2(i32 %a, i32 %b) nounwind {
  %t1 = load void (i32, i32)*, void (i32, i32)** @foo2, align 4
  tail call x86_regcallcc void %t1(i32 0, i32 1) nounwind
  ; tail call x86_regcallcc void %t1(i32 %a, i32 %b) nounwind
  ret void
}
xiangzhangllvm marked 2 inline comments as not done.Nov 7 2020, 10:55 PM
xiangzhangllvm added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
4120

It is unclear to me why we need fix ebx for GOT points other than let it freely choose, (even in other calling convention),
In my eyes, It is not well/strange to bind the ebx with GOT according to argument number.

4136

I add "-tailcallopt" on your test, all jump disappeared.
The constrain of tail call should just be "variable argument lists are used" (should not according to the numbers of function).
I guess there must be some bug about tail call itself.
Anyway I'll take a deeper look to check the tail call, all my tests I checked before under pic mode.
And for the relation of ebx and GOT we only need to check pic mode.

llvm/test/CodeGen/X86/x86-regcall-got.ll
55

The attributes can be reduced

63

I'll fix it, thank you!

xiangzhangllvm added inline comments.Nov 23 2020, 8:08 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
4136

It seems compiler generate jmp instruction only when the argument number is less or equal to 2 and without pic relocation model.

 Yes, in fact, current X86 Lowering has consider the "tailcall address may be in a register", and it try to "escape" no register for allocation problem, so it limited the register number for function args.
 For PIC mode, one more register need to be "bind" to GOT, so the register number for function args should less than non-PIC mode. And PIC mode will disable tail calls to external symbols with default visibility.
So I reproduce the tail call case for PIC in following case:
; llc -mtriple=i386-unknown-linux-gnu -relocation-model=pic tail.ll

@a0 = global i32 0, align 4

define x86_regcallcc void @tail_call_regcall2(i32 %a) nounwind {
  tail call x86_regcallcc void @__regcall3__func(i32 %a) nounwind
  ret void
}

define internal x86_regcallcc void @__regcall3__func(i32 %i1) #0 {
entry:
  store i32 %i1, i32* @a0, align 4
  ret void
}

Current change did no affect on it. (tail call load the address of the callee into ECX at PIC duo to the ebx/callee-saved problem)

xiangzhangllvm marked 5 inline comments as done.Nov 28 2020, 11:31 PM
xiangzhangllvm added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
4136

Though this patch not affect tailcall, I still add test (X86/tailregccpic.ll) to make sure it.
What's more, tailcall has limited argument number for tail callee, =1 in pic mode, it has already consider the register allocation problem.

ping
@LuoYuanke I see you give the most comments, could you +1 for this patch. tks

LuoYuanke accepted this revision.Dec 1 2020, 5:37 AM

Thank you! Pls wait for 1 or 2 days to see if any other reviews have any comments.

This revision is now accepted and ready to land.Dec 1 2020, 5:37 AM

Thank you! Pls wait for 1 or 2 days to see if any other reviews have any comments.

NO problem, thank you a lot!