This is an archive of the discontinued LLVM Phabricator instance.

Teach fast isel about thiscall (and callee-pop) calls.
ClosedPublic

Authored by thakis on Jul 13 2016, 2:05 PM.

Details

Reviewers
rnk
Summary

Testing for this is a bit ugly: -fast-isel-abort=1 doesn't abort on calls, and -fast-isel-abort=2 aborts both on calls and on parameters (which fail to be lowered), so it can't be used. Instead, look at the output of -fast-isel-verbose.

Depends on http://reviews.llvm.org/D22314, else test/CodeGen/X86/win32_sret.ll fails due to (harmless) changes in the generated code.

Diff Detail

Event Timeline

thakis updated this revision to Diff 63857.Jul 13 2016, 2:05 PM
thakis retitled this revision from to Teach fast isel about thiscall (and callee-pop) calls..
thakis updated this object.
thakis added a reviewer: rnk.
thakis added a subscriber: llvm-commits.
rnk added inline comments.Jul 13 2016, 3:26 PM
lib/Target/X86/X86FastISel.cpp
3028

Maybe toss stdcall on here, since that's easier than thiscall.

3117

"getAlignedCallFrameSize" makes me suspicious

3337

I don't think this is true if stack realignment is active. NumBytes is the argument size rounded up to stack alignment. Make a test case with alloca align 32 to see if there's a bug.

Thanks for the review!

lib/Target/X86/X86FastISel.cpp
3028

I wanted to do that in a separate change (here and in X86SelectRet()) since it feels like a separate thing.

3337

It seems to work (if I understood your example request correctly – this is with a locally-hacked up version that supports stdcall in fast isel as well):

Nicos-MBP:llvm-build thakis$ cat test.ll
source_filename = "fastisel.cc"
target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
target triple = "i386-pc-windows-msvc18.0.0"

define void @myfun() #0 {
entry:
  %i = alloca i32, align 32
  store i32 43, i32* %i, align 4
  %0 = load i32, i32* %i, align 4
  call x86_stdcallcc void @stdcallfun(i32 %0)
  ret void
}

declare x86_stdcallcc void @stdcallfun(i32) #1
Nicos-MBP:llvm-build thakis$ bin/llc -no-x86-call-frame-opt  -O0 -fast-isel-verbose < test.ll
...
_myfun:                                 # @myfun
# BB#0:                                 # %entry
	pushl	%ebp
	movl	%esp, %ebp
	andl	$-32, %esp
	subl	$64, %esp
	movl	$43, 32(%esp)
	movl	32(%esp), %eax
	movl	%eax, (%esp)
	calll	_stdcallfun@4
	movl	%ebp, %esp
	popl	%ebp
	retl

This matches what X86ISelLowering.cpp does, and I built a few test binaries with this that worked too. Intuitively, I think it works because the call frame is aligned as much as the parameters are, not as much as the arguments are – and the callee-pop pops the parameter size.

rnk added inline comments.Jul 13 2016, 5:30 PM
lib/Target/X86/X86FastISel.cpp
3337

I see, CCInfo.getAlignedCallFrameSize() doesn't round up to stack alignment, it rounds up to the alignment of the parameters. I was concerned about what when the outgoing stack alignment is 16 instead of 4, and we need to emit cleanup adjustments. This probably only happens on Linux, so this is something like the test case I was thinking about:

$ cat t.ll
target triple = "i686-linux"
declare x86_stdcallcc void @stdcallfun(i32*)
define void @myfun(i32 %n) {
entry:
  %i = alloca i32
  call x86_stdcallcc void @stdcallfun(i32* %i)
  %j = alloca i32, i32 %n
  call x86_stdcallcc void @stdcallfun(i32* %j)
  store volatile i32 0, i32* %i
  ret void
}

$ llc t.ll -o - -mtriple=i686-linux -no-x86-call-frame-opt
...
        subl    $16, %esp
        leal    -8(%ebp), %eax
        movl    %eax, (%esp)
        calll   stdcallfun
        addl    $12, %esp

        movl    %esp, %eax
        leal    15(,%esi,4), %ecx
        andl    $-16, %ecx
        subl    %ecx, %eax
        movl    %eax, %esp

        subl    $16, %esp  # ADJCALLSTACKDOWN uses 16 bytes
        movl    %eax, (%esp)
        calll   stdcallfun
        addl    $12, %esp    # The callee only pops 4 bytes, not 16, so we need to add 12 to esp
...
3338

This computeBytesPoppedByCallee function only handles popping hidden sret parameters, which has to be one of the weirdest 32-bit ABI corner cases ever. Maybe rename it computeBytesPoppedByCalleeForSRet?

test/CodeGen/X86/fast-isel-call.ll
59

We should check for correct code

64

If you add a store to an alloca after the call, it will ensure that the call sequence end stack adjustment isn't eliminated as dead code.

66

I would suggest testing with dynamic allocas, but fastisel can't handle them. =/

thakis updated this revision to Diff 63895.Jul 13 2016, 6:26 PM

address comments

test/CodeGen/X86/fast-isel-call.ll
59

Done. (e.g. win32_sret.ll already checked for that, but having more testing doesn't hurt of course.)

64

Does that happen in fast isel mode? As far as I can tell, it doesn't.

rnk accepted this revision.Jul 13 2016, 6:32 PM
rnk edited edge metadata.

lgtm

test/CodeGen/X86/fast-isel-call.ll
64

Nevermind, this comment isn't relevant to the actual code sequence you're checking.

This revision is now accepted and ready to land.Jul 13 2016, 6:32 PM
thakis closed this revision.Jul 13 2016, 7:00 PM

r275360, thanks!