Add _alloca routine, since that's what X86FrameLowering::emitStackProbeCall actually uses to probe stack on Win32.
Also, replace or instructions with test. The latter should be marginally more efficient, as it doesn't write into memory.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
After playing with this some more - it's no fun linking with gcc-generated libraries, because they actually do use __checkstk_ms, so you have to pull in libgcc as well, and end up with duplicate symbols. Let's add _chkstk as an extra, not as replacement.
On second thought, elimination of probe for the last page is not correct - that final push of the return address could land on the next page...
Also, I've figured out a better way to preserve eax.
Here's how I understand the various functions:
- __chkstk from msvcrt: takes size in RAX, probes all pages, returns preserving all registers.
- __chkstk_ms from mingw: copies __chkstk from msvcrt
- _alloca / __chkstk from mingw: helpers for implementing C alloca() with custom conventions. They trash volatile registers (ECX/RCX), update the SP, and return a new object address in RAX.
I'd rather simplify LLVM's X86FrameLowering to always use __chkstk / __chkstk_ms than add support for __alloca and Mingw __chkstk. It reduces the amount of assembly in compiler-rt, and simplifies LLVM in one step. The downside is that the compiler generates more code around dynamic allocas, which really doesn't matter.
What do you think?
lib/builtins/i386/chkstk.S | ||
---|---|---|
22 | I guess MSVC uses test, so this is fine. |
I think they trash RAX/EAX as well. That EAX contains a pointer to the allocated space on return is just an accident. I took care to preserve EAX because comments in LLVM's emitStackProbeCall say so; not sure if this is used anywhere in LLVM.
I'd rather simplify LLVM's X86FrameLowering to always use __chkstk / __chkstk_ms than add support for __alloca and Mingw __chkstk. It reduces the amount of assembly in compiler-rt, and simplifies LLVM in one step. The downside is that the compiler generates more code around dynamic allocas, which really doesn't matter.
What do you think?
I personally would be fine with this.
However, the description of builtins module on the compiler-rt page says: "builtins provides full support for the libgcc interfaces on supported targets". If compiler-rt is a drop-in for for libgcc, it should provide all functions that libgcc does. If this is not the case, then linking a binary from a mix of LLVM and GCC objects would require linking both to libgcc and to compiler-rt, which, as I've mentioned above, creates a problem with duplicate symbols.
To resolve this in a principled way, we'd probably need to rename all compiler-rt exports. LLVM would also need a new switch to allow targeting either runtime. A new set of triples for "-llvm" environment? This is a bigger can of worms than I'd like to open. :-)
Hm, that's pretty compelling. I can easily imagine someone having a few objects built with GCC and wanting to use compiler-rt.
@rnk, Sounds like compiler-rt *is* a drop-in replacement for libgcc, so let's add 64-bit versions of chkstk/alloca for gcc objects.
i386/chkstk.S)
i386/chkstk2.S)
should have been
i386/chkstk.S
i386/chkstk2.S)
and same for x64
fixed in rL251931.
I guess MSVC uses test, so this is fine.