This is an archive of the discontinued LLVM Phabricator instance.

Bug 18582 - Offset overflow on calling __chkstk and __alloca on x64
AbandonedPublic

Authored by belleyb on Jan 6 2015, 10:25 AM.

Details

Summary

See: http://llvm.org/bugs/show_bug.cgi?id=18582

The llvm X86/Win64 code generator was generating 32-bit relative calls
when calling these special/intrinsic functions. It turns out that, under
Windows 8 (and 10), the DLL containing these functions is loaded more
than 2 GB away from our jitted code in the process virtual address space.
The MCJIT ELF symbol relocation couldn't do anything about it and that
lead to crashes.

The issue can be fixed by using indirect call via 64-bit register.

This patch includes updated regression tests.

Diff Detail

Event Timeline

belleyb updated this revision to Diff 17836.Jan 6 2015, 10:25 AM
belleyb retitled this revision from to Bug 18582 - Offset overflow on calling __chkstk and __alloca on x64.
belleyb updated this object.
belleyb edited the test plan for this revision. (Show Details)
belleyb edited the test plan for this revision. (Show Details)Jan 6 2015, 10:32 AM
belleyb added reviewers: nadav, asl, chapuni.
belleyb set the repository for this revision to rL LLVM.
belleyb added a subscriber: Unknown Object (MLST).
belleyb updated this object.Jan 6 2015, 10:41 AM
belleyb added a reviewer: rafael.
asl edited edge metadata.Jan 29 2015, 5:55 AM

Should we do this only for large code model? JIT should definitely default to it or something similar.

I believe that this should be performed in all memory models.

In fact, I now believe that it is probably also affecting non-JIT usage. Windows 8 and 10 performs some address space randomization (DLL loading and memory allocations), probably for security reasons. This makes it more likely to that an inter-DLL function call is located more than 2GB away. 32-bit PC relative relocation cannot be used to perform an inter-DLL function call (unless going through the PLT). This is true for all memory models.

The __chkstk function is an inter-DLL/JIT-module function call (unless one is linking statically to the MSVC runtime, which we can't safely assume while compiling...). So, we need to use a 64-bit safe relocation for this call...

asl added a comment.Jan 29 2015, 7:02 AM

I believe that this should be performed in all memory models.

In fact, I now believe that it is probably also affecting non-JIT usage. Windows 8 and 10 performs some address space randomization (DLL loading and memory allocations), probably for security reasons. This makes it more likely to that an inter-DLL function call is located more than 2GB away. 32-bit PC relative relocation cannot be used to perform an inter-DLL function call (unless going through the PLT). This is true for all memory models.

Is there any documentation that Windows 8 and 10 indeed use large code model?

AFAIU, the code model isn't imposed by the OS. One can use DLLs compiled with either code model on any versions of Windows. One chooses the code model for each DLL depending on whether he's expecting particular sections (.data and .text) of the DLL to be larger than 2^^24 bytes.

See: System V ABI, AMD64 Supplement, http://www.x86-64.org/documentation/abi.pdf, Section 3.5.1

For example, in the x86_64 small code model, one call only use the 32-bit PC relative call to make calls within the same DLL. When calling a function residing in a different DLL, one still has to use a 64-bit safe relocation, which amount to either loading the 64-bit address directly, loading it for a GOT table or jumping to a PLT stub containing the 64-bit call.

I'm a little confused here.

The symbols __alloca and __chkstk come from a DLL which means that referring to them goes to a thunk which will then dereference __imp__chkstk and __imp__alloca. The thunks are built by the linker which means that they must be accessible to your EXE or DLL regardless how far away you are from the C runtime's DLL.

I guess my first question would be, how does the symbol for __chkstk work out when running on the JIT? Do we literally find out where the CRT's __chkstk function is in memory and use that address? If so, how about we say that these calls should be indirect if we are using CodeModel::Large ?

majnemer wrote:

I'm a little confused here.

Me also! :-)

The symbols alloca and chkstk come from a DLL which means that referring to them goes to a thunk which will then dereference impchkstk and impalloca. The thunks are built by the linker which means that they must be accessible to your EXE or DLL regardless how far away you are from the C runtime's DLL.

Probably... I'll have to investigate.

I guess my first question would be, how does the symbol for chkstk work out when running on the JIT? Do we literally find out where the CRT's chkstk function is in memory and use that address?

Yes, that's what the problem is...

If so, how about we say that these calls should be indirect if we are using CodeModel::Large ?

I'm ok about the indirection. But, I really don't understand why going through that impchkstk indirection wouldn't be necessary in the CodeModel::Small and CodeModel::Medium also. AFAIU, this is an X86_64 specific issue, not a large model issue...

rnk edited edge metadata.Jan 29 2015, 10:10 AM

We shouldn't be making this call unconditionally indirect. We should be using whatever strategy MCJIT is using to reference external symbols. I assume it does so by creating a PLT or something like it.

lib/Target/X86/X86FrameLowering.cpp
767–772

You left behind the old call op, it seems.

Adding Lang to the review so that he can comment on the MCJIT requirements...

belleyb added a comment.EditedJan 29 2015, 1:27 PM

Thanks guys for the comments. I will experiment a little bit more and come back once I have a better understanding. It might also be a difference between the behaviour of the COFF vs ELF object files under Windows.

There is a guiding principle though that I'd like to discuss first. AFAIK, the MCJIT engine does not perform any special configuration of the target backends. It uses a vanilla pass manager to create an in-memory object file (ELF or Mach-O). That object file is essentially no different than the one that would have been generated by llc -filetype=obj. I believe that this is a desirable design goal. So, the content of the object file is either right or wrong. The answer should not depend on the intended use of the object file (JIT execution or offline compilation).

In addition, and AFAIK, it is currently a limitation of the RuntimeDyLd symbol relocators that we have to use CodeModel::Large. It is also my understanding that it is a limitation that might be lifted in the future. The corollary is that we shouldn't make any assumption that CodeModel::Large implies MCJIT or vice-versa.

Is my understanding of those parts correct ?

rnk added a comment.Jan 29 2015, 2:25 PM

I went ahead and put together a patch that I think does the right thing here:
http://reviews.llvm.org/D7267

Note it doesn't touch EmitLoweredWinAlloca, but that code is a mess and IMO should be refactored to use the fixed code separately.

Wow! Thanks so much... I am downloading your proposed patch and testing it out inside our product on Windows 7, 8 and 10...

lhames edited edge metadata.Jan 29 2015, 4:56 PM

I'm not sure about the static case but when JITing MCJIT should synthesize a PLT entry for you, so you should be ok to emit a direct call.

Thanks Lang, I will investigate some more next week. I'd like to go to the bottom of this issue and understand it thoroughly.

belleyb abandoned this revision.Jun 21 2017, 10:56 AM

Closing. This patch is now irrelevant.