This is an archive of the discontinued LLVM Phabricator instance.

128-bit ABI for x86_64-w64-mingw32 incorrectly generated
ClosedPublic

Authored by vtjnash on Oct 21 2013, 9:48 PM.

Details

Reviewers
vtjnash
Summary

Bug #16168

Summary: The calling convention of 128-bit integers generated for Windows x86-64 appears to be incorrectly using an approximation of the Linux ABI instead of the Windows ABI.

The llvm math intrinsics need to be updated to use call-by-reference for i128 parameters, and need to expect the return value in the XMM0 register, to be compatible with GCC and Microsoft on the Win64 platform.

Diff Detail

Event Timeline

asl added a comment.Oct 21 2013, 10:53 PM

This is very invasive patch which definitely cannot be accepted as-is now, simply because it brings target-specific stuff inside the target-independent code.

Also, the patch has not tests at all, so it's hard to see, which problem the patch wants to solve (and thus it's not possible to suggest the alternative solution).

Start by adding the tests and we'll see how it will need to be reworked.

majnemer added inline comments.Oct 21 2013, 11:44 PM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1872 ↗(On Diff #5076)

Remove this commented out printfstatement.

1883 ↗(On Diff #5076)

What is the point of this? It seems to always set isTailCall to false.

1895 ↗(On Diff #5076)

No braces needed.

1945 ↗(On Diff #5076)

Remove this commented out printfstatement.

2004 ↗(On Diff #5076)

Remove this commented out printfstatement.

2026 ↗(On Diff #5076)

No need for braces around a single statement.

2156 ↗(On Diff #5076)

Remove this commented out printfstatement.

2184 ↗(On Diff #5076)

No need for braces around a single statement.

lib/CodeGen/SelectionDAG/LegalizeTypes.cpp
1089 ↗(On Diff #5076)

Remove this commented out printfstatement.

1112 ↗(On Diff #5076)

No need for braces around a single statement.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
88 ↗(On Diff #5076)

Remove this commented out printfstatement.

109 ↗(On Diff #5076)

No need for braces around a single statement.

I am addressing the segfault in the following code snippet caused by the srem instruction using the wrong calling convention (GCC fixed this bug in 4.7). I have not been able to run the test suite on Windows (because my choices of python either don't support unix paths (native install), or don't support import threading (msys2 install)), but this is presumably already one of the tests.

#include <stdint.h>

int main() {
  volatile __int128 x = 1;
  return x % 3;
}
; ModuleID = 'i128.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-w64-mingw32"

define i32 @main() nounwind uwtable {
  %x = alloca i128, align 16
  store volatile i128 1, i128* %x, align 16
  %x.0.load1 = load volatile i128* %x, align 16
  %1 = srem i128 %x.0.load1, 3
  %2 = trunc i128 %1 to i32
  ret i32 %2
}

Note that there are 6 identical copies of the same patch here because there were 6 nearly identical functions in the code makeLibCall/ExpandLibCall/etc. . It seems all of these functions should be rewritten as 1 function. I mention that because I find it easier to refer to this as a change to a single function, while assuming the reader understands that that function exists in 6 locations.

@asl I agree that it is unfortunate that this is encoding target specific information in this file. Let me explain why I put it here: After this function is called, the function has been translated to the "llvm calling convention" and the information necessary to map that information to the calling convention of the target platform has been destroyed. However, before this function is called the information on what function is getting called and it's arguments is somewhat scattered. I would like to see the LLVM calling convention actually match the platform C ABI, so that it could be trivially used from other front-ends -- but I have seen mentioned that the LLVM calling convention is necessary for efficient optimization and won't be changed. Perhaps this information (ByRef vs. ByVal) could be part of RTLIB::Libcall, but I'm not sufficiently familiar with LLVM to make that modification.
To put this another way, makeLibCall is translating an LLVM opcode into a call to a C library function, and thus needs to know how to properly translate the C calling convention into LLVM.

@majnemer thanks for the feedback. I will be sure to incorporate those fixes along the way.

Is there any possibility of this moving forward? This patch is crucial to get Julia to function properly on Windows.

Seconded. How can we help improve the patch and get this problem fixed?

asl added a comment.Apr 18 2014, 11:23 PM

Just start from simple things: address the review comments.

I'm running the test suite on Windows 7 without problems, using standard Python 2.7.6 32 bit from python.org and running the tests from regular command line, not from MSYS.

Assuming llvm in c:\llvm was built RelWithDebInfo and Pyhton in c:\Python27, the commands to run all tests would be

cd c:\llvm
c:\Python27\python msvc/RelWithDebInfo/bin/llvm-lit.py test

To run just the CodeGen tests

cd c:\llvm
c:\Python27\python msvc/RelWithDebInfo/bin/llvm-lit.py test\CodeGen

Can you modify one of the tests (or create a new one) to show the problem?
possibly mul128.ll or i128-ret.ll would do.

thanks yaron.keren, it sounds like I need to rebuild using MSVC. when that finishes, I will add a test case here

@asl since the comments were mostly formatting, I was planning on addressing them when updating the patch for a more recent version of llvm. do you have a preference for what version (or svn checkout) that I retarget this for?

do you have some thoughts on where to locate this patch in the llvm source code? and how I might be able to modify it to address your previous comment that "it brings target-specific stuff inside the target-independent code"? I would like to also consolidate the various copies makeLibCall, but was initially trying to minimize the amount of code affected to help with merging

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1883 ↗(On Diff #5076)

I was avoiding writing the memory location for the second i128 argument. I guess that's not actually necessary. I wasn't sure if this function could still be tail-called when it had arguments on the stack, so I disabled it to be safe.

asl added a comment.Apr 19 2014, 11:03 AM

@vtjnash obviously, everything should be moved into X86 target. Any check for "win64" and similar stuff inside the common codegen is not acceptable at all. However, first you need to show that there is indeed a problem somewhere - I believe that the calling convention bits for 128 bit stuff is correct there.

I'm having trouble finding a button to update this diff. However, here's my new test, showing the code generated before/after the patch:

; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s -check-prefix=X86-64
; RUN: llc < %s -mtriple=x86_64-win32 | FileCheck %s -check-prefix=WIN64

define i128 @mod128(i128 %x) nounwind {
; X86-64: movl	$3, %edx
; X86-64: xorl	%ecx, %ecx
; X86-64: callq	__modti3
    
; WIN64-NOT: movl	$3, %r8d
; WIN64-NOT: xorl	%r9d, %r9d
; WIN64: leaq    -72(%rsp), %rsp
; WIN64: movq    %rdx, 56(%rsp)
; WIN64: leaq    32(%rsp), %rdx
; WIN64: movq    %rcx, 48(%rsp)
; WIN64: leaq    48(%rsp), %rcx
; WIN64: movq    $0, 40(%rsp)
; WIN64: movq    $3, 32(%rsp)
; WIN64: callq   __modti3
; WIN64: movd    %xmm0, %rax
; WIN64: punpckhqdq      %xmm0, %xmm0
; WIN64: movd    %xmm0, %rdx
; WIN64: leaq    72(%rsp), %rsp

  %1 = srem i128 %x, 3
  ret i128 %1
}

(this is based on my previous test code posted above, which segfaults without the patch and returns the correct answer with the patch)

asl added a comment.Apr 19 2014, 12:17 PM

@vtjnash: I'm confused. Twice. In particular:

  1. It looks like you're trying to match VCPP here, since you're using "-win32" target triple. However, you're calling __modti3 here, which is definitely not provided by VCPP...
  2. Win64 ABI does not define the calling convention for __int128 (see http://msdn.microsoft.com/en-us/library/zthk2dkh.aspx). So, will you please indicate the document describing the desired behavior?

@vtjnash obviously, everything should be moved into X86 target. Any check for "win64" and similar stuff inside the common codegen is not acceptable at all. However, first you need to show that there is indeed a problem somewhere - I believe that the calling convention bits for 128 bit stuff is correct there.

I understand and agree. However, CodeGen here is implicitly assuming a calling convention for i128 types without acknowledging that it is doing target specific work -- in particular, it assumes that the library intrinsics functions (e.g. __modti3, provided by gcc-4.8) operate on registers, whereas WIN64 i128 uses a different calling convention (passed as pointers but returned in xmm0).

It looks like you're trying to match VCPP here, since you're using "-win32" target triple. However, you're calling __modti3 here, which is definitely not provided by VCPP...

True, I was running the tests on MinGW64 and OSX while I am waiting for the MSVC build to finish. LLVM is creating the call to __modti3, based on the llvm srem intrinsic.

Win64 ABI does not define the calling convention for __int128 (see http://msdn.microsoft.com/en-us/library/zthk2dkh.aspx). So, will you please indicate the document describing the desired behavior?

it does not explicitly call out i128, but it does state that anything over 8 bytes is unconditionally passed as a pointer. also relevant:
https://groups.google.com/forum/#!topic/llvm-dev/wZg9CnTevFI

Jameson, to update the diff: click the Create diff button at the top right, paste the complete diff (not just the new test) and in the next screen instead of create new revision select atttach to revision D1998.

It looks like you're trying to match VCPP here, since you're using "-win32" target triple. However, you're calling __modti3 here, which is definitely not provided by VCPP...

It appears that MSVC-compiled LLVM generates identical code (including callq __modti3), although it moves one of the instructions, so my initial tests do not succeed, without reordering the assembly instructions in the test to match:

; WIN64-NOT: movl       $3, %r8d
; WIN64-NOT: xorl       %r9d, %r9d
; WIN64: leaq    -72(%rsp), %rsp
; WIN64: movq    %rdx, 56(%rsp)
; WIN64: movq    %rcx, 48(%rsp)
; WIN64: leaq    48(%rsp), %rcx
; WIN64: leaq    32(%rsp), %rdx # <= this instruction moved
; WIN64: movq    $0, 40(%rsp)
; WIN64: movq    $3, 32(%rsp)
; WIN64: callq   __modti3
; WIN64: movd    %xmm0, %rax
; WIN64: punpckhqdq      %xmm0, %xmm0
; WIN64: movd    %xmm0, %rdx
; WIN64: leaq    72(%rsp), %rsp

note that __modti3 can be optained as part of llvm (http://llvm.org/svn/llvm-project/compiler-rt/trunk/lib/builtins/modti3.c) for compilers where it is not present, although my initial impression is that it would need to be ported to win32/pe-coff

vtjnash updated this revision to Unknown Object (????).Apr 19 2014, 9:03 PM

thank you @yaron.keren

this updates the diff to address @majnemer's comments and add the test file

vtjnash updated this revision to Unknown Object (????).Apr 19 2014, 9:30 PM

fix diff view to show all context (-U100000)

vtjnash updated this revision to Unknown Object (????).Apr 19 2014, 11:14 PM

move logic to TargetLowering::LowerCallTo

this significantly simplifies the change, while moving it to a more sensible location

asl added a comment.Apr 20 2014, 7:42 AM

@vtjnash: Again, this is still a layering violation. Move the logic to X86 target.

asl added inline comments.Apr 20 2014, 8:08 AM
test/CodeGen/X86/mod128.ll
11 ↗(On Diff #8673)

This is incorrect. You need to make sure that the args are indeed passed where necessary, not only that the 'just' call is emitted.

asl added a comment.Apr 20 2014, 8:20 AM

@vtjnash: I looked over the code and the proper solution would be:

  1. You need to extend the calling convention bits to pass 128-bit stuff by value (see the existing code which already does this for vectors, i.e. __m128)
  2. You need to make sure that libcalls are using the proper calling convention

This way no target-specific hacks in the target independent code would be required as well as much more clear implementation will be added.

Again, this is still a layering violation. Move the logic to X86 target.

I know, but hoped this would make it easier to identify the correct solution, by moving the logic closer to the invocation of the target-dependent code, X86TargetLowering::LowerCall

You need to extend the calling convention bits to pass 128-bit stuff by value (see the existing code which already does this for vectors, i.e. __m128)

Is this the patch you were expecting:

diff --git a/lib/Target/X86/X86CallingConv.td b/lib/Target/X86/X86CallingConv.td
index a78b5c0..7bc179d 100644
--- a/lib/Target/X86/X86CallingConv.td
+++ b/lib/Target/X86/X86CallingConv.td
@@ -137,6 +137,8 @@ def RetCC_X86_64_C : CallingConv<[
 def RetCC_X86_Win64_C : CallingConv<[
   // The X86-Win64 calling convention always returns __m64 values in RAX.
   CCIfType<[x86mmx], CCBitConvertToType<i64>>,
+  // The X86-Win64 calling convention always returns __m128i values in XMM.
+  CCIfType<[i128], CCAssignToReg<[XMM0,XMM1,XMM2,XMM3]>>,
 
   // Otherwise, everything is the same as 'normal' X86-64 C CC.
   CCDelegateTo<RetCC_X86_64_C>
@@ -324,7 +326,10 @@ def CC_X86_Win64_C : CallingConv<[
 
   // Long doubles get stack slots whose size and alignment depends on the
   // subtarget.
-  CCIfType<[f80], CCAssignToStack<0, 0>>
+  CCIfType<[f80], CCAssignToStack<0, 0>>,
+
+  // 128 bit numbers are passed by pointer
+  CCIfType<[i128], CCPassIndirect<i64>>
 ]>;
 
 def CC_X86_64_GHC : CallingConv<[

This was how I first attempted this patch. However, it does not work because line 6998 of the original file in the diff above ensures that the target-specific code will never see a type larger than a register. It seems like either more of the LowerCallTo code should be moved to the target dependent sections, or it should provide some hook to allow target-dependent register allocation

test/CodeGen/X86/mod128.ll
11 ↗(On Diff #8673)

each version of llvm emits the arguments in a different order. looking at the documentation for FileCheck, it looks like WIN64-DAG may be the syntax I was looking for. I'll fix this in my next diff.

asl added a comment.Apr 20 2014, 9:09 AM

Ideally you should custom lower the necessary libcalls. See e.g. in ARM backend how this is handled for SDIVREM nodes (ARMTargetLowering::LowerDivRem and around).

ok. i was not aware that could be done.

i think the following RTLIB functions would need this correction:
SHL_I128, SRL_I128, SRA_I128, MUL_I128, MULO_I128, SDIV_I128, UDIV_I128, SREM_I128, UREM_I128, SDIVREM_I128, UDIVREM_I128,

of these, I was able to generate calls to the following for MVT::i128
ISD::SDIV, ISD::UDIV, ISD::SREM, ISD::UREM
(I couldn't cause ISD::SDIVREM or ISD::UDIVREM, but I assumed they are possible)

vtjnash updated this revision to Unknown Object (????).Apr 20 2014, 10:05 PM

move calling convention patch to X86 target

vtjnash updated this revision to Unknown Object (????).Apr 20 2014, 10:17 PM

cygwin-64 has the same calling convention as win64

Is this patch acceptable now?

rnk added a subscriber: rnk.May 1 2014, 11:05 AM
rnk added inline comments.
lib/Target/X86/X86ISelLowering.cpp
1504

Please use two space indentation here.

12528–12534

I'd rather hoist the check for win64 + [if]128 out of the function that lowers them and turn these into assertions.

It might be better to strengthen these to assertions, if the result of failing to replace something is a miscompile.

12571

This is copy-pasted from other examples, but I'd whack it with clang-format if you have it easily available. Otherwise, it's fine.

13613

Indentation. Alternatively, move the check for win64 + [fi]128 over the lowering call and drop this check.

vtjnash updated this revision to Diff 9090.May 5 2014, 5:21 PM

reformat per comments

@rnk I have updated the diff based on your comments (although I did not manage to locate a copy of clang-format). As suggested, I strengthened all of the conditional checks to assertions. The original code to retest the conditional was copied from the ARM code example and was not needed here.

majnemer added inline comments.May 5 2014, 6:19 PM
lib/Target/X86/X86ISelLowering.cpp
12551

Please keep columns under 80 characters.

rnk added a comment.May 5 2014, 6:28 PM

Thanks, that looks good. I committed this with minor changes in r208029.

vtjnash accepted this revision.Mar 22 2019, 6:12 PM
This revision is now accepted and ready to land.Mar 22 2019, 6:12 PM
vtjnash closed this revision.Mar 22 2019, 6:26 PM