This is an archive of the discontinued LLVM Phabricator instance.

[flang] Complex numbers in function arguments on Windows
ClosedPublic

Authored by mmuetzel on Apr 7 2023, 12:30 AM.

Details

Summary

Function arguments or return values that are complex floating point values
aren't correctly lowered for Windows x86 32-bit and 64-bit targets.
See: https://github.com/llvm/llvm-project/issues/61976

Add targets that are specific for these platforms and OS.

With thanks to @mstorsjo for pointing out the fix.

Diff Detail

Event Timeline

mmuetzel created this revision.Apr 7 2023, 12:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2023, 12:30 AM
mmuetzel requested review of this revision.Apr 7 2023, 12:30 AM
This comment was removed by MehdiChinoune.

How about extended precision (80bit) which is supported by LLVM/Clang on MINGW.

Thanks a lot for adding support here! I am not a windows ABI expert, so I cannot validate the ABI choices here, but the code looks good.

I think a simple test should be added!

I agree, you can add it in flang/test/Fir/target-rewrite-complex.fir.

mmuetzel added a comment.EditedApr 7 2023, 4:42 AM

Thanks a lot for adding support here! I am not a windows ABI expert, so I cannot validate the ABI choices here, but the code looks good.

Thank you for checking these changes.

To be honest, I am not a Windows ABI expert either. Basically, I checked if the LLVM IR emitted by clang and the one emitted by flang looked equivalent when calling a function from the FortranRuntime library (cpowi or zpowi, respectively).

I basically copied TargetX86_64 and modified it until the LLVM IR looked equivalent.

With that change, I was able to build an example that uses the power operator with single and double precision floating point values. And executing the program returned the correct result.

I think a simple test should be added!

I agree, you can add it in flang/test/Fir/target-rewrite-complex.fir.

That will be the first time I touch the test suite. Let's see how that fairs. 🙂

mmuetzel updated this revision to Diff 511675.Apr 7 2023, 6:13 AM

I tried to stay close to the existing tests for the other targets by basically copying the lines for X64 and adapting them with the results I currently see on MinGW with the other changes from here.
I hope that is ok.

This comment was removed by MehdiChinoune.

On further inspection, the lowering for double complex doesn't look right.

I'll try to fix that.

When I said tests, I meant Fortran tests. Their absence led to this hidden issue.

There are no end to end test here. If you want some you would need to add them to the llvm-test-suite (https://github.com/llvm/llvm-test-suite)

This comment was removed by MehdiChinoune.

There are no end to end test here. If you want some you would need to add them to the llvm-test-suite (https://github.com/llvm/llvm-test-suite)

So, people freely change code without even test if a Fortran program generate wrong results or not?
For example here, If there was a simple test for complex numbers in Fortran, the issue would be caught at implementation.
I see people here add C++ tests to test their code.
Example:
https://reviews.llvm.org/D134889 they added C++ tests to test valid results!

This is a unit test. Not an end to end test. The llvm-test-suite is meant to be run by buildbots so if you add a test there, an implementation changed would be caught.

This comment was removed by MehdiChinoune.

Before I'm starting to change this forth and back, could someone please check if I'm interpreting this correctly?

I've compiled the following C source code with clang -S -O2 -emit-llvm test-pow2-d.c -o test-pow2-d-c.lir into LLVM IR:

#include <complex.h>
#include <stdio.h>

complex double _FortranAzpowi(complex double base, int exp);

int main(void)
{
  complex double c = {0., 1.};
  complex double z = _FortranAzpowi(c, 2);

  // printf("(%f, %f)\n", creal(z), cimag(z));

  return 0;
}

That yields:

; ModuleID = 'test-pow2-d.c'
source_filename = "test-pow2-d.c"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-w64-windows-gnu"

; Function Attrs: nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
  %1 = alloca { double, double }, align 8
  %2 = alloca { double, double }, align 16
  store <2 x double> <double 0.000000e+00, double 1.000000e+00>, ptr %2, align 16
  call void @_FortranAzpowi(ptr nonnull sret({ double, double }) align 8 %1, ptr noundef nonnull %2, i32 noundef 2) #2
  ret i32 0
}

declare dso_local void @_FortranAzpowi(ptr sret({ double, double }) align 8, ptr noundef, i32 noundef) local_unnamed_addr #1

attributes #0 = { nounwind uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { nounwind }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 2}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"uwtable", i32 2}
!3 = !{!"clang version 16.0.1"}

Assuming that clang does the right thing, does that mean that double precision complex arguments are passed as 2-element array on that platform?
Would the following be correct in that case?

  complexArgumentType(mlir::Location loc, mlir::Type eleTy) const override {
[...]
    } else if (sem == &llvm::APFloat::IEEEdouble()) {
      // <2 x t>   vector of 2 eleTy
      marshal.emplace_back(fir::VectorType::get(2, eleTy), AT{});

This is a unit test. Not an end to end test. The llvm-test-suite is meant to be run by buildbots so if you add a test there, an implementation changed would be caught.

Where does it say that "Fortran unit-tests couldn't be added and should be added to llvm-test-suite instead"?
Why Clang, libc++, ... have unit-tests in C/C++ but Flang doesn't?
Is this rule imposed by LLVM project or Flang sub-project?

A test like that is not harmful

program complex_powi
  implicit none
  complex(4) :: c = cmplx(0., 1.)
  complex(8) :: z = cmplx(0., 1.)
  ! assert c**2 = (-1., 0.)
  ! assert z**2 = (-1., 0.)
end program

So, my question is: Why there wasn't one? or Why isn't required for contributors to add those tests? (in llvm-test-suite)
I think tests of that type are very essential.
For example GNU Fortran developers add just Fortran tests (most of time) when they implement features or fix tests.
Is this a new testing method, I feel it very strange.

End to End tests are not allowed in the llvm-project monorepo. They belong to the testsuite (as @clementval says) as per current llvm practice. So this is not something that Flang mandates.
There has been some long discussions about end-to-end tests in https://discourse.llvm.org/t/rfc-end-to-end-testing/53306. If you have a strong opinion, you can voice your concerns there.

Before I'm starting to change this forth and back, could someone please check if I'm interpreting this correctly?

I've compiled the following C source code with clang -S -O2 -emit-llvm test-pow2-d.c -o test-pow2-d-c.lir into LLVM IR:

#include <complex.h>
#include <stdio.h>

complex double _FortranAzpowi(complex double base, int exp);

int main(void)
{
  complex double c = {0., 1.};
  complex double z = _FortranAzpowi(c, 2);

  // printf("(%f, %f)\n", creal(z), cimag(z));

  return 0;
}

That yields:

; ModuleID = 'test-pow2-d.c'
source_filename = "test-pow2-d.c"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-w64-windows-gnu"

; Function Attrs: nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
  %1 = alloca { double, double }, align 8
  %2 = alloca { double, double }, align 16
  store <2 x double> <double 0.000000e+00, double 1.000000e+00>, ptr %2, align 16
  call void @_FortranAzpowi(ptr nonnull sret({ double, double }) align 8 %1, ptr noundef nonnull %2, i32 noundef 2) #2
  ret i32 0
}

declare dso_local void @_FortranAzpowi(ptr sret({ double, double }) align 8, ptr noundef, i32 noundef) local_unnamed_addr #1

attributes #0 = { nounwind uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { nounwind }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 2}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"uwtable", i32 2}
!3 = !{!"clang version 16.0.1"}

Assuming that clang does the right thing, does that mean that double precision complex arguments are passed as 2-element array on that platform?
Would the following be correct in that case?

  complexArgumentType(mlir::Location loc, mlir::Type eleTy) const override {
[...]
    } else if (sem == &llvm::APFloat::IEEEdouble()) {
      // <2 x t>   vector of 2 eleTy
      marshal.emplace_back(fir::VectorType::get(2, eleTy), AT{});

Can you run this without optimisations and see the output?

In this IR the argument is passed as a pointer to a {double, double} tuple:

%2 = alloca { double, double }, align 16
store <2 x double> <double 0.000000e+00, double 1.000000e+00>, ptr %2, align 16
call void @_FortranAzpowi(ptr nonnull sret({ double, double }) align 8 %1, ptr noundef nonnull %2, i32 noundef 2) #2

The result is returned via the first sret argument.

To be on the safe side, I suggest compiling the flang/runtime/complex-powi.cpp file on Windows with clang-cl and see what function signatures are created.

This comment was removed by MehdiChinoune.
This comment was removed by MehdiChinoune.

Can you run this without optimisations and see the output?

The output for clang -S -O0 -emit-llvm test-pow2-d.c -o test-pow2-d-c.lir:

; ModuleID = 'test-pow2-d.c'
source_filename = "test-pow2-d.c"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-w64-windows-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca { double, double }, align 8
  %3 = alloca { double, double }, align 8
  %4 = alloca { double, double }, align 8
  %5 = alloca { double, double }, align 8
  store i32 0, ptr %1, align 4
  %6 = getelementptr inbounds { double, double }, ptr %2, i32 0, i32 0
  %7 = getelementptr inbounds { double, double }, ptr %2, i32 0, i32 1
  store double 0.000000e+00, ptr %6, align 8
  store double 1.000000e+00, ptr %7, align 8
  %8 = getelementptr inbounds { double, double }, ptr %2, i32 0, i32 0
  %9 = load double, ptr %8, align 8
  %10 = getelementptr inbounds { double, double }, ptr %2, i32 0, i32 1
  %11 = load double, ptr %10, align 8
  %12 = getelementptr inbounds { double, double }, ptr %5, i32 0, i32 0
  %13 = getelementptr inbounds { double, double }, ptr %5, i32 0, i32 1
  store double %9, ptr %12, align 8
  store double %11, ptr %13, align 8
  call void @_FortranAzpowi(ptr sret({ double, double }) align 8 %4, ptr noundef %5, i32 noundef 2)
  %14 = getelementptr inbounds { double, double }, ptr %4, i32 0, i32 0
  %15 = load double, ptr %14, align 8
  %16 = getelementptr inbounds { double, double }, ptr %4, i32 0, i32 1
  %17 = load double, ptr %16, align 8
  %18 = getelementptr inbounds { double, double }, ptr %3, i32 0, i32 0
  %19 = getelementptr inbounds { double, double }, ptr %3, i32 0, i32 1
  store double %15, ptr %18, align 8
  store double %17, ptr %19, align 8
  ret i32 0
}

declare dso_local void @_FortranAzpowi(ptr sret({ double, double }) align 8, ptr noundef, i32 noundef) #1

attributes #0 = { noinline nounwind optnone uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 2}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"uwtable", i32 2}
!3 = !{!"clang version 16.0.1"}

In this IR the argument is passed as a pointer to a {double, double} tuple:

%2 = alloca { double, double }, align 16
store <2 x double> <double 0.000000e+00, double 1.000000e+00>, ptr %2, align 16
call void @_FortranAzpowi(ptr nonnull sret({ double, double }) align 8 %1, ptr noundef nonnull %2, i32 noundef 2) #2

The result is returned via the first sret argument.

To be on the safe side, I suggest compiling the flang/runtime/complex-powi.cpp file on Windows with clang-cl and see what function signatures are created.

Could you please tell the flags to output the LLVM IR with clang-cl?
There is a different code path for MSVC in that file. Is that why you are asking?

Could you please tell the flags to output the LLVM IR with clang-cl?

I am not sure, but I think it supports all options that clang driver supports, i.e. -S -emit-llvm

There is a different code path for MSVC in that file. Is that why you are asking?

I am not sure if your source code reproducer matches exactly what is done in the runtime code, so compiling the runtime code precisely will give you a bulletproof answer to your question.

If you just want to look at the output in MSVC ABI, you can use clang -target x86_64-pc-windows-msvc, no need to use clang-cl.

mmuetzel added a comment.EditedApr 7 2023, 10:59 AM

If you just want to look at the output in MSVC ABI, you can use clang -target x86_64-pc-windows-msvc, no need to use clang-cl.

It looks like I don't have the STL headers for that target. (Or I don't understand the error...)

$ clang++ -target x86_64-pc-windows-msvc -S -emit-llvm flang/runtime/complex-powi.cpp -ocomplex-powi.lir -Iflang/include
flang/runtime/complex-powi.cpp:10:10: fatal error: 'cstdint' file not found
#include <cstdint>
         ^~~~~~~~~
1 error generated.

For the default target (windows-gnu), I get the following LLVM IR. (I hope that is the relevant part.):

; Function Attrs: mustprogress noinline optnone uwtable
define dso_local void @_FortranAzpowi(ptr noalias sret({ double, double }) align 8 %0, ptr noundef %1, i32 noundef %2) #0 {
  %4 = alloca ptr, align 8
  %5 = alloca i32, align 4
  %6 = alloca { double, double }, align 8
  %7 = alloca { double, double }, align 8
  store ptr %0, ptr %4, align 8
  store i32 %2, ptr %5, align 4
  %8 = getelementptr inbounds { double, double }, ptr %1, i32 0, i32 0
  %9 = load double, ptr %8, align 8
  %10 = getelementptr inbounds { double, double }, ptr %1, i32 0, i32 1
  %11 = load double, ptr %10, align 8
  %12 = load i32, ptr %5, align 4
  %13 = getelementptr inbounds { double, double }, ptr %7, i32 0, i32 0
  %14 = getelementptr inbounds { double, double }, ptr %7, i32 0, i32 1
  store double %9, ptr %13, align 8
  store double %11, ptr %14, align 8
  call void @_Z6tgpowiICdiET_S1_T0_(ptr sret({ double, double }) align 8 %6, ptr noundef %7, i32 noundef %12)
  %15 = getelementptr inbounds { double, double }, ptr %6, i32 0, i32 0
  %16 = load double, ptr %15, align 8
  %17 = getelementptr inbounds { double, double }, ptr %6, i32 0, i32 1
  %18 = load double, ptr %17, align 8
  %19 = getelementptr inbounds { double, double }, ptr %0, i32 0, i32 0
  %20 = getelementptr inbounds { double, double }, ptr %0, i32 0, i32 1
  store double %16, ptr %19, align 8
  store double %18, ptr %20, align 8
  %21 = getelementptr inbounds { double, double }, ptr %0, i32 0, i32 0
  %22 = load double, ptr %21, align 8
  %23 = getelementptr inbounds { double, double }, ptr %0, i32 0, i32 1
  %24 = load double, ptr %23, align 8
  %25 = getelementptr inbounds { double, double }, ptr %0, i32 0, i32 0
  %26 = getelementptr inbounds { double, double }, ptr %0, i32 0, i32 1
  store double %22, ptr %25, align 8
  store double %24, ptr %26, align 8
  ret void
}

Thanks! So the complex argument is passed as before, i.e. as a pointer to a tuple of two doubles.

Thanks! So the complex argument is passed as before, i.e. as a pointer to a tuple of two doubles.

This should match the byval passing of {double, double} tuple - the way you do it. I am not sure why clang does not use byval.

mmuetzel added a comment.EditedApr 7 2023, 11:14 AM

Thanks! So the complex argument is passed as before, i.e. as a pointer to a tuple of two doubles.

This should match the byval passing of {double, double} tuple - the way you do it. I am not sure why clang does not use byval.

Thanks for helping with this. I got confused by the optimized LLVM IR...

To be absolutely sure, I emitted the LLVM IR for the following Fortran example:

program test
  implicit none
  double complex :: x, z
  z = cmplx(0, 1.)
  x = z**2
  ! print*, x
end program

For flang-new -S -emit-llvm test-pow-d.f90 -o test-pow-d.lir with the changes from here, that is:

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-w64-windows-gnu"

@_QQEnvironmentDefaults = constant ptr null

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QQmain() {
  %1 = alloca { double, double }, i64 1, align 8
  %2 = alloca { double, double }, i64 1, align 8
  store { double, double } { double 0.000000e+00, double 1.000000e+00 }, ptr %2, align 8
  %3 = load { double, double }, ptr %2, align 8
  %4 = alloca { double, double }, i64 1, align 8
  %5 = alloca { double, double }, i64 1, align 8
  store { double, double } %3, ptr %5, align 8
  call void @_FortranAzpowi(ptr %4, ptr %5, i32 2)
  %6 = load { double, double }, ptr %4, align 8
  store { double, double } %6, ptr %1, align 8
  ret void
}

declare void @_FortranAzpowi(ptr sret({ double, double }) align 8, ptr byval({ double, double }) align 8, i32)

!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 8, !"PIC Level", i32 2}

Looks about right to me. Is that correct?

Looks correct to me.

mmuetzel edited the summary of this revision. (Show Details)Apr 8 2023, 3:30 AM

How about extended precision (80bit) which is supported by LLVM/Clang on MINGW.

There is no 80-bit floating point type in Fortran afaict, is there?
The LLVM IR emitted for COMPLEX*32 (corresponding to complex __float128 in C afaict) looks correct to me with the changes from here.

There is no 80-bit floating point type in Fortran afaict, is there?
The LLVM IR emitted for COMPLEX*32 (corresponding to complex __float128 in C afaict) looks correct to me with the changes from here.

GNU Fortran supports it as COMPLEX(10), the same as REAL(10).
There is no to way query COMPLEX kinds in Fortran, It uses REAL kinds, so what should compiler do with KIND=10
Also COMPLEX*32 is a non-standard, Use COMPLEX(KIND) where kind=4,8,10,16.

mmuetzel updated this revision to Diff 512524.Apr 11 2023, 10:58 AM

Thanks for clearing that up.

The latest version of the proposed changes produces LLVM IR for complex(10) that is equivalent to complex long double in C on Windows.

vzakhari added a comment.EditedApr 13 2023, 2:16 PM

I think the alignment should be 16 for x86_fp80 case: https://godbolt.org/z/39Yvxbj41

mmuetzel updated this revision to Diff 513499.Apr 14 2023, 2:26 AM

Ooof. Thanks for pointing that out. I don't know how I could have misread that...

That leaves the branches for IEEEquad and x87DoubleExtended basically identical. They only differ in the comments.
What is the style recommendation for that case? Keep them separate? Or use a || in the if-condition?

mmuetzel updated this revision to Diff 513566.Apr 14 2023, 6:33 AM

I opted for combining the two identical branches of the if-else-tree.

I also added a target for Windows 32-bit. Even if Flang doesn't support 32-bit hosts (iiuc), cross-building to that target might still be useful.

mmuetzel edited the summary of this revision. (Show Details)Apr 14 2023, 6:50 AM
mmuetzel updated this revision to Diff 513568.Apr 14 2023, 6:55 AM

Fix copy-paste error.

vzakhari accepted this revision.Apr 14 2023, 9:07 AM

LGTM. Thank you for the changes!

This revision is now accepted and ready to land.Apr 14 2023, 9:07 AM

Thank you for reviewing and accepting these changes.
Is there anything else I need to do to help this to get merged?

Thank you for reviewing and accepting these changes.
Is there anything else I need to do to help this to get merged?

I can merge this on Monday (do not feel comfortable merging changes before the weekend). Please ping me then.

I can merge this on Monday (do not feel comfortable merging changes before the weekend). Please ping me then.

Ping 🙂
Could you please merge this?

I can merge this on Monday (do not feel comfortable merging changes before the weekend). Please ping me then.

Ping 🙂
Could you please merge this?

Thanks. I will merge it today. It looks like arcanist was not able to identify your git name/email automatically - please specify which one I should use.

Thanks. I will merge it today. It looks like arcanist was not able to identify your git name/email automatically - please specify which one I should use.

I'm not sure what arcanist is. Would it help if I registered somewhere?

Could you please use "Markus Mützel <markus.muetzel@gmx.de>" as identification?

I'm not sure what arcanist is. Would it help if I registered somewhere?

Could you please use "Markus Mützel <markus.muetzel@gmx.de>" as identification?

Sure, I will use this id. I think you might need to be registered on github with mmuetzel id, but I am not 100% sure.

This revision was automatically updated to reflect the committed changes.

Sure, I will use this id.

Thank you very much.
This change is limited to a couple of targets for which incorrect code was produced previously.
Do you think this is eligible to be cherry-picked to the release branch?

I think you might need to be registered on github with mmuetzel id, but I am not 100% sure.

Hmm... That's my user name on GitHub. And if I recall correctly, I've originally logged in with my GitHub account here.
I perused the account settings here on Phabricator to try and find settings for the id. But I wasn't successful either.
I changed my email to public in my GitHub account settings. Maybe that will make a difference.

Do you think this is eligible to be cherry-picked to the release branch?

I think it is safe to merge it into the release branch.

Hmm... That's my user name on GitHub. And if I recall correctly, I've originally logged in with my GitHub account here.
I perused the account settings here on Phabricator to try and find settings for the id. But I wasn't successful either.
I changed my email to public in my GitHub account settings. Maybe that will make a difference.

For whatever reason I was not able to find your user id on github yesterday, but I see it now. Maybe the account settings actually made the difference.

I think it is safe to merge it into the release branch.

See: https://github.com/llvm/llvm-project/issues/62218

Not sure if I'm doing this right...