This is an archive of the discontinued LLVM Phabricator instance.

[flang][RFC] Proposal for complex number lowering through MLIR
ClosedPublic

Authored by DavidTruby on Sep 21 2022, 8:19 AM.

Details

Summary

This design document proposes lowering FIR complex number operations
through the MLIR complex dialect.

Diff Detail

Event Timeline

DavidTruby created this revision.Sep 21 2022, 8:19 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
DavidTruby requested review of this revision.Sep 21 2022, 8:19 AM

Thank you for working on this, @DavidTruby!

flang/docs/ComplexOperations.md
60

Maybe call it "Conversion" to avoid confusing it with Flang lowering.

66

Unfortunately, not all Complex operations can be lowered to LLVM dialect. Currently, only these operations are supported there:

patterns.add<
    AbsOpConversion,
    AddOpConversion,
    ConstantOpLowering,
    CreateOpConversion,
    DivOpConversion,
    ImOpConversion,
    MulOpConversion,
    ReOpConversion,
    SubOpConversion
  >(converter);

I think the problems with ComplexToLibm conversions have to be resolved before we can start using Complex dialect operations in Flang lowering uniformly. @kiranchandramohan sent this link before: https://discourse.llvm.org/t/c-representation-of-complex-types/2180
@ftynse described some ideas there.

It seems to me https://polygeist.mit.edu/ team will have to solve the same issue for C/C++ (e.g. will they use Complex dialect for C/C++ complex? how they are going to represent/translate function calls with C/C++ complex arguments for different targets?). It may be a good idea to collaborate with them on a common solution.

72

This may be a good start, but I would prefer lowering to libm calls rather than pgmath calls just because I would expect libm calls have better support in LLVM backend.

I think lowering to libm calls in Flang should not be a problem, since it "works" for pgmath calls already: we generate fir.call operations with fir.complex arguments, and then let FIR dialect convertor convert them into LLVM dialect calls with "proper" arguments. I used quotes, because I believe there is the same potential issue that on some target the ABI for C complex may not be matching the handling we do for fir.complex (which is representing it as LLVM {t, t} type and letting LLVM dialect translator translate it to LLVM IR as an aggregate type). So if we build pgmath library for this target with a regular C compiler, then we will have the mismatch.

With that said, lowering to libm instead of pgmath does not make anything worse than it is.

E.g. compare LLVM IR for PowerPC: https://godbolt.org/z/ffWvfd3f4

DavidTruby added inline comments.Sep 21 2022, 11:44 AM
flang/docs/ComplexOperations.md
66

I also got confused by this! But there's actually a ComplexToStandard conversion _separate_ from the ComplexToLLVM conversion. All the operations are supported by ComplexToStandard. I experimented with it converting pow and that worked correctly on x86_64 and AArch64.

It was bought to my attention in this discussion on Discourse: https://discourse.llvm.org/t/complex-to-libm-conversion-abi-issues/65131/4?u=davidtruby

72

Currently we do actually lower correctly on x86_64 to a call to pgmath using <2 x float> as opposed to {float, float} at the LLVM level.
If you do give the x86_64 backend an LLVM {float, float} it does not handle converting that to a <2 x float> at the backend level for some reason, you have to have already correctly given it <2 x float> at the LLVM IR level. This is the problematic ABI case I am aware of.

Perhaps we can do the same handling we do to get this correct for a pgmath call but call libm instead?

vzakhari added inline comments.Sep 21 2022, 12:33 PM
flang/docs/ComplexOperations.md
66

Oh, I see. I missed this convertor. While it allows converting all complex operations into other dialects and it may be an acceptable near-term solution, I am not sure if we necessarily want to use it. My concern is that the inline implementations used there may not be as performant as libm (or other) library implementations, e.g. I suppose there might be a more efficient way for implementing tan vs representing it as sin / cos on some targets.

Thanks for the link!

72

Yes, my point is that: if the immediate goal is to get rid of pgmath dependency, we can let Flang lower math operations straight into libm calls with fir.complex result/arguments, and this should not introduce any new problems.

DavidTruby added inline comments.Sep 21 2022, 12:41 PM
flang/docs/ComplexOperations.md
66

I agree this may be an issue, perhaps I should measure the performance to see how big a difference there is?

72

In principle I'd rather lower through the MLIR complex dialect, in case there are any optimisations there we can take advantage of and to avoid duplicating effort. Our conversation about the complex dialect didn't really lead anywhere but I wonder if there's some way we can get the ABI correct on the ComplexToLibmConversion side rather than having to do it at the FIR level?

DavidTruby added inline comments.Sep 21 2022, 12:43 PM
flang/docs/ComplexOperations.md
72

I wonder if anyone knows why the x86_64 backend doesn't just do the right thing with an LLVM struct of {float, float}? I'm curious why frontends should have to care about ABI at that level.

vzakhari added inline comments.Sep 21 2022, 12:58 PM
flang/docs/ComplexOperations.md
66

It will be good food for thought, though, I am not sure if this has to be done for this particular change-set.

So, basically, we need to compare performance of ComplexToStandard sequences vs libm (and maybe pgmath) implementation on at least x86_64 and AArch64.

I think we can do the comparison in the following way:

  • Let Flang lower math operations straight into libm calls under -math-runtime=precise.
  • Let Flang lower math operations into Complex dialect operations and use ComplexToStandard convertor under -math-runtime=fast.

Having these two options should provide an easier way for comparing performance on some small kernel tests.

DavidTruby added inline comments.Sep 21 2022, 1:22 PM
flang/docs/ComplexOperations.md
66

Ok so, would you like to see in the interim a switch to libm calls from pgmath calls, and then investigate the performance implications of using MLIR complex after that is done? Or should we go ahead with MLIR complex and potentially replace it with libm if the performance is bad?

For what it's worth, I already have a prototype patch implementing this by going through MLIR complex for pow which would be very low effort to add the other operations. Presumably it would not be that difficult to switch to using libm either though.

vzakhari added inline comments.Sep 21 2022, 1:25 PM
flang/docs/ComplexOperations.md
72

I think it is because C99 _Complex is a first-class data type, and the standard is not defining it as a structure of two elements, so the ABI may be defined differently for _Complex T and struct { T; T } for whatever reason. Next, there is no first-class representation for _Complex in LLVM IR, so the backend has no clue whether <2 x T> or {T, T} represents _Complex and that it should apply ABI rules for _Complex to it. Thus, for example, in clang it is the front-end job to properly generate LLVM IR for the target ABI for _Complex.

I wonder if there's some way we can get the ABI correct on the ComplexToLibmConversion side rather than having to do it at the FIR level?

I suppose for this to work for our use-case, the MLIR complex should be treated as C99 _Complex during the type conversion (e.g. in ComplexToLLVM convertor). I believe this is currently not true, and it is treated as std::complex, and it is not the same. @jeanPerier showed an example here: https://reviews.llvm.org/D83397#2169102
Or, I guess, ComplexToLibm has to make sure that it applies the target ABI for the generated libm calls, since they are known to return and accept C99 _Complex.

vzakhari added inline comments.Sep 21 2022, 1:31 PM
flang/docs/ComplexOperations.md
66

Since we currently do not have a way to represent strict FP behavior in MLIR, I think we have to have the "opaque calls" at least for -math-runtime=precise. So I think lowering into libm calls instead of pgmath calls is the right first step. Then we can add lowering into Complex dialect under fast and relaxed.

Basically, I suggest lowering all complex math intrinsics the same way it is done for bessel_j0 and other intrinsics that use genLibCall generator.

DavidTruby added inline comments.Sep 21 2022, 2:34 PM
flang/docs/ComplexOperations.md
72

Right, I remember this issue!

I'm actually a bit confused about this after having checked the C and C++ standards. Both standards guarantee that a _Complex float and std::complex<float> can be cast to float[2] and that the first element is the real part and the second is the complex part. The C++ standard additionally *does* guarantee ABI compatibility between std::complex and _Complex in 26.4.1 but I suppose if x86_32's ABI doesn't follow the C/C++ standards we still have to accept that fact.

My preference would again still be to go through ComplexToLibm and make that ABI aware (as currently ComplexToLibm is fundamentally broken on certain architectures) but that may require effort as MLIR doesn't generally have ABI awareness.

DavidTruby added inline comments.Sep 21 2022, 2:36 PM
flang/docs/ComplexOperations.md
66

Ok, I'll work on a patch switching over to libm from pgmath and then rebase my prototype on top of that.
One concern is that libm doesn't have a cpowi function which pgmath does have. I'm not 100% sure of the implications of this; presumably in non-strict modes the result would at least be correct if you convert the integer to a _Complex float and call cpow, but perhaps performance of that would be worse than an alternative algorithm expecting only an integer?

vzakhari added inline comments.Sep 21 2022, 3:47 PM
flang/docs/ComplexOperations.md
66

Since not all integers can be represented in a floating point type of the same size, I think you might get wrong results. For example, 2147483647 (INT_MAX) is 2147483648 in float, so you will get wrong result for -1.0f+i*0.0f. I guess you may use wider FP type for i32 and i64, but it might have negative performance impact, and I am not sure how this will affect the accuracy.

One solution is to handle it the same way as math.fpowi, i.e. add dedicated Complex dialect operation and convert it accordingly. Another way is for Flang to lower it into a Fortran runtime call and have the implementation there.

DavidTruby added inline comments.Sep 21 2022, 6:20 PM
flang/docs/ComplexOperations.md
66

I'm struggling a bit with the Fortran standard here.. does it say that e.g. c ** 3 has to be handled as if 3 were an integer?
For example, from my knowledge of the C++ standard that just requires that the exponent become a complex number with the same base type as the mantissa, but the implementation is free to do differently and many do. Is this the same for Fortran?

A quick glance at compiler explorer tells me that gfortran does the same for c ** n where n is an integer as it does where n is a float:
https://godbolt.org/z/dEscfe686

DavidTruby added inline comments.Sep 21 2022, 6:22 PM
flang/docs/ComplexOperations.md
66

I'm struggling a bit with the Fortran standard here.. does it say that e.g. c ** 3 has to be handled as if 3 were an integer?
For example, from my knowledge of the C++ standard that just requires that the exponent become a complex number with the same base type as the mantissa, but the implementation is free to do differently and many do. Is this the same for Fortran?

A quick glance at compiler explorer tells me that gfortran does the same for c ** n where n is an integer as it does where n is a float:
https://godbolt.org/z/dEscfe686

Apologies, I posted the wrong link here!
The correct one should be:
https://godbolt.org/z/xMa84n1hY

vzakhari added inline comments.Sep 21 2022, 6:29 PM
flang/docs/ComplexOperations.md
66

Both links seem to show the same code :)

I am looking at N2162:

10.1.5.2.1 Interpretation of numeric intrinsic operations
1 The two operands of numeric intrinsic binary operations may be of different numeric types or different kind
type parameters. Except for a value of type real or complex raised to an integer power, if the operands have
different types or kind type parameters, the effect is as if each operand that differs in type or kind type parameter
from those of the result is converted to the type and kind type parameter of the result before the operation is
performed. When a value of type real or complex is raised to an integer power, the integer operand need not be
converted.

gfortran seems to agree: https://godbolt.org/z/jaj83fvr3

peixin added a subscriber: peixin.Sep 21 2022, 7:46 PM

Can you consider more cases such as follows:

subroutine sub
  complex :: a, b, c
  c = conj(a) * b
end

and -a * b and -conj(a) * b.

Beyond the motivation mentioned in the doc, is it possible to do some performance optimization through this dialect you proposed. Take the complex multiplication (a * b) as one example:
Instead of lowering to LLVM IR, we can generate some transformation IR such as the custom intrinsics. In the middle end, vectorization can be performed. In the backend, the instruction fcmla can be generated.

jeanPerier added inline comments.Sep 22 2022, 1:15 AM
flang/docs/ComplexOperations.md
72

The C++ standard additionally *does* guarantee ABI compatibility between std::complex and _Complex in 26.4.1

Does it the C++ standard explicitly says so ? All I could read so far are statements that guarantee layout compatibility (The _Complex memory representation is exactly equivalent to the one of std::complex). However, I do not believe the C++ is telling anything about how std::complex should be passed by value in function interfaces. std::complex is an aggregate type, and it is up to target ABI's to define how aggregate types are passed by value. _Complex however is an intrinsic type, and target ABI's may define custom rules regarding how it is passed by value.

Anyway, I agree with your preference. Making ComplexToLibm ABI aware seems the better long term solutions. Thanks for looking into this !

I also agree with @vzakhari that whom ever wants to compile C/C++ with _Complex and std::complex via MLIR will face the same issue, and that it would make sense to have a common approach.

How about somebody adds:

llvm::Optional<ComplexLayout> getComplexLayout(llvm::Triple);

to llvm/Frontend/TargetInfo/Complex.h ?

You could lazy add support only for your two favorite triples.

DavidTruby added inline comments.Sep 22 2022, 5:03 AM
flang/docs/ComplexOperations.md
66

Thanks, this is the context I was looking for in the standard but couldn't find.
This does imply that you should handle complex to integer power separately. I agree that we should add a powi to the mlir complex dialect. My main remaining question there is what we should do when we want not to use that dialect eg in precise mode, as there's no cpowi in libm. Perhaps we should add one in the flang runtime?

How about somebody adds:

llvm::Optional<ComplexLayout> getComplexLayout(llvm::Triple);

to llvm/Frontend/TargetInfo/Complex.h ?

You could lazy add support only for your two favorite triples.

I think the problem is that the MLIR complex dialect doesn't have any notion of "target" at the moment, so even with a function like this we wouldn't know which target to request the ABI for when doing the conversion to libm calls. You could just assume the current target but that would break cross compilation.

flang/docs/ComplexOperations.md
66

The standard dialect was removed and the operations there were moved to other dialects. I guess the pass was not renamed.

72

Yes, my point is that: if the immediate goal is to get rid of pgmath dependency, we can let Flang lower math operations straight into libm calls with fir.complex result/arguments, and this should not introduce any new problems.

Is this (using libm calls with fir.complex) OK with @jeanPerier as a first step and subsequent steps being lowering to the Complex Dialect and using ComplexToLibm with some ABI fixes?

@DavidTruby discussed this in an MLIR RFC but did not gain much traction https://discourse.llvm.org/t/complex-to-libm-conversion-abi-issues/65131. We will want some support here if this is the path we all want.

vzakhari added inline comments.Sep 22 2022, 8:38 AM
flang/docs/ComplexOperations.md
66

Good question! Yes, it looks like generating a Fortran runtime call for cpowi under precise is the only way to keep following the current approach.

How about somebody adds:

llvm::Optional<ComplexLayout> getComplexLayout(llvm::Triple);

to llvm/Frontend/TargetInfo/Complex.h ?

You could lazy add support only for your two favorite triples.

I think the problem is that the MLIR complex dialect doesn't have any notion of "target" at the moment, so even with a function like this we wouldn't know which target to request the ABI for when doing the conversion to libm calls. You could just assume the current target but that would break cross compilation.

But the conversion pass from Complex to Libm needs to know the target. Otherwise you will get ABI issues.

There was a discussion on discourse how to model strict floating point behaviour.

https://discourse.llvm.org/t/modelling-strict-floating-point-behavior-in-math-like-dialects/63000/2

I've done some testing of the mlir complex dialect and it seems that using
ComplexToStandard does give increased performance vs emitting calls to libm.

I've updated this proposal to reflect that we now call to libm rather than
pgmath.

I will be uploading a patch that changes some complex operations to use MLIR
complex momentarily so that people can review the approach and test this
themselves.

Thanks for the input so far!

I've uploaded D135882 now which shows this approach in action

DavidTruby updated this revision to Diff 472294.Nov 1 2022, 7:17 AM

Add clarification about what is done for unsupported operations.

vzakhari accepted this revision.Nov 1 2022, 2:27 PM

Thank you, David!

This revision is now accepted and ready to land.Nov 1 2022, 2:27 PM