This design document proposes lowering FIR complex number operations
through the MLIR complex dialect.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thank you for working on this, @DavidTruby!
flang/docs/ComplexOperations.md | ||
---|---|---|
60 | Maybe call it "Conversion" to avoid confusing it with Flang lowering. | |
66 | Unfortunately, not all Complex operations can be lowered to LLVM dialect. Currently, only these operations are supported there: patterns.add< AbsOpConversion, AddOpConversion, ConstantOpLowering, CreateOpConversion, DivOpConversion, ImOpConversion, MulOpConversion, ReOpConversion, SubOpConversion >(converter); I think the problems with ComplexToLibm conversions have to be resolved before we can start using Complex dialect operations in Flang lowering uniformly. @kiranchandramohan sent this link before: https://discourse.llvm.org/t/c-representation-of-complex-types/2180 It seems to me https://polygeist.mit.edu/ team will have to solve the same issue for C/C++ (e.g. will they use Complex dialect for C/C++ complex? how they are going to represent/translate function calls with C/C++ complex arguments for different targets?). It may be a good idea to collaborate with them on a common solution. | |
72 | This may be a good start, but I would prefer lowering to libm calls rather than pgmath calls just because I would expect libm calls have better support in LLVM backend. I think lowering to libm calls in Flang should not be a problem, since it "works" for pgmath calls already: we generate fir.call operations with fir.complex arguments, and then let FIR dialect convertor convert them into LLVM dialect calls with "proper" arguments. I used quotes, because I believe there is the same potential issue that on some target the ABI for C complex may not be matching the handling we do for fir.complex (which is representing it as LLVM {t, t} type and letting LLVM dialect translator translate it to LLVM IR as an aggregate type). So if we build pgmath library for this target with a regular C compiler, then we will have the mismatch. With that said, lowering to libm instead of pgmath does not make anything worse than it is. E.g. compare LLVM IR for PowerPC: https://godbolt.org/z/ffWvfd3f4 |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | I also got confused by this! But there's actually a ComplexToStandard conversion _separate_ from the ComplexToLLVM conversion. All the operations are supported by ComplexToStandard. I experimented with it converting pow and that worked correctly on x86_64 and AArch64. It was bought to my attention in this discussion on Discourse: https://discourse.llvm.org/t/complex-to-libm-conversion-abi-issues/65131/4?u=davidtruby | |
72 | Currently we do actually lower correctly on x86_64 to a call to pgmath using <2 x float> as opposed to {float, float} at the LLVM level. Perhaps we can do the same handling we do to get this correct for a pgmath call but call libm instead? |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Oh, I see. I missed this convertor. While it allows converting all complex operations into other dialects and it may be an acceptable near-term solution, I am not sure if we necessarily want to use it. My concern is that the inline implementations used there may not be as performant as libm (or other) library implementations, e.g. I suppose there might be a more efficient way for implementing tan vs representing it as sin / cos on some targets. Thanks for the link! | |
72 | Yes, my point is that: if the immediate goal is to get rid of pgmath dependency, we can let Flang lower math operations straight into libm calls with fir.complex result/arguments, and this should not introduce any new problems. |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | I agree this may be an issue, perhaps I should measure the performance to see how big a difference there is? | |
72 | In principle I'd rather lower through the MLIR complex dialect, in case there are any optimisations there we can take advantage of and to avoid duplicating effort. Our conversation about the complex dialect didn't really lead anywhere but I wonder if there's some way we can get the ABI correct on the ComplexToLibmConversion side rather than having to do it at the FIR level? |
flang/docs/ComplexOperations.md | ||
---|---|---|
72 | I wonder if anyone knows why the x86_64 backend doesn't just do the right thing with an LLVM struct of {float, float}? I'm curious why frontends should have to care about ABI at that level. |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | It will be good food for thought, though, I am not sure if this has to be done for this particular change-set. So, basically, we need to compare performance of ComplexToStandard sequences vs libm (and maybe pgmath) implementation on at least x86_64 and AArch64. I think we can do the comparison in the following way:
Having these two options should provide an easier way for comparing performance on some small kernel tests. |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Ok so, would you like to see in the interim a switch to libm calls from pgmath calls, and then investigate the performance implications of using MLIR complex after that is done? Or should we go ahead with MLIR complex and potentially replace it with libm if the performance is bad? For what it's worth, I already have a prototype patch implementing this by going through MLIR complex for pow which would be very low effort to add the other operations. Presumably it would not be that difficult to switch to using libm either though. |
flang/docs/ComplexOperations.md | ||
---|---|---|
72 | I think it is because C99 _Complex is a first-class data type, and the standard is not defining it as a structure of two elements, so the ABI may be defined differently for _Complex T and struct { T; T } for whatever reason. Next, there is no first-class representation for _Complex in LLVM IR, so the backend has no clue whether <2 x T> or {T, T} represents _Complex and that it should apply ABI rules for _Complex to it. Thus, for example, in clang it is the front-end job to properly generate LLVM IR for the target ABI for _Complex.
I suppose for this to work for our use-case, the MLIR complex should be treated as C99 _Complex during the type conversion (e.g. in ComplexToLLVM convertor). I believe this is currently not true, and it is treated as std::complex, and it is not the same. @jeanPerier showed an example here: https://reviews.llvm.org/D83397#2169102 |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Since we currently do not have a way to represent strict FP behavior in MLIR, I think we have to have the "opaque calls" at least for -math-runtime=precise. So I think lowering into libm calls instead of pgmath calls is the right first step. Then we can add lowering into Complex dialect under fast and relaxed. Basically, I suggest lowering all complex math intrinsics the same way it is done for bessel_j0 and other intrinsics that use genLibCall generator. |
flang/docs/ComplexOperations.md | ||
---|---|---|
72 | Right, I remember this issue! I'm actually a bit confused about this after having checked the C and C++ standards. Both standards guarantee that a _Complex float and std::complex<float> can be cast to float[2] and that the first element is the real part and the second is the complex part. The C++ standard additionally *does* guarantee ABI compatibility between std::complex and _Complex in 26.4.1 but I suppose if x86_32's ABI doesn't follow the C/C++ standards we still have to accept that fact. My preference would again still be to go through ComplexToLibm and make that ABI aware (as currently ComplexToLibm is fundamentally broken on certain architectures) but that may require effort as MLIR doesn't generally have ABI awareness. |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Ok, I'll work on a patch switching over to libm from pgmath and then rebase my prototype on top of that. |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Since not all integers can be represented in a floating point type of the same size, I think you might get wrong results. For example, 2147483647 (INT_MAX) is 2147483648 in float, so you will get wrong result for -1.0f+i*0.0f. I guess you may use wider FP type for i32 and i64, but it might have negative performance impact, and I am not sure how this will affect the accuracy. One solution is to handle it the same way as math.fpowi, i.e. add dedicated Complex dialect operation and convert it accordingly. Another way is for Flang to lower it into a Fortran runtime call and have the implementation there. |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | I'm struggling a bit with the Fortran standard here.. does it say that e.g. c ** 3 has to be handled as if 3 were an integer? A quick glance at compiler explorer tells me that gfortran does the same for c ** n where n is an integer as it does where n is a float: |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 |
Apologies, I posted the wrong link here! |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Both links seem to show the same code :) I am looking at N2162: 10.1.5.2.1 Interpretation of numeric intrinsic operations 1 The two operands of numeric intrinsic binary operations may be of different numeric types or different kind type parameters. Except for a value of type real or complex raised to an integer power, if the operands have different types or kind type parameters, the effect is as if each operand that differs in type or kind type parameter from those of the result is converted to the type and kind type parameter of the result before the operation is performed. When a value of type real or complex is raised to an integer power, the integer operand need not be converted. gfortran seems to agree: https://godbolt.org/z/jaj83fvr3 |
Can you consider more cases such as follows:
subroutine sub complex :: a, b, c c = conj(a) * b end
and -a * b and -conj(a) * b.
Beyond the motivation mentioned in the doc, is it possible to do some performance optimization through this dialect you proposed. Take the complex multiplication (a * b) as one example:
Instead of lowering to LLVM IR, we can generate some transformation IR such as the custom intrinsics. In the middle end, vectorization can be performed. In the backend, the instruction fcmla can be generated.
flang/docs/ComplexOperations.md | ||
---|---|---|
72 |
Does it the C++ standard explicitly says so ? All I could read so far are statements that guarantee layout compatibility (The _Complex memory representation is exactly equivalent to the one of std::complex). However, I do not believe the C++ is telling anything about how std::complex should be passed by value in function interfaces. std::complex is an aggregate type, and it is up to target ABI's to define how aggregate types are passed by value. _Complex however is an intrinsic type, and target ABI's may define custom rules regarding how it is passed by value. Anyway, I agree with your preference. Making ComplexToLibm ABI aware seems the better long term solutions. Thanks for looking into this ! I also agree with @vzakhari that whom ever wants to compile C/C++ with _Complex and std::complex via MLIR will face the same issue, and that it would make sense to have a common approach. |
How about somebody adds:
llvm::Optional<ComplexLayout> getComplexLayout(llvm::Triple);
to llvm/Frontend/TargetInfo/Complex.h ?
You could lazy add support only for your two favorite triples.
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Thanks, this is the context I was looking for in the standard but couldn't find. |
I think the problem is that the MLIR complex dialect doesn't have any notion of "target" at the moment, so even with a function like this we wouldn't know which target to request the ABI for when doing the conversion to libm calls. You could just assume the current target but that would break cross compilation.
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | The standard dialect was removed and the operations there were moved to other dialects. I guess the pass was not renamed. | |
72 |
Is this (using libm calls with fir.complex) OK with @jeanPerier as a first step and subsequent steps being lowering to the Complex Dialect and using ComplexToLibm with some ABI fixes? @DavidTruby discussed this in an MLIR RFC but did not gain much traction https://discourse.llvm.org/t/complex-to-libm-conversion-abi-issues/65131. We will want some support here if this is the path we all want. |
flang/docs/ComplexOperations.md | ||
---|---|---|
66 | Good question! Yes, it looks like generating a Fortran runtime call for cpowi under precise is the only way to keep following the current approach. |
But the conversion pass from Complex to Libm needs to know the target. Otherwise you will get ABI issues.
There was a discussion on discourse how to model strict floating point behaviour.
https://discourse.llvm.org/t/modelling-strict-floating-point-behavior-in-math-like-dialects/63000/2
I've done some testing of the mlir complex dialect and it seems that using
ComplexToStandard does give increased performance vs emitting calls to libm.
I've updated this proposal to reflect that we now call to libm rather than
pgmath.
I will be uploading a patch that changes some complex operations to use MLIR
complex momentarily so that people can review the approach and test this
themselves.
Thanks for the input so far!
Maybe call it "Conversion" to avoid confusing it with Flang lowering.