Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
182 | does -> doesn't |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
53833 | I've tried to find a better place to handle ISD::FMAXIMUM but there is no place in chain Selecting->Combining->Legalization. So, I was inspired by ISD::FMAXNUM that is actually lowered during combining. If you are about naming solely, it is not a big deal to change the name, but all callees in PerformDAGCombine are named as combine* even for ISD::FMAXNUM. Do we really want to have a single LowerFMinimumFMaximum? | |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
182 | Probably, yes. I wanted to show that we are able to fold all these checks with constant arguments. But we've already tested folding of zero and nan checks. Original FMAX and FMIN are tested as well. Dropped it. |
Do you have plan to support minimumNumber?
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
53833 | No, not the name. Did you try set action of FMAXIMUM to Custom? But I'm fine given it's similar to combineFMinNumFMaxNum. | |
53844 | Subtarget.hasFP16() && VT == MVT::f16 | |
53865 | Better to give a table like above // Op1 Op1 // Num xNaN +0 -0 // ----------------- -------------- // Num | Max | qNaN | +0 | +0 | +0 | // Op0 ----------------- Op0 -------------- // xNaN | qNaN | qNaN | -0 | +0 | -0 | // ----------------- -------------- | |
53891–53894 | Can we check if it is 0 or NaN? We can put both in the second operand I think. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
53891–53894 | We can use vfpclassps/d on AVX512DQ to optimize it. |
Sorry, I didn't get it. Could you be more specific?
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
53833 | Yes, thank you. I was unaware of this mechanism. It works but I've needed to include extra logic because with Custom lowering there is no more setcc combining. | |
53891–53894 | It was my first attempt to compare with 0 and NaN at the same time. We have two problems. The first is that comparison with zero returns ZF regardless positive or negative zero is provided. The second is that we still need to know whether the second operand is NaN or not. We may have (0.0, NaN) as arguments. We checked that the first op is not NaN and is zero. Replaced it with the second when the second is NaN. It seems to me that we always need to check both operands on NaN and one check on zero. We can use vfpclassps/d on AVX512DQ if one of operands is known never NaN. Working on it. |
It's accompanying function of minimum in IEEE-754 2019, it will be introduced in new C/C++ standard too. I thought you are working for that.
Rebased.
Moved from combine to lowering.
Supported f16 version.
Added optimization for avx512dq.
Added and updated tests.
I can't find any lib calls or specific intrinsics for them. I'd like to add them separately if we have any users or needs of them, do we?
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1004–1005 | Make the format align with its context. | |
2133–2134 | ditto. | |
30255 | Put Max/Min together is a bit confusing. My first impression is it can return either +0 or -0 for a single comparison. | |
30284 | Need hasFP16() for f16. | |
34086 | ditto format. |
I heard glibc is supporting them. I'm fine with leaving it to the future. Do you have plan on the vector support?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
5 | Guess you missed a AVX512F, i.e. AVX,AVX512,AVX512F |
Addressed formatting comments.
Check f16 explicitly even if avx512f16 implies avx512dq for now.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30284 | (VT == MVT::f16 && Subtarget.hasFP16()) || Subtarget.hasDQI() |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30284 | Sorry, mistake. It should be: |
Indeed, found this
- <math.h> functions for floating-point maximum and minimum, corresponding to new operations in IEEE 754-2019, and corresponding <tgmath.h> macros, are added from draft ISO C2X: fmaximum, fmaximum_num, fmaximum_mag, fmaximum_mag_num, fminimum, fminimum_num, fminimum_mag, fminimum_mag_num and corresponding functions for float, long double, _FloatN and _FloatNx.
About vector support. I want to try to implement vector support alternatively, transform floats into integers using shifts and compare them as integers preserving float semantics, even -0 < +0. I've tried this approach for scalars but current approach produces less code. Probably vectors can benefit more.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30284 | Yes, thank you, missed it as fp16 implies dq. We actually can check only VT == MVT::f16 because above the predicate Subtarget.hasFP16() && VT == MVT::f16 has been checked already. Do we want to avoid such implicit implication? Alternative is VT != MVT::f16 && Subtarget.hasDQI() || VT == MVT::f16 && Subtarget.hasFP16(). | |
34086 | Format doesn't work, we have 80+ chars if it's aligned with return. We don't need getNode, I missed it when moved from combine to lowering. |
LGTM.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30284 | Oh, we have checked hasFP16 in line 29876. So the way used here is correct. Sorry for the noise. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30222 | You should be able to drop this early-out - we should never get here | |
30229 | Again, you can drop this as the setOperationAction calls should ensure we never get here - replace it with an assertion if you're worried. | |
30267 | auto * | |
30269 | auto * | |
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll | ||
154 | please can you add vector test coverage to ensure we scalarize? | |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
963 | please can you add vector test coverage to ensure we scalarize? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30229 | Yes, you are right. Everything must be handled in X86TargetLowering. Dropped these checks. | |
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll | ||
154 | Yes, I've added them. Also it reminded me about several commented tests with the intrinsics. Uncommented them as well. |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
711 | add nounwind attribute to get rid of the .cfi noise |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
711 | I've allowed myself to add nounwind in half.ll since I've touched it. I think the attribute was missed. |
@pengfei @goldstein.w.n Any more comments?
llvm/test/CodeGen/X86/half.ll | ||
---|---|---|
957 | pre-commit the nounwind change to keep it separate from this patch - you shouldn't really need dso_local either |
I have a side question (not delaying landing this one)
It looks like this change lowers vectorized form of intrinsic in a non-optimal way.
So I wonder whether there are some plans to improve it as follow-up?
Yes, for sure. I've tried to find a way to implement vectorized version for SSE, but nothing's come to my mind. We need at least SSE2 for PCMPEQ{W,D} because all fp comparison instructions treat -0.0 and 0.0 as equal. It seems that pentium3 will suffer from not optimal fmaximum/fminimum...
llvm/test/CodeGen/X86/half.ll | ||
---|---|---|
957 | I haven't got commit access yet, here it is https://reviews.llvm.org/D149114 |
Why we cannot do something like this https://godbolt.org/z/Yxfn3jTj1 ?
I mean at least for avx?
BTW. my measurements on micro benchmarking (I know micro might cause that branch predictor works good), the version https://godbolt.org/z/rEj9GPfnY is the best one for scalar but it cannot be implemented in SelectionDAG as has a CFG.
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
120 | Here is what I mentioned in terms of non-optimial vectorized version at least on AVX. |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
120 | It is exactly what I said in the beginning of this discussion: side question not delaying landing this patch. |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
120 | yup - cheers |
Both versions are incorrect. They don't work as expected in case of (-0.0, 0.0), (0.0, -0.0) inputs. Because comiss and max treat negative and positive zeros as equal.
Take a look carefully. In case of equality we check the sign of the first value. Thus comparison using ucommis is ok.
There is a need of %cmp = icmp slt i32 %bc, 0 -> %cmp = icmp sge i32 %bc, 0 to make valid.
Yeah, I got the idea that generic case may be more efficient implemented in such way.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1006 | Do we have test coverage with SSE1 only? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1006 | Apparently, no. There is a fatal error: error in backend: Access past stack top! with doubles and +sse,-sse2 It seems I need to split float and double tests into two separate files to test SSE1 only. Are there better alternatives? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1006 | I'd be very tempted to limit float maximum/minimum to SSE2 or later tbh |
Make the format align with its context.