Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I can't find any lib calls or specific intrinsics for them. I'd like to add them separately if we have any users or needs of them, do we?
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1004–1005 | Make the format align with its context. | |
2133–2134 | ditto. | |
30266 | Put Max/Min together is a bit confusing. My first impression is it can return either +0 or -0 for a single comparison. | |
30295 | Need hasFP16() for f16. | |
34102 | ditto format. |
I heard glibc is supporting them. I'm fine with leaving it to the future. Do you have plan on the vector support?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
5 | Guess you missed a AVX512F, i.e. AVX,AVX512,AVX512F |
Addressed formatting comments.
Check f16 explicitly even if avx512f16 implies avx512dq for now.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30295 | (VT == MVT::f16 && Subtarget.hasFP16()) || Subtarget.hasDQI() |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30295 | Sorry, mistake. It should be: |
Indeed, found this
- <math.h> functions for floating-point maximum and minimum, corresponding to new operations in IEEE 754-2019, and corresponding <tgmath.h> macros, are added from draft ISO C2X: fmaximum, fmaximum_num, fmaximum_mag, fmaximum_mag_num, fminimum, fminimum_num, fminimum_mag, fminimum_mag_num and corresponding functions for float, long double, _FloatN and _FloatNx.
About vector support. I want to try to implement vector support alternatively, transform floats into integers using shifts and compare them as integers preserving float semantics, even -0 < +0. I've tried this approach for scalars but current approach produces less code. Probably vectors can benefit more.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30295 | Yes, thank you, missed it as fp16 implies dq. We actually can check only VT == MVT::f16 because above the predicate Subtarget.hasFP16() && VT == MVT::f16 has been checked already. Do we want to avoid such implicit implication? Alternative is VT != MVT::f16 && Subtarget.hasDQI() || VT == MVT::f16 && Subtarget.hasFP16(). | |
34102 | Format doesn't work, we have 80+ chars if it's aligned with return. We don't need getNode, I missed it when moved from combine to lowering. |
LGTM.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30295 | Oh, we have checked hasFP16 in line 29876. So the way used here is correct. Sorry for the noise. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30233 | You should be able to drop this early-out - we should never get here | |
30240 | Again, you can drop this as the setOperationAction calls should ensure we never get here - replace it with an assertion if you're worried. | |
30278 | auto * | |
30280 | auto * | |
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll | ||
154 | please can you add vector test coverage to ensure we scalarize? | |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
963 | please can you add vector test coverage to ensure we scalarize? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
30240 | Yes, you are right. Everything must be handled in X86TargetLowering. Dropped these checks. | |
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll | ||
154 | Yes, I've added them. Also it reminded me about several commented tests with the intrinsics. Uncommented them as well. |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
711 | add nounwind attribute to get rid of the .cfi noise |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
711 | I've allowed myself to add nounwind in half.ll since I've touched it. I think the attribute was missed. |
@pengfei @goldstein.w.n Any more comments?
llvm/test/CodeGen/X86/half.ll | ||
---|---|---|
957 | pre-commit the nounwind change to keep it separate from this patch - you shouldn't really need dso_local either |
I have a side question (not delaying landing this one)
It looks like this change lowers vectorized form of intrinsic in a non-optimal way.
So I wonder whether there are some plans to improve it as follow-up?
Yes, for sure. I've tried to find a way to implement vectorized version for SSE, but nothing's come to my mind. We need at least SSE2 for PCMPEQ{W,D} because all fp comparison instructions treat -0.0 and 0.0 as equal. It seems that pentium3 will suffer from not optimal fmaximum/fminimum...
llvm/test/CodeGen/X86/half.ll | ||
---|---|---|
957 | I haven't got commit access yet, here it is https://reviews.llvm.org/D149114 |
Why we cannot do something like this https://godbolt.org/z/Yxfn3jTj1 ?
I mean at least for avx?
BTW. my measurements on micro benchmarking (I know micro might cause that branch predictor works good), the version https://godbolt.org/z/rEj9GPfnY is the best one for scalar but it cannot be implemented in SelectionDAG as has a CFG.
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
121 | Here is what I mentioned in terms of non-optimial vectorized version at least on AVX. |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
121 | It is exactly what I said in the beginning of this discussion: side question not delaying landing this patch. |
llvm/test/CodeGen/X86/fminimum-fmaximum.ll | ||
---|---|---|
121 | yup - cheers |
Both versions are incorrect. They don't work as expected in case of (-0.0, 0.0), (0.0, -0.0) inputs. Because comiss and max treat negative and positive zeros as equal.
Take a look carefully. In case of equality we check the sign of the first value. Thus comparison using ucommis is ok.
There is a need of %cmp = icmp slt i32 %bc, 0 -> %cmp = icmp sge i32 %bc, 0 to make valid.
Yeah, I got the idea that generic case may be more efficient implemented in such way.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1006 | Do we have test coverage with SSE1 only? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1006 | Apparently, no. There is a fatal error: error in backend: Access past stack top! with doubles and +sse,-sse2 It seems I need to split float and double tests into two separate files to test SSE1 only. Are there better alternatives? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1006 | I'd be very tempted to limit float maximum/minimum to SSE2 or later tbh |