This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/X86/X86ISelLowering.cpp
53226	The code doesn't combine anything; should it be moved to `LowerFMinimumFMaximum`?
53267	Should we not do it for `hasNoSignedZeros`?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
182	The test does show anything interesting.

pengfei added inline comments.Mar 8 2023, 6:43 PM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
182	does -> doesn't

Rebased.
Supported cases with nsz and nnan.
Updated tests.

e-kud added a reviewer: RKSimon.Mar 9 2023, 7:11 PM

e-kud marked 3 inline comments as done.

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
53226	I've tried to find a better place to handle `ISD::FMAXIMUM`, but there is no suitable place in the Selecting->Combining->Legalization chain. So I was inspired by `ISD::FMAXNUM`, which is actually lowered during combining. If this is solely about naming, it is not a big deal to change the name, but all callees in `PerformDAGCombine` are named `combine*`, even for `ISD::FMAXNUM`. Do we really want to have a single `LowerFMinimumFMaximum`?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
182	Probably, yes. I wanted to show that we are able to fold all these checks with constant arguments. But we've already tested folding of zero and nan checks. Original FMAX and FMIN are tested as well. Dropped it.

Harbormaster completed remote builds in B218575: Diff 504008.Mar 9 2023, 7:52 PM

Do you have a plan to support minimumNumber?

llvm/lib/Target/X86/X86ISelLowering.cpp
53226	No, not the name. Did you try setting the action of `FMAXIMUM` to `Custom`? But I'm fine with it given that it's similar to `combineFMinNumFMaxNum`.
53237	`Subtarget.hasFP16() && VT == MVT::f16`
53258	Better to give a table like above:
//              Op1                        Op1
//          Num     xNaN               +0     -0
//       ----------------            ---------------
//   Num |  Max  | qNaN |         +0 |  +0  |  +0  |
// Op0   ----------------    Op0     ---------------
//  xNaN | qNaN  | qNaN |         -0 |  +0  |  -0  |
//       ----------------            ---------------
53284–53287	Can we check if it is 0 or NaN? We can put both in the second operand I think.
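
For reference, the behaviour that the table requested for line 53258 describes can be modelled in plain C++. This is only a sketch of the `llvm.maximum` semantics discussed in this review (any NaN operand gives a quiet NaN, and +0 is treated as greater than -0); `refMaximum` and `checkZeroTable` are hypothetical helper names, not functions from the patch or from LLVM.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical reference model of llvm.maximum for float.
// Not taken from the patch; it only mirrors the table in the comment above.
inline float refMaximum(float Op0, float Op1) {
  // Left half of the table: any NaN operand produces a quiet NaN.
  if (std::isnan(Op0) || std::isnan(Op1))
    return std::numeric_limits<float>::quiet_NaN();
  // Right half of the table: -0.0 is returned only when both zeros are negative.
  if (Op0 == 0.0f && Op1 == 0.0f)
    return (std::signbit(Op0) && std::signbit(Op1)) ? -0.0f : 0.0f;
  // Ordinary numbers: plain maximum.
  return Op0 > Op1 ? Op0 : Op1;
}

// Walks the four signed-zero cells of the table.
inline void checkZeroTable() {
  assert(!std::signbit(refMaximum(0.0f, 0.0f)));
  assert(!std::signbit(refMaximum(0.0f, -0.0f)));
  assert(!std::signbit(refMaximum(-0.0f, 0.0f)));
  assert(std::signbit(refMaximum(-0.0f, -0.0f)));
}
```

A minimum variant would mirror this with the zero rule inverted (+0 is returned only when both zeros are positive).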

pengfei added inline comments.Mar 10 2023, 12:13 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53284–53287	We can use `vfpclassps/d` on `AVX512DQ` to optimize it.

RKSimon added reviewers: pengfei, goldstein.w.n.Mar 10 2023, 3:34 AM

pengfei added inline comments.Mar 12 2023, 3:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53226	To answer my previous question: `combineFMinNumFMaxNum` was placed here intentionally, for the reason described in D15294. I suppose that problem doesn't apply to `combineFMinimumFMaximum`, so I prefer `LowerFMinimumFMaximum`.

In D145634#4183847, @pengfei wrote:

Do you have a plan to support minimumNumber?

Sorry, I didn't get it. Could you be more specific?

llvm/lib/Target/X86/X86ISelLowering.cpp
53226	Yes, thank you. I was unaware of this mechanism. It works, but I needed to add extra logic because with `Custom` lowering there is no more `setcc` combining.
53284–53287	My first attempt was to compare with 0 and NaN at the same time. There are two problems. The first is that a comparison with zero sets ZF regardless of whether positive or negative zero is provided. The second is that we still need to know whether the second operand is NaN. We may have `(0.0, NaN)` as arguments: we checked that the first operand is not NaN and is zero, and replaced it with the second while the second is NaN. It seems to me that we always need to check both operands for NaN and do one check for zero. We can use `vfpclassps/d` on `AVX512DQ` if one of the operands is known never to be NaN. Working on it.
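
The two problems described for lines 53284–53287 can be reproduced outside the compiler. A minimal sketch in plain C++ with hypothetical helper names, assuming IEEE 754 floats: an equality test against zero accepts both zeros, so the sign has to be read separately, and a NaN test on the first operand alone misses the `(0.0, NaN)` case.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Problem 1: an equality test against zero is true for both +0.0 and -0.0.
inline bool comparesEqualToZero(float X) { return X == 0.0f; }

// The sign has to be read separately, for example from the sign bit.
inline bool isNegativeZero(float X) { return X == 0.0f && std::signbit(X); }

// Problem 2: testing only the first operand for NaN is not enough.
inline bool firstOperandIsNaN(float Op0, float /*Op1*/) {
  return std::isnan(Op0);
}
inline bool anyOperandIsNaN(float Op0, float Op1) {
  return std::isnan(Op0) || std::isnan(Op1);
}

inline void checkBothProblems() {
  const float NaN = std::numeric_limits<float>::quiet_NaN();
  assert(comparesEqualToZero(0.0f) && comparesEqualToZero(-0.0f));
  assert(!isNegativeZero(0.0f) && isNegativeZero(-0.0f));
  // (0.0, NaN): the one-sided test misses the NaN, the two-sided test finds it.
  assert(!firstOperandIsNaN(0.0f, NaN) && anyOperandIsNaN(0.0f, NaN));
}
```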

In D145634#4187888, @e-kud wrote:

Sorry, I didn't get it. Could you be more specific?

It's the companion function of minimum in IEEE 754-2019, and it will be introduced in the new C/C++ standard too. I thought you were working on that.

Rebased.
Moved from combine to lowering.
Supported f16 version.
Added optimization for avx512dq.
Added and updated tests.

In D145634#4187914, @pengfei wrote:

It's the companion function of minimum in IEEE 754-2019, and it will be introduced in the new C/C++ standard too. I thought you were working on that.

I can't find any lib calls or specific intrinsics for them. I'd like to add them separately if we have any users or needs of them, do we?

e-kud retitled this revision from [X86] Support llvm.{min,max}imum.f{32,64} to [X86] Support llvm.{min,max}imum.f{16,32,64}.Mar 14 2023, 4:26 PM

Harbormaster completed remote builds in B219510: Diff 505312.Mar 14 2023, 5:30 PM

Broke formatting for premerge checks

Harbormaster completed remote builds in B219532: Diff 505346.Mar 14 2023, 7:25 PM

pengfei added inline comments.Mar 14 2023, 8:20 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
1004–1005	Make the format align with its context.
2127–2128	ditto.
29897	Putting Max/Min together is a bit confusing. My first impression was that it can return either +0 or -0 for a single comparison.
29926	Need `hasFP16()` for `f16`.
33600	ditto format. Why do we need `getNode`?

In D145634#4194990, @e-kud wrote:

I can't find any lib calls or specific intrinsics for them. I'd like to add them separately if we have any users or needs of them, do we?

I heard glibc is supporting them. I'm fine with leaving it to the future. Do you have a plan for vector support?

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
5	Guess you missed an AVX512F, i.e. `AVX,AVX512,AVX512F`

Addressed formatting comments.
Check f16 explicitly even if avx512f16 implies avx512dq for now.

pengfei added inline comments.Mar 15 2023, 8:08 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
29926	`(VT == MVT::f16 && Subtarget.hasFP16()) || Subtarget.hasDQI()`

pengfei added inline comments.Mar 15 2023, 8:10 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
29926	Sorry, mistake. It should be: `(VT != MVT::f16 && Subtarget.hasDQI()) || Subtarget.hasFP16()`

In D145634#4195338, @pengfei wrote:

I heard glibc is supporting them. I'm fine with leaving it to the future. Do you have a plan for vector support?

Indeed, found this

<math.h> functions for floating-point maximum and minimum, corresponding to new operations in IEEE 754-2019, and corresponding <tgmath.h> macros, are added from draft ISO C2X: fmaximum, fmaximum_num, fmaximum_mag, fmaximum_mag_num, fminimum, fminimum_num, fminimum_mag, fminimum_mag_num and corresponding functions for float, long double, _FloatN and _FloatNx.

About vector support: I want to try an alternative implementation that transforms floats into integers using shifts and compares them as integers while preserving float semantics, even -0 < +0. I've tried this approach for scalars, but the current approach produces less code. Vectors can probably benefit more.
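
The integer-comparison idea can be sketched as follows. This is not the vector lowering itself, which the review leaves to a follow-up; it is one common way to map IEEE 754 single-precision bit patterns to signed integers whose order matches the floating-point order, including -0 < +0. The helper names are hypothetical, and NaN inputs are assumed to be handled separately.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Maps a float to a signed integer key that sorts in the same order as the
// float itself, with -0.0 ordered strictly below +0.0. Hypothetical helper.
inline int32_t orderedKey(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  // 0x7fffffff when the sign bit is set, 0 otherwise (built from shifts only).
  uint32_t Mask = (0u - (Bits >> 31)) >> 1;
  // Flipping the magnitude bits of negative values reverses their order,
  // so larger magnitudes map to smaller keys.
  uint32_t Flipped = Bits ^ Mask;
  int32_t Key;
  std::memcpy(&Key, &Flipped, sizeof(Key));
  return Key;
}

// Maximum of two non-NaN floats using only an integer comparison.
inline float maximumViaIntegerCompare(float A, float B) {
  assert(!std::isnan(A) && !std::isnan(B) && "NaNs need a separate check");
  return orderedKey(A) < orderedKey(B) ? B : A;
}
```

With this mapping `-0.0f` gets key -1 and `+0.0f` gets key 0, so a single signed integer compare picks the right zero without a dedicated zero check.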

llvm/lib/Target/X86/X86ISelLowering.cpp
29926	Yes, thank you, I missed it because `fp16` implies `dq`. We can actually check only `VT == MVT::f16`, because the predicate `Subtarget.hasFP16() && VT == MVT::f16` has already been checked above. Do we want to avoid such an implicit implication? The alternative is `VT != MVT::f16 && Subtarget.hasDQI() || VT == MVT::f16 && Subtarget.hasFP16()`.
33600	The format doesn't work: we exceed 80 characters if it's aligned with `return`. We don't need `getNode`; I missed it when moving from combine to lowering.

LGTM.

llvm/lib/Target/X86/X86ISelLowering.cpp
29926	Oh, we have checked `hasFP16` in line 29876. So the way used here is correct. Sorry for the noise.
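
The back-and-forth over the line 29926 predicate comes down to which subtarget states can actually reach it. A small enumeration, under two assumptions taken from the comments above rather than from the patch source (FP16 implies DQ, and an `f16` value only gets this far when FP16 is available), suggests the candidate forms agree. The struct and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-ins for VT == MVT::f16, hasFP16() and hasDQI().
struct State {
  bool IsF16;
  bool HasFP16;
  bool HasDQI;
};

// The explicit alternative spelled out in the review.
inline bool explicitForm(State S) {
  return (!S.IsF16 && S.HasDQI) || (S.IsF16 && S.HasFP16);
}
// The shorter form that relies on the earlier hasFP16 check.
inline bool shortForm(State S) { return S.IsF16 || S.HasDQI; }
// The corrected suggestion from the review.
inline bool correctedSuggestion(State S) {
  return (!S.IsF16 && S.HasDQI) || S.HasFP16;
}

// Assumptions (not verified against the patch): FP16 implies DQ, and f16
// only reaches this point when FP16 is available.
inline bool isReachable(State S) {
  if (S.HasFP16 && !S.HasDQI)
    return false;
  if (S.IsF16 && !S.HasFP16)
    return false;
  return true;
}

// Enumerates all eight states and compares the forms on the reachable ones.
inline bool formsAgreeOnReachableStates() {
  for (int I = 0; I < 8; ++I) {
    State S{(I & 1) != 0, (I & 2) != 0, (I & 4) != 0};
    if (!isReachable(S))
      continue;
    if (explicitForm(S) != shortForm(S) ||
        explicitForm(S) != correctedSuggestion(S))
      return false;
  }
  return true;
}
```

Without the first assumption the forms do differ: for a non-`f16` type with FP16 but no DQ, the explicit form is false while the corrected suggestion is true.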

This revision is now accepted and ready to land.Mar 15 2023, 8:13 PM

Harbormaster completed remote builds in B219777: Diff 505684.Mar 15 2023, 8:50 PM

@RKSimon @goldstein.w.n ping.

Support i686 target: can't use integer representation of double -0.0

Harbormaster completed remote builds in B223475: Diff 510659.Apr 3 2023, 8:39 PM

Rebased

Harbormaster completed remote builds in B224312: Diff 511826.Apr 7 2023, 5:16 PM

@RKSimon @goldstein.w.n ping

I think you can merge it. We don't need all reviewers to sign off.

RKSimon added inline comments.Apr 13 2023, 8:01 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
29864	You should be able to drop this early-out - we should never get here
29871	Again, you can drop this as the setOperationAction calls should ensure we never get here - replace it with an assertion if you're worried.
29909	auto *
29911	auto *
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll
153 ↗	(On Diff #511826)	please can you add vector test coverage to ensure we scalarize?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
963	please can you add vector test coverage to ensure we scalarize?

e-kud marked 6 inline comments as done.Apr 14 2023, 7:49 PM

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
29871	Yes, you are right. Everything must be handled in `X86TargetLowering`. Dropped these checks.
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll
153 ↗	(On Diff #511826)	Yes, I've added them. Also it reminded me about several commented tests with the intrinsics. Uncommented them as well.

Rebased.
Uncommented existing tests for the intrinsics.
Addressed comments.

Harbormaster completed remote builds in B225797: Diff 513836.Apr 14 2023, 8:22 PM

RKSimon added inline comments.Apr 18 2023, 5:19 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
711	add nounwind attribute to get rid of the .cfi noise

e-kud marked an inline comment as done.Apr 18 2023, 5:19 PM

e-kud added inline comments.

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
711	I've allowed myself to add `nounwind` in `half.ll` since I've touched it. I think the attribute was missed.

Rebased.
Added nounwind attribute to tests.

Harbormaster completed remote builds in B226491: Diff 514793.Apr 18 2023, 6:34 PM

nikic mentioned this in D148691: [X86] Add lowering for fp minimum/maximum.Apr 19 2023, 12:36 AM

skatkov added a subscriber: skatkov.Apr 19 2023, 7:51 PM

@pengfei @goldstein.w.n Any more comments?

llvm/test/CodeGen/X86/half.ll
957 ↗	(On Diff #514793)	pre-commit the nounwind change to keep it separate from this patch - you shouldn't really need dso_local either

pengfei accepted this revision.Apr 23 2023, 4:08 AM

This revision is now accepted and ready to land.Apr 23 2023, 4:08 AM

I have a side question (not delaying landing this one)

It looks like this change lowers vectorized form of intrinsic in a non-optimal way.
So I wonder whether there are some plans to improve it as follow-up?

Rebased.
Excluded refactor of half.ll.

Harbormaster completed remote builds in B227863: Diff 516594.Apr 24 2023, 7:22 PM

In D145634#4291019, @skatkov wrote:

I have a side question (not delaying landing this one)

It looks like this change lowers vectorized form of intrinsic in a non-optimal way.
So I wonder whether there are some plans to improve it as follow-up?

Yes, for sure. I've tried to find a way to implement a vectorized version for SSE, but nothing has come to mind. We need at least SSE2 for PCMPEQ{W,D} because all fp comparison instructions treat -0.0 and 0.0 as equal. It seems that pentium3 will suffer from non-optimal fmaximum/fminimum...

llvm/test/CodeGen/X86/half.ll
957 ↗	(On Diff #514793)	I haven't got commit access yet, here it is https://reviews.llvm.org/D149114

In D145634#4294264, @e-kud wrote:

Yes, for sure. I've tried to find a way to implement a vectorized version for SSE, but nothing has come to mind. We need at least SSE2 for PCMPEQ{W,D} because all fp comparison instructions treat -0.0 and 0.0 as equal. It seems that pentium3 will suffer from non-optimal fmaximum/fminimum...

Why can't we do something like this https://godbolt.org/z/Yxfn3jTj1 ?
I mean at least for avx?

In D145634#4294383, @skatkov wrote:

Why can't we do something like this https://godbolt.org/z/Yxfn3jTj1 ?
I mean at least for avx?

BTW, according to my microbenchmark measurements (I know a microbenchmark may let the branch predictor work well), the version https://godbolt.org/z/rEj9GPfnY is the best one for scalar, but it cannot be implemented in SelectionDAG because it has a CFG.

skatkov added inline comments.Apr 24 2023, 9:06 PM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	Here is what I mentioned in terms of the non-optimal vectorized version, at least on AVX.

LGTM

llvm/lib/Target/X86/X86ISelLowering.cpp
29971	(style) remove braces from single line if()
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	For now we just want x86 to support the intrinsics, vector optimization is better handled as a followup.

skatkov added inline comments.Apr 25 2023, 1:09 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	It is exactly what I said in the beginning of this discussion: side question not delaying landing this patch.

RKSimon added inline comments.Apr 25 2023, 1:58 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	yup - cheers

Fix single line if style.

In D145634#4294388, @skatkov wrote:

BTW, according to my microbenchmark measurements (I know a microbenchmark may let the branch predictor work well), the version https://godbolt.org/z/rEj9GPfnY is the best one for scalar, but it cannot be implemented in SelectionDAG because it has a CFG.

Both versions are incorrect. They don't work as expected for (-0.0, 0.0) and (0.0, -0.0) inputs, because comiss and max treat negative and positive zeros as equal.

Harbormaster completed remote builds in B228010: Diff 516778.Apr 25 2023, 7:07 AM

In D145634#4295568, @e-kud wrote:

Both versions are incorrect. They don't work as expected for (-0.0, 0.0) and (0.0, -0.0) inputs, because comiss and max treat negative and positive zeros as equal.

Take a careful look. In case of equality we check the sign of the first value, so the comparison using ucomiss is OK.

In D145634#4295747, @skatkov wrote:

Take a careful look. In case of equality we check the sign of the first value, so the comparison using ucomiss is OK.

`%cmp = icmp slt i32 %bc, 0` needs to be changed to `%cmp = icmp sge i32 %bc, 0` to make it valid.
Yeah, I got the idea that the generic case may be implemented more efficiently in this way.
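
The compare-then-check-the-sign idea from this exchange can be sketched in plain C++. This is not the code behind the godbolt links, whose contents are not reproduced here; it is an assumed scalar version in which an ordinary comparison decides the unequal case and, on equality, the first operand is kept only when its integer view is non-negative, which corresponds to the `sge` form of the check. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical scalar maximum with control flow: compare first, and only on
// equality look at the sign bit of the first operand.
inline float maximumCompareThenSign(float A, float B) {
  // Unordered case: any NaN operand gives a quiet NaN.
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<float>::quiet_NaN();
  if (A > B)
    return A;
  if (A < B)
    return B;
  // A == B here, which includes the (+0.0, -0.0) and (-0.0, +0.0) pairs.
  int32_t Bits;
  std::memcpy(&Bits, &A, sizeof(Bits));
  // Integer view >= 0 means the sign bit of A is clear, so A is the larger
  // (or identical) value; otherwise B is at least as large.
  return Bits >= 0 ? A : B;
}

// Checks the signed-zero inputs discussed in the thread.
inline void checkSignedZeroPairs() {
  assert(!std::signbit(maximumCompareThenSign(-0.0f, 0.0f)));
  assert(!std::signbit(maximumCompareThenSign(0.0f, -0.0f)));
  assert(std::signbit(maximumCompareThenSign(-0.0f, -0.0f)));
}
```

Using the opposite test (keep the first operand when its integer view is negative) would return -0.0 for `(-0.0, 0.0)`, which is the case the `slt` to `sge` correction addresses.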

skatkov added a child revision: D149729: [X86] Avoid usage constant NaN for fminimum/fmaximum lowering.May 3 2023, 12:56 AM

I wonder whether there are any problems with landing this patch?

RKSimon added inline comments.May 4 2023, 5:43 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
1006	Do we have test coverage with SSE1 only?

This revision was landed with ongoing or failed builds.May 4 2023, 6:05 AM

Closed by commit rGa82d27a9a685: [X86] Support llvm.{min,max}imum.f{16,32,64} (authored by e-kud, committed by pengfei).

This revision was automatically updated to reflect the committed changes.

pengfei added a commit: rGa82d27a9a685: [X86] Support llvm.{min,max}imum.f{16,32,64}.

e-kud marked an inline comment as done.May 4 2023, 6:40 AM

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
1006	Apparently, no. There is a `fatal error: error in backend: Access past stack top!` with `double`s and `+sse,-sse2`. It seems I need to split `float` and `double` tests into two separate files to test SSE1 only. Are there better alternatives?

RKSimon added inline comments.May 4 2023, 7:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
1006	I'd be very tempted to limit float maximum/minimum to SSE2 or later tbh

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	80 lines
llvm/test/CodeGen/X86/fminimum-fmaximum.ll	491 lines

Diff 504008

llvm/lib/Target/X86/X86ISelLowering.cpp

[... 995 lines not shown ...]	X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
  if (!Subtarget.useSoftFloat() && Subtarget.hasMMX()) {
    addRegisterClass(MVT::x86mmx, &X86::VR64RegClass);
    // No operations on x86mmx supported, everything uses intrinsics.
  }

  if (!Subtarget.useSoftFloat() && Subtarget.hasSSE1()) {
    addRegisterClass(MVT::v4f32, Subtarget.hasVLX() ? &X86::VR128XRegClass
                                                    : &X86::VR128RegClass);

    setOperationAction(ISD::FNEG, MVT::v4f32, Custom);
    setOperationAction(ISD::FABS, MVT::v4f32, Custom);
    setOperationAction(ISD::FCOPYSIGN, MVT::v4f32, Custom);
    setOperationAction(ISD::BUILD_VECTOR, MVT::v4f32, Custom);
    setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v4f32, Custom);
    setOperationAction(ISD::VSELECT, MVT::v4f32, Custom);
    setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4f32, Custom);
    setOperationAction(ISD::SELECT, MVT::v4f32, Custom);

    setOperationAction(ISD::LOAD, MVT::v2f32, Custom);
[... 1,104 lines not shown ...]	if (!Subtarget.useSoftFloat() && Subtarget.hasFP16()) {
    setOperationAction(ISD::SETCC, MVT::f16, Custom);
    setOperationAction(ISD::STRICT_FSETCC, MVT::f16, Custom);
    setOperationAction(ISD::STRICT_FSETCCS, MVT::f16, Custom);
    setOperationAction(ISD::STRICT_FROUND, MVT::f16, Promote);
    setOperationAction(ISD::FROUNDEVEN, MVT::f16, Legal);
    setOperationAction(ISD::STRICT_FROUNDEVEN, MVT::f16, Legal);
    setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);
    setOperationAction(ISD::STRICT_FP_ROUND, MVT::f16, Custom);
    setOperationAction(ISD::FP_EXTEND, MVT::f32, Legal);
    setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f32, Legal);

    setCondCodeAction(ISD::SETOEQ, MVT::f16, Expand);
    setCondCodeAction(ISD::SETUNE, MVT::f16, Expand);

    if (Subtarget.useAVX512Regs()) {
      setGroup(MVT::v32f16);
      setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v32f16, Custom);
      setOperationAction(ISD::SINT_TO_FP, MVT::v32i16, Legal);
[... 273 lines not shown ...]	setTargetDAGCombine({ISD::VECTOR_SHUFFLE,
                       ISD::ADD,
                       ISD::FADD,
                       ISD::FSUB,
                       ISD::FNEG,
                       ISD::FMA,
                       ISD::STRICT_FMA,
                       ISD::FMINNUM,
                       ISD::FMAXNUM,
+                      ISD::FMINIMUM,
+                      ISD::FMAXIMUM,
                       ISD::SUB,
                       ISD::LOAD,
                       ISD::MLOAD,
                       ISD::STORE,
                       ISD::MSTORE,
                       ISD::TRUNCATE,
                       ISD::ZERO_EXTEND,
                       ISD::ANY_EXTEND,
[... 27,428 lines not shown ...]	static SDValue LowerMINMAX(SDValue Op, const X86Subtarget &Subtarget,
  return SDValue();
}

static SDValue LowerABD(SDValue Op, const X86Subtarget &Subtarget,
                        SelectionDAG &DAG) {
  MVT VT = Op.getSimpleValueType();

  // For AVX1 cases, split to use legal ops.
  if (VT.is256BitVector() && !Subtarget.hasInt256())
    return splitVectorIntBinary(Op, DAG);

  if ((VT == MVT::v32i16 || VT == MVT::v64i8) && !Subtarget.useBWIRegs())
    return splitVectorIntBinary(Op, DAG);

  // TODO: Add TargetLowering expandABD() support.
  SDLoc dl(Op);
  bool IsSigned = Op.getOpcode() == ISD::ABDS;
  SDValue LHS = DAG.getFreeze(Op.getOperand(0));
  SDValue RHS = DAG.getFreeze(Op.getOperand(1));
  const TargetLowering &TLI = DAG.getTargetLoweringInfo();

  // abds(lhs, rhs) -> sub(smax(lhs,rhs), smin(lhs,rhs))
  // abdu(lhs, rhs) -> sub(umax(lhs,rhs), umin(lhs,rhs))
  unsigned MaxOpc = IsSigned ? ISD::SMAX : ISD::UMAX;
[... 9 lines not shown ...]	static SDValue LowerABD(SDValue Op, const X86Subtarget &Subtarget,
  EVT CCVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
  ISD::CondCode CC = IsSigned ? ISD::CondCode::SETGT : ISD::CondCode::SETUGT;
  SDValue Cmp = DAG.getSetCC(dl, CCVT, LHS, RHS, CC);
  return DAG.getSelect(dl, VT, Cmp, DAG.getNode(ISD::SUB, dl, VT, LHS, RHS),
                       DAG.getNode(ISD::SUB, dl, VT, RHS, LHS));
}

static SDValue LowerMUL(SDValue Op, const X86Subtarget &Subtarget,		static SDValue LowerMUL(SDValue Op, const X86Subtarget &Subtarget,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
		pengfeiUnsubmitted Done Reply Inline Actions Put Max/Min together is a bit confusing. My first impression is it can return either +0 or -0 for a single comparison. pengfei: Put Max/Min together is a bit confusing. My first impression is it can return either +0 or -0…
SDLoc dl(Op);		SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();

// Decompose 256-bit ops into 128-bit ops.		// Decompose 256-bit ops into 128-bit ops.
if (VT.is256BitVector() && !Subtarget.hasInt256())		if (VT.is256BitVector() && !Subtarget.hasInt256())
return splitVectorIntBinary(Op, DAG);		return splitVectorIntBinary(Op, DAG);

if ((VT == MVT::v32i16 \|\| VT == MVT::v64i8) && !Subtarget.hasBWI())		if ((VT == MVT::v32i16 \|\| VT == MVT::v64i8) && !Subtarget.hasBWI())
return splitVectorIntBinary(Op, DAG);		return splitVectorIntBinary(Op, DAG);

SDValue A = Op.getOperand(0);		SDValue A = Op.getOperand(0);
SDValue B = Op.getOperand(1);		SDValue B = Op.getOperand(1);
		RKSimonUnsubmitted Done Reply Inline Actions auto * RKSimon: auto *

// Lower v16i8/v32i8/v64i8 mul as sign-extension to v8i16/v16i16/v32i16		// Lower v16i8/v32i8/v64i8 mul as sign-extension to v8i16/v16i16/v32i16
		RKSimonUnsubmitted Done Reply Inline Actions auto * RKSimon: auto *
// vector pairs, multiply and truncate.		// vector pairs, multiply and truncate.
if (VT == MVT::v16i8 \|\| VT == MVT::v32i8 \|\| VT == MVT::v64i8) {		if (VT == MVT::v16i8 \|\| VT == MVT::v32i8 \|\| VT == MVT::v64i8) {
unsigned NumElts = VT.getVectorNumElements();		unsigned NumElts = VT.getVectorNumElements();

if ((VT == MVT::v16i8 && Subtarget.hasInt256()) \|\|		if ((VT == MVT::v16i8 && Subtarget.hasInt256()) \|\|
(VT == MVT::v32i8 && Subtarget.canExtendTo512BW())) {		(VT == MVT::v32i8 && Subtarget.canExtendTo512BW())) {
MVT ExVT = MVT::getVectorVT(MVT::i16, VT.getVectorNumElements());		MVT ExVT = MVT::getVectorVT(MVT::i16, VT.getVectorNumElements());
return DAG.getNode(		return DAG.getNode(
ISD::TRUNCATE, dl, VT,		ISD::TRUNCATE, dl, VT,
DAG.getNode(ISD::MUL, dl, ExVT,		DAG.getNode(ISD::MUL, dl, ExVT,
DAG.getNode(ISD::ANY_EXTEND, dl, ExVT, A),		DAG.getNode(ISD::ANY_EXTEND, dl, ExVT, A),
DAG.getNode(ISD::ANY_EXTEND, dl, ExVT, B)));		DAG.getNode(ISD::ANY_EXTEND, dl, ExVT, B)));
}		}

MVT ExVT = MVT::getVectorVT(MVT::i16, NumElts / 2);		MVT ExVT = MVT::getVectorVT(MVT::i16, NumElts / 2);
		pengfei: Need `hasFP16()` for `f16`.
		e-kud (Author): Yes, thank you, I missed it because `fp16` implies `dq`. We can actually check only `VT == MVT::f16`, because the predicate `Subtarget.hasFP16() && VT == MVT::f16` has already been checked above. Do we want to avoid such an implicit implication? The alternative is `VT != MVT::f16 && Subtarget.hasDQI() || VT == MVT::f16 && Subtarget.hasFP16()`.
		pengfei: `(VT == MVT::f16 && Subtarget.hasFP16()) || Subtarget.hasDQI()`
		pengfei: Sorry, my mistake. It should be: `(VT != MVT::f16 && Subtarget.hasDQI()) || Subtarget.hasFP16()`
		pengfei: Oh, we have already checked `hasFP16` in line 29876, so the way used here is correct. Sorry for the noise.
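The predicates traded in this thread can be compared by brute force. The following is only a sketch under assumptions read off the comments: `Query` and all function names are hypothetical stand-ins (not LLVM API), "fp16 implies dq" is taken as a feature invariant, and the earlier `hasFP16` check is modelled as "an `f16` value never reaches this point without FP16".

```cpp
#include <cassert>

// Hypothetical stand-in for the three facts the predicates look at.
struct Query {
  bool IsF16;   // models VT == MVT::f16
  bool HasFP16; // models Subtarget.hasFP16()
  bool HasDQI;  // models Subtarget.hasDQI()
};

// Fully explicit form mentioned by e-kud as the alternative.
bool explicitForm(Query Q) {
  return (!Q.IsF16 && Q.HasDQI) || (Q.IsF16 && Q.HasFP16);
}

// pengfei's first suggestion (withdrawn in the next comment).
bool firstSuggestion(Query Q) { return (Q.IsF16 && Q.HasFP16) || Q.HasDQI; }

// pengfei's corrected suggestion.
bool correctedSuggestion(Query Q) {
  return (!Q.IsF16 && Q.HasDQI) || Q.HasFP16;
}

// Assumed invariant from the thread: FP16 implies DQI.
bool featuresConsistent(Query Q) { return !Q.HasFP16 || Q.HasDQI; }

// Assumed: an earlier check already rejected f16 without FP16.
bool reachable(Query Q) {
  return featuresConsistent(Q) && (!Q.IsF16 || Q.HasFP16);
}

// Counts the states accepted by Keep on which predicates A and B disagree.
template <typename Pred, typename F, typename G>
int countDisagreements(Pred Keep, F A, G B) {
  int N = 0;
  for (int Bits = 0; Bits < 8; ++Bits) {
    Query Q{(Bits & 1) != 0, (Bits & 2) != 0, (Bits & 4) != 0};
    if (Keep(Q) && A(Q) != B(Q))
      ++N;
  }
  return N;
}
```

Under these assumptions the corrected suggestion matches the explicit form everywhere, while the first suggestion differs only for `f16` with DQI but without FP16, a state the earlier check would already have excluded.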

// Extract the lo/hi parts to any extend to i16.
// We're going to mask off the low byte of each result element of the
// pmullw, so it doesn't matter what's in the high byte of each 16-bit
// element.
SDValue Undef = DAG.getUNDEF(VT);
SDValue ALo = DAG.getBitcast(ExVT, getUnpackl(DAG, dl, VT, A, Undef));
SDValue AHi = DAG.getBitcast(ExVT, getUnpackh(DAG, dl, VT, A, Undef));
[28 lines hidden] static SDValue LowerMUL(SDValue Op, const X86Subtarget &Subtarget,
if (VT == MVT::v4i32) {
assert(Subtarget.hasSSE2() && !Subtarget.hasSSE41() &&
"Should not custom lower when pmulld is available!");

// Extract the odd parts.
static const int UnpackMask[] = { 1, -1, 3, -1 };
SDValue Aodds = DAG.getVectorShuffle(VT, dl, A, A, UnpackMask);
SDValue Bodds = DAG.getVectorShuffle(VT, dl, B, B, UnpackMask);

		RKSimon: (style) remove braces from single line if()
// Multiply the even parts.
SDValue Evens = DAG.getNode(X86ISD::PMULUDQ, dl, MVT::v2i64,
DAG.getBitcast(MVT::v2i64, A),
DAG.getBitcast(MVT::v2i64, B));
// Now multiply odd parts.
SDValue Odds = DAG.getNode(X86ISD::PMULUDQ, dl, MVT::v2i64,
DAG.getBitcast(MVT::v2i64, Aodds),
DAG.getBitcast(MVT::v2i64, Bodds));
[3,612 lines hidden] SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::USUBSAT:
case ISD::SSUBSAT: return LowerADDSAT_SUBSAT(Op, DAG, Subtarget);
case ISD::SMAX:
case ISD::SMIN:
case ISD::UMAX:
case ISD::UMIN: return LowerMINMAX(Op, Subtarget, DAG);
case ISD::ABS: return LowerABS(Op, Subtarget, DAG);
case ISD::ABDS:
case ISD::ABDU: return LowerABD(Op, Subtarget, DAG);
		pengfei: Ditto on format. Why do we need `getNode`?
		e-kud (Author): Formatting doesn't work: we exceed 80 characters if it's aligned with `return`. We don't need `getNode`; I missed it when moving from combine to lowering.
case ISD::AVGCEILU: return LowerAVG(Op, Subtarget, DAG);
case ISD::FSINCOS: return LowerFSINCOS(Op, Subtarget, DAG);
case ISD::MLOAD: return LowerMLOAD(Op, Subtarget, DAG);
case ISD::MSTORE: return LowerMSTORE(Op, Subtarget, DAG);
case ISD::MGATHER: return LowerMGATHER(Op, Subtarget, DAG);
case ISD::MSCATTER: return LowerMSCATTER(Op, Subtarget, DAG);
case ISD::GC_TRANSITION_START:
case ISD::GC_TRANSITION_END: return LowerGC_TRANSITION(Op, DAG);
[19,609 lines hidden] static SDValue combineFMinNumFMaxNum(SDNode *N, SelectionDAG &DAG,
SDValue MinOrMax = DAG.getNode(MinMaxOp, DL, VT, Op1, Op0);
SDValue IsOp0Nan = DAG.getSetCC(DL, SetCCType, Op0, Op0, ISD::SETUO);

// If Op0 is a NaN, select Op1. Otherwise, select the max. If both operands
// are NaN, the NaN value of Op1 is the result.
return DAG.getSelect(DL, VT, IsOp0Nan, Op1, MinOrMax);
}
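
The tail of `combineFMinNumFMaxNum` shown above can be modelled on plain scalars to see why the operands are swapped before the select. This is a sketch, not LLVM code: `x86FMax` and `modelFMaxNum` are hypothetical names, and `x86FMax` assumes the behaviour described in this patch's own comment, namely that `X86ISD::FMAX` returns its second operand when either input is NaN or when both compare equal.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumed scalar model of X86ISD::FMAX: the comparison is false for NaN
// inputs and for equal inputs, so the second operand is returned then.
float x86FMax(float A, float B) { return A > B ? A : B; }

// Mirrors the three DAG nodes above: FMAX with swapped operands, an
// unordered self-compare of Op0, and a select between Op1 and the max.
float modelFMaxNum(float Op0, float Op1) {
  float MinOrMax = x86FMax(Op1, Op0); // yields Op0 if Op1 is NaN
  bool IsOp0NaN = std::isnan(Op0);    // SETUO of Op0 with itself
  return IsOp0NaN ? Op1 : MinOrMax;
}
```

With the swap, a NaN in `Op1` is dropped by the max itself and a NaN in `Op0` is dropped by the select, so a NaN is returned only when both operands are NaN.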

		static SDValue combineFMinimumFMaximum(SDNode *N, SelectionDAG &DAG,
		pengfei: The code doesn't combine anything; should it be moved to `LowerFMinimumFMaximum`?
		e-kud (Author): I've tried to find a better place to handle `ISD::FMAXIMUM`, but there is no place in the chain Selecting->Combining->Legalization. So I was inspired by `ISD::FMAXNUM`, which is actually lowered during combining. If this is solely about naming, it is not a big deal to change the name, but all callees in `PerformDAGCombine` are named `combine*`, even for `ISD::FMAXNUM`. Do we really want to have a single `LowerFMinimumFMaximum`?
		pengfei: No, not the name. Did you try setting the action of `FMAXIMUM` to `Custom`? But I'm fine with it, given it's similar to `combineFMinNumFMaxNum`.
		pengfei: To answer my previous question: `combineFMinNumFMaxNum` was intended here for the reason described in D15294. I suppose the problem doesn't exist for `combineFMinimumFMaximum`, so I prefer `LowerFMinimumFMaximum`.
		e-kud (Author): Yes, thank you. I was unaware of this mechanism. It works, but I needed to include extra logic because with `Custom` lowering there is no more `setcc` combining.
		const X86Subtarget &Subtarget) {
		assert((N->getOpcode() == ISD::FMAXIMUM ||
		N->getOpcode() == ISD::FMINIMUM) &&
		"Expected FMAXIMUM or FMINIMUM opcode");
		EVT VT = N->getValueType(0);
		if (Subtarget.useSoftFloat())
		return SDValue();

		const TargetLowering &TLI = DAG.getTargetLoweringInfo();

		if (!((Subtarget.hasSSE1() && VT == MVT::f32) ||
		pengfei: `Subtarget.hasFP16() && VT == MVT::f16`
		(Subtarget.hasSSE2() && VT == MVT::f64)))
		return SDValue();

		SDValue Op0 = N->getOperand(0);
		SDValue Op1 = N->getOperand(1);
		SDLoc DL(N);
		uint64_t SizeInBits = VT.getFixedSizeInBits();
		APInt PreferredZero;
		EVT IVT = MVT::getIntegerVT(SizeInBits);
		X86ISD::NodeType MinMaxOp;
		if (N->getOpcode() == ISD::FMAXIMUM) {
		PreferredZero = APInt::getZero(SizeInBits);
		MinMaxOp = X86ISD::FMAX;
		} else {
		PreferredZero = APInt::getSignedMinValue(SizeInBits);
		MinMaxOp = X86ISD::FMIN;
		}
		EVT SetCCType = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
		VT);

		// We reuse FMAX and FMIN operations which are not commutative. They return
		pengfei: Better to give a table like above:
		    //              Op1                       Op1
		    //          Num     xNaN              +0      -0
		    //        -----------------         ---------------
		    //  Num   |  Max  |  qNaN |     +0  |  +0  |  +0  |
		    // Op0    -----------------    Op0  ---------------
		    //  xNaN  | qNaN  |  qNaN |     -0  |  +0  |  -0  |
		    //        -----------------         ---------------
		// the second operand if at least one of operands is NaN or if both operands
		// are equal (even if they are zeroes with different sign).
		//
		// Here we try to determine the correct order.
		//
		// We check if any of operands is NaN and return NaN. Then we check if any of
		// operands is zero or negative zero (for fmaximum and fminimum respectively)
		// to ensure the correct zero is returned.
		auto IsPreferredZero = [PreferredZero](SDValue Op) {
		pengfei: Should we not do it for `hasNoSignedZeros`?
		Op = peekThroughBitcasts(Op);
		if (ConstantFPSDNode *CstOp = dyn_cast<ConstantFPSDNode>(Op))
		return CstOp->getValueAPF().bitcastToAPInt() == PreferredZero;
		if (ConstantSDNode *CstOp = dyn_cast<ConstantSDNode>(Op))
		return CstOp->getAPIntValue() == PreferredZero;
		return false;
		};

		SDValue MinMax;
		if (DAG.getTarget().Options.NoSignedZerosFPMath ||
		N->getFlags().hasNoSignedZeros() ||
		IsPreferredZero(Op1)) {
		MinMax = DAG.getNode(MinMaxOp, DL, VT, Op0, Op1, N->getFlags());
		} else if (IsPreferredZero(Op0)) {
		MinMax = DAG.getNode(MinMaxOp, DL, VT, Op1, Op0, N->getFlags());
		} else {
		SDValue IsOp0Zero = DAG.getSetCC(DL, SetCCType,
		DAG.getNode(ISD::BITCAST, DL, IVT, Op0),
		DAG.getConstant(PreferredZero, DL, IVT),
		ISD::SETEQ);
		pengfei: Can we check if it is 0 or NaN? We can put both in the second operand, I think.
		pengfei: We can use `vfpclassps/d` on `AVX512DQ` to optimize it.
		e-kud (Author): My first attempt was to compare with 0 and NaN at the same time. We have two problems. The first is that a comparison with zero sets ZF regardless of whether positive or negative zero is provided. The second is that we still need to know whether the second operand is NaN. We may have `(0.0, NaN)` as arguments: we checked that the first operand is not NaN and is zero, and replaced it with the second, which is NaN. It seems to me that we always need to check both operands for NaN and do one check for zero. We can use `vfpclassps/d` on `AVX512DQ` if one of the operands is known to never be NaN. Working on it.
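The first problem named in the reply, that a floating-point compare cannot tell the two zeros apart, is easy to reproduce on scalars. A minimal sketch follows; the helper names are hypothetical, and the integer compare on the raw bits plays the role of the `SETEQ` on the bitcast operand in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw IEEE-754 single-precision bit pattern of V.
std::uint32_t bitsOf(float V) {
  std::uint32_t U;
  std::memcpy(&U, &V, sizeof U);
  return U;
}

// Floating-point compare: true for +0.0 and for -0.0 alike.
bool comparesEqualToZero(float V) { return V == 0.0f; }

// Integer compares on the bit pattern do distinguish the sign of zero.
bool isPositiveZero(float V) { return bitsOf(V) == 0x00000000u; }
bool isNegativeZero(float V) { return bitsOf(V) == 0x80000000u; }
```

The two constants match the `PreferredZero` values chosen in the patch: all bits clear for `FMAXIMUM` and only the sign bit set for `FMINIMUM`.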
		SDValue NewOp0 = DAG.getSelect(DL, VT, IsOp0Zero, Op1, Op0);
		SDValue NewOp1 = DAG.getSelect(DL, VT, IsOp0Zero, Op0, Op1);
		MinMax = DAG.getNode(MinMaxOp, DL, VT, NewOp0, NewOp1, N->getFlags());
		}

		if (DAG.isKnownNeverNaN(Op0) && DAG.isKnownNeverNaN(Op1)) {
		return MinMax;
		}

		APFloat NaNValue = APFloat::getNaN(VT == MVT::f32 ? APFloat::IEEEsingle() : APFloat::IEEEdouble());
		SDValue isNan = DAG.getSetCC(DL, SetCCType, Op0, Op1, ISD::SETUO);
		return DAG.getSelect(DL, VT, isNan, DAG.getConstantFP(NaNValue, DL, VT), MinMax);
		}
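
The general path of the function above (the final `else` branch followed by the NaN select) can be followed on scalars. The sketch below is a model under assumptions, not the patch itself: `floatBits`, `x86MinMax` and `modelFMinimumFMaximum` are hypothetical names, only `f32` is covered, and the `X86ISD::FMAX`/`FMIN` behaviour is taken from the comment in the patch (second operand is returned for NaN or equal inputs).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <utility>

// Raw IEEE-754 single-precision bit pattern of V.
std::uint32_t floatBits(float V) {
  std::uint32_t U;
  std::memcpy(&U, &V, sizeof U);
  return U;
}

// Assumed model of X86ISD::FMAX / X86ISD::FMIN: returns the second operand
// when the comparison is false, i.e. for NaN inputs and for equal inputs.
float x86MinMax(bool IsMax, float A, float B) {
  return (IsMax ? A > B : A < B) ? A : B;
}

float modelFMinimumFMaximum(bool IsMax, float Op0, float Op1) {
  // +0.0 is preferred for maximum, -0.0 for minimum.
  const std::uint32_t PreferredZero = IsMax ? 0x00000000u : 0x80000000u;
  // If Op0 is the preferred zero, move it into the second slot so that a
  // tie between zeros of different sign resolves to it.
  if (floatBits(Op0) == PreferredZero)
    std::swap(Op0, Op1);
  float MinMax = x86MinMax(IsMax, Op0, Op1);
  // Unordered compare of both operands, then select a NaN constant.
  bool IsNaN = std::isnan(Op0) || std::isnan(Op1);
  return IsNaN ? std::numeric_limits<float>::quiet_NaN() : MinMax;
}
```

If `Op1` is already the preferred zero, the tie resolves correctly without a swap, which is consistent with the constant case `IsPreferredZero(Op1)` keeping the original operand order in the patch.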

static SDValue combineX86INT_TO_FP(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI) {
EVT VT = N->getValueType(0);
const TargetLowering &TLI = DAG.getTargetLoweringInfo();

APInt DemandedElts = APInt::getAllOnes(VT.getVectorNumElements());
if (TLI.SimplifyDemandedVectorElts(SDValue(N, 0), DemandedElts, DCI))
return SDValue(N, 0);
[4,019 lines hidden] SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
case X86ISD::FAND: return combineFAnd(N, DAG, Subtarget);
case X86ISD::FANDN: return combineFAndn(N, DAG, Subtarget);
case X86ISD::FXOR:
case X86ISD::FOR: return combineFOr(N, DAG, DCI, Subtarget);
case X86ISD::FMIN:
case X86ISD::FMAX: return combineFMinFMax(N, DAG);
case ISD::FMINNUM:
case ISD::FMAXNUM: return combineFMinNumFMaxNum(N, DAG, Subtarget);
		case ISD::FMINIMUM:
		case ISD::FMAXIMUM: return combineFMinimumFMaximum(N, DAG, Subtarget);
case X86ISD::CVTSI2P:
case X86ISD::CVTUI2P: return combineX86INT_TO_FP(N, DAG, DCI);
case X86ISD::CVTP2SI:
case X86ISD::CVTP2UI:
case X86ISD::STRICT_CVTTP2SI:
case X86ISD::CVTTP2SI:
case X86ISD::STRICT_CVTTP2UI:
case X86ISD::CVTTP2UI:
[1,411 lines hidden]

llvm/test/CodeGen/X86/fminimum-fmaximum.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=SSE2
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=AVX,AVX1
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefixes=AVX,AVX512

				pengfei: I guess you missed an AVX512F prefix, i.e. `AVX,AVX512,AVX512F`.
				declare float @llvm.maximum.f32(float, float)
				declare double @llvm.maximum.f64(double, double)
				declare float @llvm.minimum.f32(float, float)
				declare double @llvm.minimum.f64(double, double)

				;
				; fmaximum
				;

				define float @test_fmaximum(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: testl %eax, %eax
				; SSE2-NEXT: movdqa %xmm0, %xmm3
				; SSE2-NEXT: movdqa %xmm1, %xmm2
				; SSE2-NEXT: je .LBB0_2
				; SSE2-NEXT: # %bb.1:
				; SSE2-NEXT: movdqa %xmm1, %xmm3
				; SSE2-NEXT: movdqa %xmm0, %xmm2
				; SSE2-NEXT: .LBB0_2:
				; SSE2-NEXT: maxss %xmm3, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm3
				; SSE2-NEXT: andnps %xmm2, %xmm3
				; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm0, %xmm1
				; SSE2-NEXT: orps %xmm3, %xmm1
				; SSE2-NEXT: movaps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vmovd %xmm0, %eax
				; AVX1-NEXT: testl %eax, %eax
				; AVX1-NEXT: vmovdqa %xmm0, %xmm2
				; AVX1-NEXT: vmovdqa %xmm1, %xmm3
				; AVX1-NEXT: je .LBB0_2
				; AVX1-NEXT: # %bb.1:
				; AVX1-NEXT: vmovdqa %xmm1, %xmm2
				; AVX1-NEXT: vmovdqa %xmm0, %xmm3
				; AVX1-NEXT: .LBB0_2:
				; AVX1-NEXT: vmaxss %xmm2, %xmm3, %xmm2
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vmovd %xmm0, %eax
				; AVX512-NEXT: testl %eax, %eax
				; AVX512-NEXT: sete %al
				; AVX512-NEXT: kmovw %eax, %k1
				; AVX512-NEXT: vmovdqa %xmm0, %xmm2
				; AVX512-NEXT: vmovss %xmm1, %xmm2, %xmm2 {%k1}
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k2
				; AVX512-NEXT: vmovss %xmm0, %xmm1, %xmm1 {%k1}
				; AVX512-NEXT: vmaxss %xmm1, %xmm2, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k2}
				; AVX512-NEXT: retq
				%1 = tail call float @llvm.maximum.f32(float %x, float %y)
				ret float %1
				}

				define float @test_fmaximum_nan0(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum_nan0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fmaximum_nan0:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				%1 = tail call float @llvm.maximum.f32(float 0x7fff000000000000, float %y)
				ret float %1
				}

				define float @test_fmaximum_nan1(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum_nan1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fmaximum_nan1:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				%1 = tail call float @llvm.maximum.f32(float %x, float 0x7fff000000000000)
				ret float %1
				}

				define float @test_fmaximum_nnan(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum_nnan:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: addss %xmm1, %xmm0
				; SSE2-NEXT: subss %xmm1, %xmm2
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: testl %eax, %eax
				; SSE2-NEXT: je .LBB3_1
				; SSE2-NEXT: # %bb.2:
				; SSE2-NEXT: maxss %xmm2, %xmm0
				; SSE2-NEXT: retq
				; SSE2-NEXT: .LBB3_1:
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: movaps %xmm2, %xmm0
				; SSE2-NEXT: maxss %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_nnan:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vaddss %xmm1, %xmm0, %xmm2
				; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm1
				; AVX1-NEXT: vmovd %xmm2, %eax
				; AVX1-NEXT: testl %eax, %eax
				skatkov: Here is what I mentioned regarding the non-optimal vectorized version, at least on AVX.
				RKSimon: For now we just want x86 to support the intrinsics; vector optimization is better handled as a follow-up.
				skatkov: That is exactly what I said at the beginning of this discussion: a side question that should not delay landing this patch.
				RKSimon: yup - cheers
				; AVX1-NEXT: je .LBB3_1
				; AVX1-NEXT: # %bb.2:
				; AVX1-NEXT: vmaxss %xmm1, %xmm2, %xmm0
				; AVX1-NEXT: retq
				; AVX1-NEXT: .LBB3_1:
				; AVX1-NEXT: vmovaps %xmm2, %xmm0
				; AVX1-NEXT: vmaxss %xmm0, %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum_nnan:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vaddss %xmm1, %xmm0, %xmm2
				; AVX512-NEXT: vsubss %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: vmovd %xmm2, %eax
				; AVX512-NEXT: testl %eax, %eax
				; AVX512-NEXT: sete %al
				; AVX512-NEXT: kmovw %eax, %k1
				; AVX512-NEXT: vmovaps %xmm0, %xmm1
				; AVX512-NEXT: vmovss %xmm2, %xmm1, %xmm1 {%k1}
				; AVX512-NEXT: vmovss %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512-NEXT: vmaxss %xmm1, %xmm2, %xmm0
				; AVX512-NEXT: retq
				%1 = fadd nnan float %x, %y
				%2 = fsub nnan float %x, %y
				%3 = tail call float @llvm.maximum.f32(float %1, float %2)
				ret float %3
				}

				define double @test_fmaximum_zero0(double %x, double %y) {
				; SSE2-LABEL: test_fmaximum_zero0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: cmpunordsd %xmm1, %xmm0
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm0, %xmm2
				; SSE2-NEXT: xorpd %xmm3, %xmm3
				; SSE2-NEXT: maxsd %xmm3, %xmm1
				; SSE2-NEXT: andnpd %xmm1, %xmm0
				; SSE2-NEXT: orpd %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_zero0:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vxorpd %xmm0, %xmm0, %xmm0
				; AVX1-NEXT: vmaxsd %xmm0, %xmm1, %xmm0
				; AVX1-NEXT: vcmpunordsd %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum_zero0:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vxorpd %xmm0, %xmm0, %xmm0
				; AVX512-NEXT: vmaxsd %xmm0, %xmm1, %xmm0
				; AVX512-NEXT: vcmpunordsd %xmm1, %xmm1, %k1
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				%1 = tail call double @llvm.maximum.f64(double 0.0, double %y)
				ret double %1
				}

				define double @test_fmaximum_zero1(double %x, double %y) {
				pengfei: The test does show anything interesting.
				pengfei: does -> doesn't
				e-kud (Author): Probably, yes. I wanted to show that we are able to fold all these checks with constant arguments. But we've already tested folding of zero and NaN checks. The original FMAX and FMIN are tested as well. Dropped it.
				; SSE2-LABEL: test_fmaximum_zero1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm0, %xmm1
				; SSE2-NEXT: cmpunordsd %xmm0, %xmm1
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm1, %xmm2
				; SSE2-NEXT: xorpd %xmm3, %xmm3
				; SSE2-NEXT: maxsd %xmm3, %xmm0
				; SSE2-NEXT: andnpd %xmm0, %xmm1
				; SSE2-NEXT: orpd %xmm2, %xmm1
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_zero1:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vmaxsd %xmm1, %xmm0, %xmm1
				; AVX1-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm0
				; AVX1-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum_zero1:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vxorpd %xmm1, %xmm1, %xmm1
				; AVX512-NEXT: vmaxsd %xmm1, %xmm0, %xmm1
				; AVX512-NEXT: vcmpunordsd %xmm0, %xmm0, %k1
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1 {%k1}
				; AVX512-NEXT: vmovapd %xmm1, %xmm0
				; AVX512-NEXT: retq
				%1 = tail call double @llvm.maximum.f64(double %x, double 0.0)
				ret double %1
				}

				define double @test_fmaximum_zero2(double %x, double %y) {
				; SSE2-LABEL: test_fmaximum_zero2:
				; SSE2: # %bb.0:
				; SSE2-NEXT: xorps %xmm0, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fmaximum_zero2:
				; AVX: # %bb.0:
				; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0
				; AVX-NEXT: retq
				%1 = tail call double @llvm.maximum.f64(double 0.0, double -0.0)
				ret double %1
				}

				define float @test_fmaximum_nsz(float %x, float %y) "no-signed-zeros-fp-math"="true" {
				; SSE2-LABEL: test_fmaximum_nsz:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: maxss %xmm1, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: andnps %xmm2, %xmm1
				; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm2, %xmm0
				; SSE2-NEXT: orps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_nsz:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm2
				; AVX1-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm2, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum_nsz:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k1
				; AVX512-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				%1 = tail call float @llvm.maximum.f32(float %x, float %y)
				ret float %1
				}

				;
				; fminimum
				;

				define float @test_fminimum(float %x, float %y) {
				; SSE2-LABEL: test_fminimum:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; SSE2-NEXT: movdqa %xmm0, %xmm3
				; SSE2-NEXT: movdqa %xmm1, %xmm2
				; SSE2-NEXT: je .LBB8_2
				; SSE2-NEXT: # %bb.1:
				; SSE2-NEXT: movdqa %xmm1, %xmm3
				; SSE2-NEXT: movdqa %xmm0, %xmm2
				; SSE2-NEXT: .LBB8_2:
				; SSE2-NEXT: minss %xmm3, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm3
				; SSE2-NEXT: andnps %xmm2, %xmm3
				; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm0, %xmm1
				; SSE2-NEXT: orps %xmm3, %xmm1
				; SSE2-NEXT: movaps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vmovd %xmm0, %eax
				; AVX1-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX1-NEXT: vmovdqa %xmm0, %xmm2
				; AVX1-NEXT: vmovdqa %xmm1, %xmm3
				; AVX1-NEXT: je .LBB8_2
				; AVX1-NEXT: # %bb.1:
				; AVX1-NEXT: vmovdqa %xmm1, %xmm2
				; AVX1-NEXT: vmovdqa %xmm0, %xmm3
				; AVX1-NEXT: .LBB8_2:
				; AVX1-NEXT: vminss %xmm2, %xmm3, %xmm2
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vmovd %xmm0, %eax
				; AVX512-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX512-NEXT: sete %al
				; AVX512-NEXT: kmovw %eax, %k1
				; AVX512-NEXT: vmovdqa %xmm0, %xmm2
				; AVX512-NEXT: vmovss %xmm1, %xmm2, %xmm2 {%k1}
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k2
				; AVX512-NEXT: vmovss %xmm0, %xmm1, %xmm1 {%k1}
				; AVX512-NEXT: vminss %xmm1, %xmm2, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k2}
				; AVX512-NEXT: retq
				%1 = tail call float @llvm.minimum.f32(float %x, float %y)
				ret float %1
				}

				define float @test_fminimum_nan0(float %x, float %y) {
				; SSE2-LABEL: test_fminimum_nan0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fminimum_nan0:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				%1 = tail call float @llvm.minimum.f32(float 0x7fff000000000000, float %y)
				ret float %1
				}

				define float @test_fminimum_nan1(float %x, float %y) {
				; SSE2-LABEL: test_fminimum_nan1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fminimum_nan1:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				%1 = tail call float @llvm.minimum.f32(float %x, float 0x7fff000000000000)
				ret float %1
				}

				define float @test_fminimum_nnan(float %x, float %y) "no-nans-fp-math"="true" {
				; SSE2-LABEL: test_fminimum_nnan:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; SSE2-NEXT: je .LBB11_1
				; SSE2-NEXT: # %bb.2:
				; SSE2-NEXT: minss %xmm1, %xmm0
				; SSE2-NEXT: retq
				; SSE2-NEXT: .LBB11_1:
				; SSE2-NEXT: movdqa %xmm0, %xmm2
				; SSE2-NEXT: movaps %xmm1, %xmm0
				; SSE2-NEXT: minss %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_nnan:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vmovd %xmm0, %eax
				; AVX1-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX1-NEXT: je .LBB11_1
				; AVX1-NEXT: # %bb.2:
				; AVX1-NEXT: vminss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: retq
				; AVX1-NEXT: .LBB11_1:
				; AVX1-NEXT: vmovdqa %xmm0, %xmm2
				; AVX1-NEXT: vminss %xmm2, %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum_nnan:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vmovd %xmm0, %eax
				; AVX512-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX512-NEXT: sete %al
				; AVX512-NEXT: kmovw %eax, %k1
				; AVX512-NEXT: vmovaps %xmm1, %xmm2
				; AVX512-NEXT: vmovss %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512-NEXT: vmovss %xmm1, %xmm0, %xmm0 {%k1}
				; AVX512-NEXT: vminss %xmm2, %xmm0, %xmm0
				; AVX512-NEXT: retq
				%1 = tail call float @llvm.minimum.f32(float %x, float %y)
				ret float %1
				}

				define double @test_fminimum_zero0(double %x, double %y) {
				; SSE2-LABEL: test_fminimum_zero0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: cmpunordsd %xmm1, %xmm0
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm0, %xmm2
				; SSE2-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; SSE2-NEXT: andnpd %xmm1, %xmm0
				; SSE2-NEXT: orpd %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_zero0:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordsd %xmm1, %xmm1, %xmm0
				; AVX1-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
				; AVX1-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum_zero0:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordsd %xmm1, %xmm1, %k1
				; AVX512-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				%1 = tail call double @llvm.minimum.f64(double -0.0, double %y)
				ret double %1
				}

				define double @test_fminimum_zero1(double %x, double %y) {
				; SSE2-LABEL: test_fminimum_zero1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm0, %xmm1
				; SSE2-NEXT: cmpunordsd %xmm0, %xmm1
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm1, %xmm2
				; SSE2-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
				; SSE2-NEXT: andnpd %xmm0, %xmm1
				; SSE2-NEXT: orpd %xmm2, %xmm1
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_zero1:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm1
				; AVX1-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum_zero1:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordsd %xmm0, %xmm0, %k1
				; AVX512-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				%1 = tail call double @llvm.minimum.f64(double %x, double -0.0)
				ret double %1
				}

				define double @test_fminimum_zero2(double %x, double %y) {
				; SSE2-LABEL: test_fminimum_zero2:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fminimum_zero2:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; AVX-NEXT: retq
				%1 = tail call double @llvm.minimum.f64(double -0.0, double 0.0)
				ret double %1
				}

				define float @test_fminimum_nsz(float %x, float %y) {
				; SSE2-LABEL: test_fminimum_nsz:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: minss %xmm1, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: andnps %xmm2, %xmm1
				; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm2, %xmm0
				; SSE2-NEXT: orps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_nsz:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm2
				; AVX1-NEXT: vminss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm2, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum_nsz:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k1
				; AVX512-NEXT: vminss %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				%1 = tail call nsz float @llvm.minimum.f32(float %x, float %y)
				ret float %1
				}
				RKSimon: Please can you add vector test coverage to ensure we scalarize?
				RKSimon: Add the `nounwind` attribute to get rid of the .cfi noise.
				e-kud (Author): I've allowed myself to add `nounwind` in `half.ll` since I've touched it. I think the attribute was missed.

[X86] Support llvm.{min,max}imum.f{16,32,64} (Closed, Public)
