This is an archive of the discontinued LLVM Phabricator instance.

Differential D145634

[X86] Support llvm.{min,max}imum.f{16,32,64}
Closed · Public

Authored by e-kud on Mar 8 2023, 5:01 PM.

Details

Reviewers

RKSimon
pengfei
goldstein.w.n

Commits

rGa82d27a9a685: [X86] Support llvm.{min,max}imum.f{16,32,64}

Summary

Addresses https://github.com/llvm/llvm-project/issues/53353

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	1,090 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::use-after-scope-capture.cpp
	1,450 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::use-after-scope-capture.cpp

Event Timeline

e-kud created this revision. · Mar 8 2023, 5:01 PM

Herald added a project: Restricted Project. · Mar 8 2023, 5:01 PM

Herald added subscribers: pengfei, hiraditya.

e-kud published this revision for review. · Mar 8 2023, 5:09 PM

e-kud edited the summary of this revision.

Herald added a project: Restricted Project. · Mar 8 2023, 5:10 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B218254: Diff 503574. · Mar 8 2023, 6:19 PM

pengfei added inline comments. · Mar 8 2023, 6:36 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
53777	The code doesn't combine anything; should it be moved to `LowerFMinimumFMaximum`?
53818	Should we not do it for `hasNoSignedZeros`?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
182	The test does show anything interesting.

pengfei added inline comments. · Mar 8 2023, 6:43 PM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
182	does -> doesn't

Rebased.
Supported cases with nsz and nnan.
Updated tests.

e-kud added a reviewer: RKSimon. · Mar 9 2023, 7:11 PM

e-kud marked 3 inline comments as done.

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
53777	I tried to find a better place to handle `ISD::FMAXIMUM`, but there is no suitable place in the Selecting->Combining->Legalization chain. So I was inspired by `ISD::FMAXNUM`, which is actually lowered during combining. If this is only about naming, it is not a big deal to change the name, but all callees in `PerformDAGCombine` are named `combine*`, even for `ISD::FMAXNUM`. Do we really want to have a single `LowerFMinimumFMaximum`?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
182	Probably, yes. I wanted to show that we are able to fold all these checks with constant arguments. But we've already tested folding of zero and nan checks. Original FMAX and FMIN are tested as well. Dropped it.

Harbormaster completed remote builds in B218575: Diff 504008. · Mar 9 2023, 7:52 PM

Do you have plan to support minimumNumber?

llvm/lib/Target/X86/X86ISelLowering.cpp
53777	No, not the name. Did you try setting the action of `FMAXIMUM` to `Custom`? But I'm fine with it, given it's similar to `combineFMinNumFMaxNum`.
53788	`Subtarget.hasFP16() && VT == MVT::f16`
53809	Better to give a table like the one above:
	//                 Op1                     Op1
	//             Num   xNaN              +0     -0
	//          ---------------         ---------------
	//     Num  |  Max | qNaN |     +0  |  +0  |  +0  |
	// Op0      ---------------  Op0    ---------------
	//    xNaN  | qNaN | qNaN |     -0  |  +0  |  -0  |
	//          ---------------         ---------------
53835–53838	Can we check if it is 0 or NaN? We can put both in the second operand I think.
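
For reference, the semantics requested by the table (any NaN operand yields a quiet NaN, and +0 is treated as greater than -0) can be written as a small scalar model. This is only a sketch for reading the thread: `fmaximum_ref` is a hypothetical helper name, not an LLVM function, and it says nothing about how the patch lowers the operation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Reference model of "maximum" for float, following the review table:
//   - if either operand is NaN, the result is a quiet NaN;
//   - if both operands are zeros, +0 wins over -0;
//   - otherwise the larger operand is returned.
// fmaximum_ref is a hypothetical name used only for this illustration.
inline float fmaximum_ref(float x, float y) {
  if (std::isnan(x) || std::isnan(y))
    return std::numeric_limits<float>::quiet_NaN();
  if (x == 0.0f && y == 0.0f)        // true for any mix of +0 and -0
    return std::signbit(x) ? y : x;  // prefer the operand without a sign bit
  return x > y ? x : y;
}
```

The minimum variant would mirror this with -0 as the preferred zero and `x < y` as the final comparison.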

pengfei added inline comments. · Mar 10 2023, 12:13 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53835–53838	We can use `vfpclassps/d` on `AVX512DQ` to optimize it.

RKSimon added reviewers: pengfei, goldstein.w.n. · Mar 10 2023, 3:34 AM

pengfei added inline comments. · Mar 12 2023, 3:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
53777	To answer my previous question: `combineFMinNumFMaxNum` was intentionally handled this way for the reason described in D15294. I suppose that problem doesn't exist for `combineFMinimumFMaximum`, so I prefer `LowerFMinimumFMaximum`.

In D145634#4183847, @pengfei wrote:

Do you have plan to support minimumNumber?

Sorry, I didn't get it. Could you be more specific?

llvm/lib/Target/X86/X86ISelLowering.cpp
53777	Yes, thank you. I was unaware of this mechanism. It works, but I needed to add extra logic because with `Custom` lowering there is no more `setcc` combining.
53835–53838	Comparing with 0 and NaN at the same time was my first attempt. We have two problems. The first is that a comparison with zero sets ZF regardless of whether a positive or negative zero is provided. The second is that we still need to know whether the second operand is NaN. We may have `(0.0, NaN)` as arguments: we would check that the first operand is not NaN and is zero, and then replace it with the second operand even though the second is NaN. It seems to me that we always need to check both operands for NaN, plus one check for zero. We can use `vfpclassps/d` on `AVX512DQ` if one of the operands is known never to be NaN. Working on it.
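
The two problems above can be seen in a one-line model of the SSE scalar max instruction. The model assumes the commonly documented MAXSS rule that the second operand is returned whenever the first is not strictly greater (equal values, two zeros of either sign, or any NaN); `sse_max_model` is a hypothetical name, and this is an illustration rather than code from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumed model of the SSE scalar max operand-order rule: the first operand
// is returned only when it compares strictly greater; in every other case
// (equal values, +0 vs -0, or an unordered comparison with NaN) the second
// operand is returned. sse_max_model is a hypothetical name.
inline float sse_max_model(float a, float b) { return a > b ? a : b; }

// Helper for the examples below; also a hypothetical name.
inline float quiet_nan() { return std::numeric_limits<float>::quiet_NaN(); }
```

Under this model `sse_max_model(0.0f, -0.0f)` yields -0 while the swapped call yields +0, and a NaN is kept only when it is the second operand, which is why the lowering has to reason about operand order and NaN-ness separately.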

In D145634#4187888, @e-kud wrote:

Sorry, I didn't get it. Could you be more specific?

It's the companion function of minimum in IEEE 754-2019, and it will be introduced in the new C/C++ standard too. I thought you were working on that.

Rebased.
Moved from combine to lowering.
Supported f16 version.
Added optimization for avx512dq.
Added and updated tests.

In D145634#4187914, @pengfei wrote:

It's the companion function of minimum in IEEE 754-2019, and it will be introduced in the new C/C++ standard too. I thought you were working on that.

I can't find any lib calls or specific intrinsics for them. I'd like to add them separately if we have any users or needs of them, do we?

e-kud retitled this revision from [X86] Support llvm.{min,max}imum.f{32,64} to [X86] Support llvm.{min,max}imum.f{16,32,64}. · Mar 14 2023, 4:26 PM

Harbormaster completed remote builds in B219510: Diff 505312. · Mar 14 2023, 5:30 PM

Broke formatting for premerge checks

Harbormaster completed remote builds in B219532: Diff 505346. · Mar 14 2023, 7:25 PM

pengfei added inline comments. · Mar 14 2023, 8:20 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
1004–1005	Make the format align with its context.
2133–2134	ditto.
30243	Putting Max/Min together is a bit confusing. My first impression is that it can return either +0 or -0 for a single comparison.
30272	Need `hasFP16()` for `f16`.
34068	ditto format. Why do we need `getNode`?

In D145634#4194990, @e-kud wrote:

I can't find any lib calls or specific intrinsics for them. I'd like to add them separately if we have any users or needs of them, do we?

I heard glibc is supporting them. I'm fine with leaving it to the future. Do you have plan on the vector support?

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
5	I guess you missed an AVX512F, i.e. `AVX,AVX512,AVX512F`

Addressed formatting comments.
Check f16 explicitly even if avx512f16 implies avx512dq for now.

pengfei added inline comments. · Mar 15 2023, 8:08 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
30272	`(VT == MVT::f16 && Subtarget.hasFP16()) || Subtarget.hasDQI()`

pengfei added inline comments. · Mar 15 2023, 8:10 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
30272	Sorry, mistake. It should be: `(VT != MVT::f16 && Subtarget.hasDQI()) || Subtarget.hasFP16()`

In D145634#4195338, @pengfei wrote:

I heard glibc is supporting them. I'm fine with leaving it to the future. Do you have plan on the vector support?

Indeed, found this

<math.h> functions for floating-point maximum and minimum, corresponding to new operations in IEEE 754-2019, and corresponding <tgmath.h> macros, are added from draft ISO C2X: fmaximum, fmaximum_num, fmaximum_mag, fmaximum_mag_num, fminimum, fminimum_num, fminimum_mag, fminimum_mag_num and corresponding functions for float, long double, _FloatN and _FloatNx.

About vector support: I want to try an alternative implementation that transforms floats into integers using shifts and compares them as integers while preserving float semantics, including -0 < +0. I tried this approach for scalars, but the current approach produces less code. Vectors can probably benefit more.
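
One well-known way to compare floats as integers while keeping -0 below +0 is sketched here. It is an illustration of the general idea only: it uses a conditional XOR on the bit pattern rather than the shifts mentioned above, it is not taken from the patch, it does not handle NaN, and `ordered_key` and `maximum_via_key` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to a signed integer whose ordering matches the
// float ordering and additionally places -0.0 below +0.0.
// NaN inputs are not handled in this sketch.
inline std::int32_t ordered_key(float f) {
  std::int32_t bits;
  std::memcpy(&bits, &f, sizeof bits);  // bit-cast without undefined behavior
  // For negative floats a larger bit pattern means a larger magnitude, so
  // flip the non-sign bits to reverse their order.
  return bits < 0 ? (bits ^ 0x7fffffff) : bits;
}

// Maximum of two non-NaN floats using only integer comparison of the keys.
inline float maximum_via_key(float x, float y) {
  return ordered_key(x) < ordered_key(y) ? y : x;
}
```

With these keys, -0.0 maps to -1 and +0.0 maps to 0, so the signed-zero case falls out of the integer comparison; NaN handling would still need a separate check.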

llvm/lib/Target/X86/X86ISelLowering.cpp
30272	Yes, thank you, I missed it because `fp16` implies `dq`. We can actually check only `VT == MVT::f16`, because the predicate `Subtarget.hasFP16() && VT == MVT::f16` has already been checked above. Do we want to avoid such an implicit implication? The alternative is `VT != MVT::f16 && Subtarget.hasDQI() || VT == MVT::f16 && Subtarget.hasFP16()`.
34068	The format doesn't work: we exceed 80 chars if it's aligned with `return`. We don't need `getNode`; I missed it when moving from combine to lowering.

LGTM.

llvm/lib/Target/X86/X86ISelLowering.cpp
30272	Oh, we have checked `hasFP16` in line 29876. So the way used here is correct. Sorry for the noise.
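
The conclusion that the shorter check is sufficient can be confirmed by brute force. The sketch below enumerates all feature/type combinations allowed by the two invariants stated in this thread (FP16 implies DQI, and an `f16` value only reaches this code when FP16 is available) and compares the three spellings of the predicate. The booleans are local stand-ins, not LLVM APIs, and `predicates_agree` is a hypothetical name.

```cpp
#include <cassert>

// Check that the three predicate spellings discussed in the review agree on
// every combination permitted by the invariants from the thread.
inline bool predicates_agree() {
  for (int mask = 0; mask < 8; ++mask) {
    bool isF16 = mask & 1;    // stands in for VT == MVT::f16
    bool hasDQI = mask & 2;   // stands in for Subtarget.hasDQI()
    bool hasFP16 = mask & 4;  // stands in for Subtarget.hasFP16()
    if (hasFP16 && !hasDQI)
      continue;  // invariant: FP16 implies DQI
    if (isF16 && !hasFP16)
      continue;  // invariant: f16 only gets here when FP16 is available
    bool asWritten = isF16 || hasDQI;
    bool suggested = (!isF16 && hasDQI) || hasFP16;
    bool explicitForm = (!isF16 && hasDQI) || (isF16 && hasFP16);
    if (asWritten != suggested || asWritten != explicitForm)
      return false;
  }
  return true;
}
```

If either invariant were dropped, the spellings could diverge, which is the "implicit implication" the author asked about.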

This revision is now accepted and ready to land. · Mar 15 2023, 8:13 PM

Harbormaster completed remote builds in B219777: Diff 505684. · Mar 15 2023, 8:50 PM

@RKSimon @goldstein.w.n ping.

Support i686 target: can't use integer representation of double -0.0

Harbormaster completed remote builds in B223475: Diff 510659. · Apr 3 2023, 8:39 PM

Rebased

Harbormaster completed remote builds in B224312: Diff 511826. · Apr 7 2023, 5:16 PM

@RKSimon @goldstein.w.n ping

I think you can merge it. We don't need all reviewers to sign off.

RKSimon added inline comments. · Apr 13 2023, 8:01 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
30210	You should be able to drop this early-out - we should never get here
30217	Again, you can drop this as the setOperationAction calls should ensure we never get here - replace it with an assertion if you're worried.
30255	auto *
30257	auto *
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll
154	please can you add vector test coverage to ensure we scalarize?
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
963	please can you add vector test coverage to ensure we scalarize?

e-kud marked 6 inline comments as done. · Apr 14 2023, 7:49 PM

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
30217	Yes, you are right. Everything must be handled in `X86TargetLowering`. Dropped these checks.
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll
154	Yes, I've added them. Also it reminded me about several commented tests with the intrinsics. Uncommented them as well.

Rebased.
Uncommented existing tests for the intrinsics.
Addressed comments.

Harbormaster completed remote builds in B225797: Diff 513836. · Apr 14 2023, 8:22 PM

RKSimon added inline comments. · Apr 18 2023, 5:19 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
710	add nounwind attribute to get rid of the .cfi noise

e-kud marked an inline comment as done. · Apr 18 2023, 5:19 PM

e-kud added inline comments.

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
710	I've allowed myself to add `nounwind` in `half.ll` since I've touched it. I think the attribute was missed.

Rebased.
Added nounwind attribute to tests.

Harbormaster completed remote builds in B226491: Diff 514793. · Apr 18 2023, 6:34 PM

nikic mentioned this in D148691: [X86] Add lowering for fp minimum/maximum. · Apr 19 2023, 12:36 AM

skatkov added a subscriber: skatkov. · Apr 19 2023, 7:51 PM

@pengfei @goldstein.w.n Any more comments?

llvm/test/CodeGen/X86/half.ll
957	pre-commit the nounwind change to keep it separate from this patch - you shouldn't really need dso_local either

pengfei accepted this revision. · Apr 23 2023, 4:08 AM

This revision is now accepted and ready to land. · Apr 23 2023, 4:08 AM

I have a side question (not delaying landing this one)

It looks like this change lowers vectorized form of intrinsic in a non-optimal way.
So I wonder whether there are some plans to improve it as follow-up?

Rebased.
Excluded refactor of half.ll.

Harbormaster completed remote builds in B227863: Diff 516594. · Apr 24 2023, 7:22 PM

In D145634#4291019, @skatkov wrote:

I have a side question (not delaying landing this one)

It looks like this change lowers vectorized form of intrinsic in a non-optimal way.
So I wonder whether there are some plans to improve it as follow-up?

Yes, for sure. I've tried to find a way to implement vectorized version for SSE, but nothing's come to my mind. We need at least SSE2 for PCMPEQ{W,D} because all fp comparison instructions treat -0.0 and 0.0 as equal. It seems that pentium3 will suffer from not optimal fmaximum/fminimum...

llvm/test/CodeGen/X86/half.ll
957	I haven't got commit access yet, here it is https://reviews.llvm.org/D149114

In D145634#4294264, @e-kud wrote:

Yes, for sure. I've tried to find a way to implement vectorized version for SSE, but nothing's come to my mind. We need at least SSE2 for PCMPEQ{W,D} because all fp comparison instructions treat -0.0 and 0.0 as equal. It seems that pentium3 will suffer from not optimal fmaximum/fminimum...

Why can't we do something like this https://godbolt.org/z/Yxfn3jTj1 ?
I mean at least for AVX?

In D145634#4294383, @skatkov wrote:

Why can't we do something like this https://godbolt.org/z/Yxfn3jTj1 ?
I mean at least for AVX?

BTW, in my microbenchmark measurements (I know a microbenchmark may make the branch predictor look good), the version https://godbolt.org/z/rEj9GPfnY is the best one for scalar, but it cannot be implemented in SelectionDAG because it has a CFG.

skatkov added inline comments. · Apr 24 2023, 9:06 PM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	Here is what I meant by the non-optimal vectorized version, at least on AVX.

LGTM

llvm/lib/Target/X86/X86ISelLowering.cpp
30317	(style) remove braces from single line if()
llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	For now we just want x86 to support the intrinsics, vector optimization is better handled as a followup.

skatkov added inline comments. · Apr 25 2023, 1:09 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	It is exactly what I said in the beginning of this discussion: side question not delaying landing this patch.

RKSimon added inline comments. · Apr 25 2023, 1:58 AM

llvm/test/CodeGen/X86/fminimum-fmaximum.ll
121	yup - cheers

Fix single line if style.

In D145634#4294388, @skatkov wrote:

BTW, in my microbenchmark measurements (I know a microbenchmark may make the branch predictor look good), the version https://godbolt.org/z/rEj9GPfnY is the best one for scalar, but it cannot be implemented in SelectionDAG because it has a CFG.

Both versions are incorrect. They don't work as expected for (-0.0, 0.0) and (0.0, -0.0) inputs, because comiss and max treat negative and positive zeros as equal.

Harbormaster completed remote builds in B228010: Diff 516778. · Apr 25 2023, 7:07 AM

In D145634#4295568, @e-kud wrote:

Both versions are incorrect. They don't work as expected for (-0.0, 0.0) and (0.0, -0.0) inputs, because comiss and max treat negative and positive zeros as equal.

Take a careful look. In case of equality we check the sign of the first value, so the comparison using ucomiss is OK.

In D145634#4295747, @skatkov wrote:

Take a careful look. In case of equality we check the sign of the first value, so the comparison using ucomiss is OK.

To make it valid, `%cmp = icmp slt i32 %bc, 0` needs to be changed to `%cmp = icmp sge i32 %bc, 0`.
Yeah, I got the idea that the generic case may be implemented more efficiently this way.
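
The "compare, and on equality look at the sign of the first value" idea can be sketched in scalar C++ as follows. This is not the code behind the godbolt links, which is not reproduced in this thread; it is an assumed reconstruction of the idea with the `sge 0` form of the sign test, and `fmaximum_branchy` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Branchy scalar maximum: ordinary comparisons decide the common cases, and
// only the equality case (which includes +0 vs -0) inspects the sign bit of
// the first operand. Unordered inputs fall through and produce a quiet NaN.
inline float fmaximum_branchy(float x, float y) {
  if (x > y)
    return x;
  if (x < y)
    return y;
  if (x == y) {
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Integer "sge 0" test: the sign bit of x is clear, so x is the
    // preferred (non-negative) value; otherwise take y.
    return bits >= 0 ? x : y;
  }
  return std::numeric_limits<float>::quiet_NaN();  // at least one NaN operand
}
```

Each `if` here is a separate control-flow edge, which is the property that makes this shape hard to express as a single SelectionDAG lowering.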

skatkov added a child revision: D149729: [X86] Avoid usage constant NaN for fminimum/fmaximum lowering. · May 3 2023, 12:56 AM

I wonder whether there are any problems with landing this patch?

RKSimon added inline comments. · May 4 2023, 5:43 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
1006	Do we have test coverage with SSE1 only?

This revision was landed with ongoing or failed builds. · May 4 2023, 6:05 AM

Closed by commit rGa82d27a9a685: [X86] Support llvm.{min,max}imum.f{16,32,64} (authored by e-kud, committed by pengfei).

This revision was automatically updated to reflect the committed changes.

pengfei added a commit: rGa82d27a9a685: [X86] Support llvm.{min,max}imum.f{16,32,64}.

e-kud marked an inline comment as done. · May 4 2023, 6:40 AM

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
1006	Apparently, no. There is a `fatal error: error in backend: Access past stack top!` with `double`s and `+sse,-sse2`. It seems I need to split the `float` and `double` tests into two separate files to test SSE1 only. Are there better alternatives?

RKSimon added inline comments. · May 4 2023, 7:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
1006	I'd be very tempted to limit float maximum/minimum to SSE2 or later tbh

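Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp	4 lines
llvm/lib/Target/X86/X86ISelLowering.cpp	132 lines
llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll	16 lines
llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll	229 lines
llvm/test/CodeGen/X86/extract-fp.ll	44 lines
llvm/test/CodeGen/X86/extractelement-fp.ll	216 lines
llvm/test/CodeGen/X86/fminimum-fmaximum.ll	1058 lines
llvm/test/CodeGen/X86/half.ll	25 lines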

Diff 513836

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

[7,987 lines hidden; context: `if (isOperationLegalOrCustom(NewOp, VT)) {`]
     }

     return DAG.getNode(NewOp, dl, VT, Quiet0, Quiet1, Node->getFlags());
   }

   // If the target has FMINIMUM/FMAXIMUM but not FMINNUM/FMAXNUM use that
   // instead if there are no NaNs and there can't be an incompatible zero
   // compare: at least one operand isn't +/-0, or there are no signed-zeros.
-  if (Node->getFlags().hasNoNaNs() &&
+  if ((Node->getFlags().hasNoNaNs() ||
+       (DAG.isKnownNeverNaN(Node->getOperand(0)) &&
+        DAG.isKnownNeverNaN(Node->getOperand(1)))) &&
       (Node->getFlags().hasNoSignedZeros() ||
        DAG.isKnownNeverZeroFloat(Node->getOperand(0)) ||
        DAG.isKnownNeverZeroFloat(Node->getOperand(1)))) {
     unsigned IEEE2018Op =
         Node->getOpcode() == ISD::FMINNUM ? ISD::FMINIMUM : ISD::FMAXIMUM;
     if (isOperationLegalOrCustom(IEEE2018Op, VT))
       return DAG.getNode(IEEE2018Op, dl, VT, Node->getOperand(0),
                          Node->getOperand(1), Node->getFlags());
[2,576 lines hidden]

llvm/lib/Target/X86/X86ISelLowering.cpp

[995 lines hidden; context: `X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,`]
   if (!Subtarget.useSoftFloat() && Subtarget.hasMMX()) {
     addRegisterClass(MVT::x86mmx, &X86::VR64RegClass);
     // No operations on x86mmx supported, everything uses intrinsics.
   }

   if (!Subtarget.useSoftFloat() && Subtarget.hasSSE1()) {
     addRegisterClass(MVT::v4f32, Subtarget.hasVLX() ? &X86::VR128XRegClass
                                                     : &X86::VR128RegClass);

+    setOperationAction(ISD::FMAXIMUM, MVT::f32, Custom);
pengfei (Done): Make the format align with its context.
+    setOperationAction(ISD::FMINIMUM, MVT::f32, Custom);
RKSimon (Not Done): Do we have test coverage with SSE1 only?
e-kud (Author, Done): Apparently, no. There is a `fatal error: error in backend: Access past stack top!` with `double`s and `+sse,-sse2` It seems I need to split `float` and `double` tests into two separate files to test SSE1 only. Are there better alternatives?
RKSimon (Not Done): I'd be very tempted to limit float maximum/minimum to SSE2 or later tbh
+
     setOperationAction(ISD::FNEG, MVT::v4f32, Custom);
     setOperationAction(ISD::FABS, MVT::v4f32, Custom);
     setOperationAction(ISD::FCOPYSIGN, MVT::v4f32, Custom);
     setOperationAction(ISD::BUILD_VECTOR, MVT::v4f32, Custom);
     setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v4f32, Custom);
     setOperationAction(ISD::VSELECT, MVT::v4f32, Custom);
     setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4f32, Custom);
     setOperationAction(ISD::SELECT, MVT::v4f32, Custom);
[20 lines hidden; context: `addRegisterClass(MVT::v8i16, Subtarget.hasVLX() ? &X86::VR128XRegClass`]
                                                     : &X86::VR128RegClass);
     addRegisterClass(MVT::v8f16, Subtarget.hasVLX() ? &X86::VR128XRegClass
                                                     : &X86::VR128RegClass);
     addRegisterClass(MVT::v4i32, Subtarget.hasVLX() ? &X86::VR128XRegClass
                                                     : &X86::VR128RegClass);
     addRegisterClass(MVT::v2i64, Subtarget.hasVLX() ? &X86::VR128XRegClass
                                                     : &X86::VR128RegClass);

+    setOperationAction(ISD::FMAXIMUM, MVT::f64, Custom);
+    setOperationAction(ISD::FMINIMUM, MVT::f64, Custom);
+
     for (auto VT : { MVT::v2i8, MVT::v4i8, MVT::v8i8,
                      MVT::v2i16, MVT::v4i16, MVT::v2i32 }) {
       setOperationAction(ISD::SDIV, VT, Custom);
       setOperationAction(ISD::SREM, VT, Custom);
       setOperationAction(ISD::UDIV, VT, Custom);
       setOperationAction(ISD::UREM, VT, Custom);
     }

[1,070 lines hidden; context: `if (!Subtarget.useSoftFloat() && Subtarget.hasFP16()) {`]
     setOperationAction(ISD::SETCC, MVT::f16, Custom);
     setOperationAction(ISD::STRICT_FSETCC, MVT::f16, Custom);
     setOperationAction(ISD::STRICT_FSETCCS, MVT::f16, Custom);
     setOperationAction(ISD::STRICT_FROUND, MVT::f16, Promote);
     setOperationAction(ISD::FROUNDEVEN, MVT::f16, Legal);
     setOperationAction(ISD::STRICT_FROUNDEVEN, MVT::f16, Legal);
     setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);
     setOperationAction(ISD::STRICT_FP_ROUND, MVT::f16, Custom);
+    setOperationAction(ISD::FMAXIMUM, MVT::f16, Custom);
+    setOperationAction(ISD::FMINIMUM, MVT::f16, Custom);
pengfei (Done): ditto.
     setOperationAction(ISD::FP_EXTEND, MVT::f32, Legal);
     setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f32, Legal);

     setCondCodeAction(ISD::SETOEQ, MVT::f16, Expand);
     setCondCodeAction(ISD::SETUNE, MVT::f16, Expand);

     if (Subtarget.useAVX512Regs()) {
       setGroup(MVT::v32f16);
▲ Show 20 Lines • Show All 28,054 Lines • ▼ Show 20 Lines	if (VT.isVector() && Op.getOpcode() == ISD::UMAX &&
return DAG.getNode(ISD::SUB, DL, VT, X,		return DAG.getNode(ISD::SUB, DL, VT, X,
DAG.getSetCC(DL, VT, X, Zero, ISD::SETEQ));		DAG.getSetCC(DL, VT, X, Zero, ISD::SETEQ));
}		}

// Default to expand.		// Default to expand.
return SDValue();		return SDValue();
}		}

		static SDValue LowerFMINIMUM_FMAXIMUM(SDValue Op, const X86Subtarget &Subtarget,
		SelectionDAG &DAG) {
		assert((Op.getOpcode() == ISD::FMAXIMUM \|\| Op.getOpcode() == ISD::FMINIMUM) &&
		"Expected FMAXIMUM or FMINIMUM opcode");
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		EVT VT = Op.getValueType();
		RKSimonUnsubmitted Done Reply Inline Actions You should be able to drop this early-out - we should never get here RKSimon: You should be able to drop this early-out - we should never get here
		SDValue X = Op.getOperand(0);
		SDValue Y = Op.getOperand(1);
		SDLoc DL(Op);
		uint64_t SizeInBits = VT.getFixedSizeInBits();
		APInt PreferredZero = APInt::getZero(SizeInBits);
		EVT IVT = MVT::getIntegerVT(SizeInBits);
		X86ISD::NodeType MinMaxOp;
		RKSimonUnsubmitted Done Reply Inline Actions Again, you can drop this as the setOperationAction calls should ensure we never get here - replace it with an assertion if you're worried. RKSimon: Again, you can drop this as the setOperationAction calls should ensure we never get here…
		e-kudAuthorUnsubmitted Done Reply Inline Actions Yes, you are right. Everything must be handled in `X86TargetLowering`. Dropped these checks. e-kud: Yes, you are right. Everything must be handled in `X86TargetLowering`. Dropped these checks.
		if (Op.getOpcode() == ISD::FMAXIMUM) {
		MinMaxOp = X86ISD::FMAX;
		} else {
		PreferredZero.setSignBit();
		MinMaxOp = X86ISD::FMIN;
		}
		EVT SetCCType =
		TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);

		// The tables below show the expected result of Max in cases of NaN and
		// signed zeros.
		//
		// Y Y
		// Num xNaN +0 -0
		// --------------- ---------------
		// Num \| Max \| qNaN \| +0 \| +0 \| +0 \|
		// X --------------- X ---------------
		// xNaN \| qNaN \| qNaN \| -0 \| +0 \| -0 \|
		// --------------- ---------------
		//
		// It is achieved by means of FMAX/FMIN with preliminary checks and operand
		// reordering.
		//
		// We check if any of operands is NaN and return NaN. Then we check if any of
		// operands is zero or negative zero (for fmaximum and fminimum respectively)
		// to ensure the correct zero is returned.
		pengfei: Putting Max and Min together is a bit confusing. My first impression was that it can return either +0 or -0 for a single comparison.
		auto IsPreferredZero = [PreferredZero](SDValue Op) {
		Op = peekThroughBitcasts(Op);
		if (auto *CstOp = dyn_cast<ConstantFPSDNode>(Op))
		return CstOp->getValueAPF().bitcastToAPInt() == PreferredZero;
		if (auto *CstOp = dyn_cast<ConstantSDNode>(Op))
		return CstOp->getAPIntValue() == PreferredZero;
		return false;
		};

		SDValue MinMax;
		bool IsXNeverNaN = DAG.isKnownNeverNaN(X);
		bool IsYNeverNaN = DAG.isKnownNeverNaN(Y);
		RKSimon: auto *
		if (DAG.getTarget().Options.NoSignedZerosFPMath ||
		Op->getFlags().hasNoSignedZeros() || IsPreferredZero(Y) ||
		RKSimon: auto *
		DAG.isKnownNeverZeroFloat(X)) {
		MinMax = DAG.getNode(MinMaxOp, DL, VT, X, Y, Op->getFlags());
		} else if (IsPreferredZero(X) || DAG.isKnownNeverZeroFloat(Y)) {
		MinMax = DAG.getNode(MinMaxOp, DL, VT, Y, X, Op->getFlags());
		} else if ((VT == MVT::f16 || Subtarget.hasDQI()) &&
		(Op->getFlags().hasNoNaNs() || IsXNeverNaN || IsYNeverNaN)) {
		if (IsXNeverNaN)
		std::swap(X, Y);
		// VFPCLASSS consumes a vector type. So provide a minimal one corresponded
		// xmm register.
		MVT VectorType = MVT::getVectorVT(VT.getSimpleVT(), 128 / SizeInBits);
		SDValue VX = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VectorType, X);
		// Bits of classes:
		// Bits Imm8[0] Imm8[1] Imm8[2] Imm8[3] Imm8[4] Imm8[5] Imm8[6] Imm8[7]
		// Class QNAN PosZero NegZero PosINF NegINF Denormal Negative SNAN
		pengfei: Need `hasFP16()` for `f16`.
		e-kud (author): Yes, thank you, I missed it because `fp16` implies `dq`. We can actually check only `VT == MVT::f16`, because the predicate `Subtarget.hasFP16() && VT == MVT::f16` has already been checked above. Do we want to avoid such an implicit implication? The alternative is `VT != MVT::f16 && Subtarget.hasDQI() || VT == MVT::f16 && Subtarget.hasFP16()`.
		pengfei: `(VT == MVT::f16 && Subtarget.hasFP16()) || Subtarget.hasDQI()`
		pengfei: Sorry, mistake. It should be: `(VT != MVT::f16 && Subtarget.hasDQI()) || Subtarget.hasFP16()`
		pengfei: Oh, we have checked `hasFP16` in line 29876. So the way used here is correct. Sorry for the noise.
		SDValue Imm = DAG.getTargetConstant(MinMaxOp == X86ISD::FMAX ? 0b11 : 0b101,
		DL, MVT::i32);
		SDValue IsNanZero = DAG.getNode(X86ISD::VFPCLASSS, DL, MVT::v1i1, VX, Imm);
		SDValue Ins = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, MVT::v8i1,
		DAG.getConstant(0, DL, MVT::v8i1), IsNanZero,
		DAG.getIntPtrConstant(0, DL));
		SDValue NeedSwap = DAG.getBitcast(MVT::i8, Ins);
		SDValue NewX = DAG.getSelect(DL, VT, NeedSwap, Y, X);
		SDValue NewY = DAG.getSelect(DL, VT, NeedSwap, X, Y);
		return DAG.getNode(MinMaxOp, DL, VT, NewX, NewY, Op->getFlags());
		} else {
		SDValue IsXZero;
		if (Subtarget.is64Bit() || VT != MVT::f64) {
		SDValue XInt = DAG.getNode(ISD::BITCAST, DL, IVT, X);
		SDValue ZeroCst = DAG.getConstant(PreferredZero, DL, IVT);
		IsXZero = DAG.getSetCC(DL, SetCCType, XInt, ZeroCst, ISD::SETEQ);
		} else {
		assert(VT == MVT::f64);
		SDValue Ins = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, MVT::v2f64,
		DAG.getConstantFP(0, DL, MVT::v2f64), X,
		DAG.getIntPtrConstant(0, DL));
		SDValue VX = DAG.getNode(ISD::BITCAST, DL, MVT::v4f32, Ins);
		SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, VX,
		DAG.getIntPtrConstant(0, DL));
		Lo = DAG.getBitcast(MVT::i32, Lo);
		SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, VX,
		DAG.getIntPtrConstant(1, DL));
		Hi = DAG.getBitcast(MVT::i32, Hi);
		PreferredZero = APInt::getZero(SizeInBits / 2);
		if (MinMaxOp == X86ISD::FMIN)
		PreferredZero.setSignBit();
		IsXZero = DAG.getNode(ISD::XOR, DL, MVT::i32, Hi,
		DAG.getConstant(PreferredZero, DL, MVT::i32));
		IsXZero = DAG.getNode(ISD::OR, DL, MVT::i32, Lo, IsXZero);
		IsXZero = DAG.getSetCC(DL, SetCCType, IsXZero,
		DAG.getConstant(0, DL, MVT::i32), ISD::SETEQ);
		}
		SDValue NewX = DAG.getSelect(DL, VT, IsXZero, Y, X);
		SDValue NewY = DAG.getSelect(DL, VT, IsXZero, X, Y);
		MinMax = DAG.getNode(MinMaxOp, DL, VT, NewX, NewY, Op->getFlags());
		}

		if (Op->getFlags().hasNoNaNs() || (IsXNeverNaN && IsYNeverNaN)) {
		return MinMax;
		}
		RKSimon: (style) remove braces from single line if()

		APFloat NaNValue = APFloat::getNaN(DAG.EVTToAPFloatSemantics(VT));
		SDValue IsNaN = DAG.getSetCC(DL, SetCCType, IsXNeverNaN ? Y : X,
		IsYNeverNaN ? X : Y, ISD::SETUO);
		return DAG.getSelect(DL, VT, IsNaN, DAG.getConstantFP(NaNValue, DL, VT),
		MinMax);
		}
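
The operand reordering described in the lowering comment can be checked against a small scalar model. The sketch below is not part of the patch: `x86Max`, `x86Min` and `loweredMinMax` are hypothetical helper names, and the model assumes the hardware FMAX/FMIN behaviour of returning the second operand whenever the comparison is false (either operand is NaN, or both are zeros). It mirrors only the generic path: swap the operands when X is the preferred zero, then force a NaN result if the operands are unordered.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumed model of the x86 scalar max/min: the second operand is returned
// whenever the comparison is false (NaN involved, or both operands zero).
static float x86Max(float a, float b) { return a > b ? a : b; }
static float x86Min(float a, float b) { return a < b ? a : b; }

// Raw IEEE-754 bits of a float, used for the exact signed-zero test.
static uint32_t bitsOf(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Hypothetical scalar model of the generic lowering path.
float loweredMinMax(float x, float y, bool isMax) {
  // Preferred zero: +0.0 for maximum, -0.0 for minimum.
  const uint32_t preferredZero = isMax ? 0x00000000u : 0x80000000u;
  const bool isXZero = bitsOf(x) == preferredZero;

  // If X is the preferred zero, move it to the second operand so that a
  // zero-vs-zero comparison returns it.
  const float newX = isXZero ? y : x;
  const float newY = isXZero ? x : y;
  const float minMax = isMax ? x86Max(newX, newY) : x86Min(newX, newY);

  // Unordered operands (SETUO in the DAG) always produce a NaN.
  if (std::isnan(x) || std::isnan(y))
    return std::numeric_limits<float>::quiet_NaN();
  return minMax;
}
```

With this model, `loweredMinMax(-0.0f, 0.0f, true)` yields +0.0 and `loweredMinMax(0.0f, -0.0f, false)` yields -0.0, matching the right-hand table in the comment.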

static SDValue LowerABD(SDValue Op, const X86Subtarget &Subtarget,		static SDValue LowerABD(SDValue Op, const X86Subtarget &Subtarget,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();

// For AVX1 cases, split to use legal ops.		// For AVX1 cases, split to use legal ops.
if (VT.is256BitVector() && !Subtarget.hasInt256())		if (VT.is256BitVector() && !Subtarget.hasInt256())
return splitVectorIntBinary(Op, DAG);		return splitVectorIntBinary(Op, DAG);

[...]	SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::UADDSAT:		case ISD::UADDSAT:
case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::USUBSAT:		case ISD::USUBSAT:
case ISD::SSUBSAT: return LowerADDSAT_SUBSAT(Op, DAG, Subtarget);		case ISD::SSUBSAT: return LowerADDSAT_SUBSAT(Op, DAG, Subtarget);
case ISD::SMAX:		case ISD::SMAX:
case ISD::SMIN:		case ISD::SMIN:
case ISD::UMAX:		case ISD::UMAX:
case ISD::UMIN: return LowerMINMAX(Op, Subtarget, DAG);		case ISD::UMIN: return LowerMINMAX(Op, Subtarget, DAG);
		case ISD::FMINIMUM:
		case ISD::FMAXIMUM:
		return LowerFMINIMUM_FMAXIMUM(Op, Subtarget, DAG);
		pengfei: ditto format. Why do we need `getNode`?
		e-kud (author): Format doesn't work, we have 80+ chars if it's aligned with `return`. We don't need `getNode`; I missed it when moving the code from combine to lowering.
case ISD::ABS: return LowerABS(Op, Subtarget, DAG);		case ISD::ABS: return LowerABS(Op, Subtarget, DAG);
case ISD::ABDS:		case ISD::ABDS:
case ISD::ABDU: return LowerABD(Op, Subtarget, DAG);		case ISD::ABDU: return LowerABD(Op, Subtarget, DAG);
case ISD::AVGCEILU: return LowerAVG(Op, Subtarget, DAG);		case ISD::AVGCEILU: return LowerAVG(Op, Subtarget, DAG);
case ISD::FSINCOS: return LowerFSINCOS(Op, Subtarget, DAG);		case ISD::FSINCOS: return LowerFSINCOS(Op, Subtarget, DAG);
case ISD::MLOAD: return LowerMLOAD(Op, Subtarget, DAG);		case ISD::MLOAD: return LowerMLOAD(Op, Subtarget, DAG);
case ISD::MSTORE: return LowerMSTORE(Op, Subtarget, DAG);		case ISD::MSTORE: return LowerMSTORE(Op, Subtarget, DAG);
case ISD::MGATHER: return LowerMGATHER(Op, Subtarget, DAG);		case ISD::MGATHER: return LowerMGATHER(Op, Subtarget, DAG);
[...]	static SDValue combineFMinNumFMaxNum(SDNode *N, SelectionDAG &DAG,
SDValue MinOrMax = DAG.getNode(MinMaxOp, DL, VT, Op1, Op0);		SDValue MinOrMax = DAG.getNode(MinMaxOp, DL, VT, Op1, Op0);
SDValue IsOp0Nan = DAG.getSetCC(DL, SetCCType, Op0, Op0, ISD::SETUO);		SDValue IsOp0Nan = DAG.getSetCC(DL, SetCCType, Op0, Op0, ISD::SETUO);

// If Op0 is a NaN, select Op1. Otherwise, select the max. If both operands		// If Op0 is a NaN, select Op1. Otherwise, select the max. If both operands
// are NaN, the NaN value of Op1 is the result.		// are NaN, the NaN value of Op1 is the result.
return DAG.getSelect(DL, VT, IsOp0Nan, Op1, MinOrMax);		return DAG.getSelect(DL, VT, IsOp0Nan, Op1, MinOrMax);
}		}

static SDValue combineX86INT_TO_FP(SDNode *N, SelectionDAG &DAG,		static SDValue combineX86INT_TO_FP(SDNode *N, SelectionDAG &DAG,
		pengfei: The code doesn't combine anything, should be moved to `LowerFMinimumFMaximum`?
		e-kud (author): I've tried to find a better place to handle `ISD::FMAXIMUM`, but there is no place in the chain Selecting->Combining->Legalization. So I was inspired by `ISD::FMAXNUM`, which is actually lowered during combining. If this is solely about naming, it is not a big deal to change the name, but all callees in `PerformDAGCombine` are named `combine`, even for `ISD::FMAXNUM`. Do we really want to have a single `LowerFMinimumFMaximum`?
		pengfei: No, not the name. Did you try setting the action of `FMAXIMUM` to `Custom`? But I'm fine given it's similar to `combineFMinNumFMaxNum`.
		pengfei: To answer my previous question: `combineFMinNumFMaxNum` was intentionally placed here for the reason described in D15294. I suppose the problem doesn't exist for `combineFMinimumFMaximum`. So I prefer `LowerFMinimumFMaximum`.
		e-kud (author): Yes, thank you. I was unaware of this mechanism. It works, but I needed to include extra logic because with `Custom` lowering there is no more `setcc` combining.
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();

APInt DemandedElts = APInt::getAllOnes(VT.getVectorNumElements());		APInt DemandedElts = APInt::getAllOnes(VT.getVectorNumElements());
if (TLI.SimplifyDemandedVectorElts(SDValue(N, 0), DemandedElts, DCI))		if (TLI.SimplifyDemandedVectorElts(SDValue(N, 0), DemandedElts, DCI))
return SDValue(N, 0);		return SDValue(N, 0);

// Convert a full vector load into vzload when not all bits are needed.		// Convert a full vector load into vzload when not all bits are needed.
SDValue In = N->getOperand(0);		SDValue In = N->getOperand(0);
MVT InVT = In.getSimpleValueType();		MVT InVT = In.getSimpleValueType();
		pengfei: `Subtarget.hasFP16() && VT == MVT::f16`
if (VT.getVectorNumElements() < InVT.getVectorNumElements() &&		if (VT.getVectorNumElements() < InVT.getVectorNumElements() &&
ISD::isNormalLoad(In.getNode()) && In.hasOneUse()) {		ISD::isNormalLoad(In.getNode()) && In.hasOneUse()) {
assert(InVT.is128BitVector() && "Expected 128-bit input vector");		assert(InVT.is128BitVector() && "Expected 128-bit input vector");
LoadSDNode *LN = cast<LoadSDNode>(N->getOperand(0));		LoadSDNode *LN = cast<LoadSDNode>(N->getOperand(0));
unsigned NumBits = InVT.getScalarSizeInBits() * VT.getVectorNumElements();		unsigned NumBits = InVT.getScalarSizeInBits() * VT.getVectorNumElements();
MVT MemVT = MVT::getIntegerVT(NumBits);		MVT MemVT = MVT::getIntegerVT(NumBits);
MVT LoadVT = MVT::getVectorVT(MemVT, 128 / NumBits);		MVT LoadVT = MVT::getVectorVT(MemVT, 128 / NumBits);
if (SDValue VZLoad = narrowLoadToVZLoad(LN, MemVT, LoadVT, DAG)) {		if (SDValue VZLoad = narrowLoadToVZLoad(LN, MemVT, LoadVT, DAG)) {
SDLoc dl(N);		SDLoc dl(N);
SDValue Convert = DAG.getNode(N->getOpcode(), dl, VT,		SDValue Convert = DAG.getNode(N->getOpcode(), dl, VT,
DAG.getBitcast(InVT, VZLoad));		DAG.getBitcast(InVT, VZLoad));
DCI.CombineTo(N, Convert);		DCI.CombineTo(N, Convert);
DAG.ReplaceAllUsesOfValueWith(SDValue(LN, 1), VZLoad.getValue(1));		DAG.ReplaceAllUsesOfValueWith(SDValue(LN, 1), VZLoad.getValue(1));
DCI.recursivelyDeleteUnusedNodes(LN);		DCI.recursivelyDeleteUnusedNodes(LN);
return SDValue(N, 0);		return SDValue(N, 0);
}		}
}		}

return SDValue();		return SDValue();
}		}

		pengfei: Better to give a table like above:
		//                Op1                      Op1
		//            Num    xNaN              +0     -0
		//         -----------------        --------------
		//    Num  |  Max  |  qNaN |    +0  |  +0 |  +0  |
		// Op0     -----------------  Op0   --------------
		//   xNaN  | qNaN  |  qNaN |    -0  |  +0 |  -0  |
		//         -----------------        --------------
static SDValue combineCVTP2I_CVTTP2I(SDNode *N, SelectionDAG &DAG,		static SDValue combineCVTP2I_CVTTP2I(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {
bool IsStrict = N->isTargetStrictFPOpcode();		bool IsStrict = N->isTargetStrictFPOpcode();
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

// Convert a full vector load into vzload when not all bits are needed.		// Convert a full vector load into vzload when not all bits are needed.
SDValue In = N->getOperand(IsStrict ? 1 : 0);		SDValue In = N->getOperand(IsStrict ? 1 : 0);
MVT InVT = In.getSimpleValueType();		MVT InVT = In.getSimpleValueType();
if (VT.getVectorNumElements() < InVT.getVectorNumElements() &&		if (VT.getVectorNumElements() < InVT.getVectorNumElements() &&
		pengfei: Should we not do it for `hasNoSignedZeros`?
ISD::isNormalLoad(In.getNode()) && In.hasOneUse()) {		ISD::isNormalLoad(In.getNode()) && In.hasOneUse()) {
assert(InVT.is128BitVector() && "Expected 128-bit input vector");		assert(InVT.is128BitVector() && "Expected 128-bit input vector");
LoadSDNode *LN = cast<LoadSDNode>(In);		LoadSDNode *LN = cast<LoadSDNode>(In);
unsigned NumBits = InVT.getScalarSizeInBits() * VT.getVectorNumElements();		unsigned NumBits = InVT.getScalarSizeInBits() * VT.getVectorNumElements();
MVT MemVT = MVT::getFloatingPointVT(NumBits);		MVT MemVT = MVT::getFloatingPointVT(NumBits);
MVT LoadVT = MVT::getVectorVT(MemVT, 128 / NumBits);		MVT LoadVT = MVT::getVectorVT(MemVT, 128 / NumBits);
if (SDValue VZLoad = narrowLoadToVZLoad(LN, MemVT, LoadVT, DAG)) {		if (SDValue VZLoad = narrowLoadToVZLoad(LN, MemVT, LoadVT, DAG)) {
SDLoc dl(N);		SDLoc dl(N);
if (IsStrict) {		if (IsStrict) {
SDValue Convert =		SDValue Convert =
DAG.getNode(N->getOpcode(), dl, {VT, MVT::Other},		DAG.getNode(N->getOpcode(), dl, {VT, MVT::Other},
{N->getOperand(0), DAG.getBitcast(InVT, VZLoad)});		{N->getOperand(0), DAG.getBitcast(InVT, VZLoad)});
DCI.CombineTo(N, Convert, Convert.getValue(1));		DCI.CombineTo(N, Convert, Convert.getValue(1));
} else {		} else {
SDValue Convert =		SDValue Convert =
DAG.getNode(N->getOpcode(), dl, VT, DAG.getBitcast(InVT, VZLoad));		DAG.getNode(N->getOpcode(), dl, VT, DAG.getBitcast(InVT, VZLoad));
DCI.CombineTo(N, Convert);		DCI.CombineTo(N, Convert);
}		}
DAG.ReplaceAllUsesOfValueWith(SDValue(LN, 1), VZLoad.getValue(1));		DAG.ReplaceAllUsesOfValueWith(SDValue(LN, 1), VZLoad.getValue(1));
DCI.recursivelyDeleteUnusedNodes(LN);		DCI.recursivelyDeleteUnusedNodes(LN);
		pengfei: Can we check if it is 0 or NaN? We can put both in the second operand I think.
		pengfei: We can use `vfpclassps/d` on `AVX512DQ` to optimize it.
		e-kud (author): My first attempt was to compare with 0 and NaN at the same time. We have two problems. The first is that a comparison with zero sets ZF regardless of whether positive or negative zero is provided. The second is that we still need to know whether the second operand is NaN or not. We may have `(0.0, NaN)` as arguments. We checked that the first op is not NaN and is zero. Replaced it with the second when the second is NaN. It seems to me that we always need to check both operands for NaN and one for zero. We can use `vfpclassps/d` on `AVX512DQ` if one of the operands is known never to be NaN. Working on it.
return SDValue(N, 0);		return SDValue(N, 0);
}		}
}		}

return SDValue();		return SDValue();
}		}
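
The `vfpclass` idea from the discussion above, as it ended up in the lowering (immediate `0b11` for FMAX, `0b101` for FMIN), can be illustrated with a software model. This is a sketch only: `matchesClass` and `minMaxViaClass` are hypothetical names, the model covers just the three class bits the patch selects (QNaN, PosZero, NegZero), and it assumes the caller has already arranged that `y` is never NaN, as the patch does by swapping on `IsXNeverNaN`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Class-selection bits, following the Imm8 layout quoted in the patch comment.
constexpr uint8_t kQNaN = 0x01;    // Imm8[0]
constexpr uint8_t kPosZero = 0x02; // Imm8[1]
constexpr uint8_t kNegZero = 0x04; // Imm8[2]

// Software model of the class test, restricted to the three bits above.
static bool matchesClass(float v, uint8_t imm) {
  uint32_t u;
  std::memcpy(&u, &v, sizeof u);
  const bool isQNaN = (u & 0x7FC00000u) == 0x7FC00000u; // exponent all ones, quiet bit set
  const bool isPosZero = u == 0x00000000u;
  const bool isNegZero = u == 0x80000000u;
  return ((imm & kQNaN) && isQNaN) || ((imm & kPosZero) && isPosZero) ||
         ((imm & kNegZero) && isNegZero);
}

// Hypothetical model of the class-based path. Precondition: y is not NaN.
float minMaxViaClass(float x, float y, bool isMax) {
  const uint8_t imm = isMax ? (kQNaN | kPosZero) : (kQNaN | kNegZero);
  // One test decides the swap: a NaN or preferred zero in x must become the
  // second operand, because that is the operand returned when the
  // comparison is false.
  const bool needSwap = matchesClass(x, imm);
  const float newX = needSwap ? y : x;
  const float newY = needSwap ? x : y;
  return isMax ? (newX > newY ? newX : newY) : (newX < newY ? newX : newY);
}
```

A single classification replaces both the integer zero compare and the separate unordered compare, which is why the `vfpclasssh $3` and `vfpclasssh $5` tests in the FP16 test file need no trailing NaN select.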

/// Do target-specific dag combines on X86ISD::ANDNP nodes.		/// Do target-specific dag combines on X86ISD::ANDNP nodes.

llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll

	;			;
	%s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")			%s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
	%t = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %va, <16 x float> %va, metadata !"round.dynamic", metadata !"fpexcept.ignore")			%t = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %va, <16 x float> %va, metadata !"round.dynamic", metadata !"fpexcept.ignore")
	ret void			ret void
	}			}

	define void @fmaximum(float %a, float %b, <16 x float> %va, <16 x float> %vb) {			define void @fmaximum(float %a, float %b, <16 x float> %va, <16 x float> %vb) {
	; THRU-LABEL: 'fmaximum'			; THRU-LABEL: 'fmaximum'
	; THRU-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)			; THRU-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)
	; THRU-NEXT: Cost Model: Found an estimated cost of 196 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)			; THRU-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)
	; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; LATE-LABEL: 'fmaximum'			; LATE-LABEL: 'fmaximum'
	; LATE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)			; LATE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)
	; LATE-NEXT: Cost Model: Found an estimated cost of 196 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)			; LATE-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)
	; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE-LABEL: 'fmaximum'			; SIZE-LABEL: 'fmaximum'
	; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)			; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)			; SIZE-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE_LATE-LABEL: 'fmaximum'			; SIZE_LATE-LABEL: 'fmaximum'
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s = call float @llvm.maximum.f32(float %a, float %b)
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 196 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	%s = call float @llvm.maximum.f32(float %a, float %b)			%s = call float @llvm.maximum.f32(float %a, float %b)
	%v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)			%v = call <16 x float> @llvm.maximum.v16f32(<16 x float> %va, <16 x float> %vb)
	ret void			ret void
	}			}

	define void @cttz(i32 %a, <16 x i32> %va) {			define void @cttz(i32 %a, <16 x i32> %va) {

llvm/test/CodeGen/X86/avx512fp16-fminimum-fmaximum.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -verify-machineinstrs -mtriple=x86_64-unknown-unknown -mattr=+avx512fp16 | FileCheck %s

				declare half @llvm.minimum.f16(half, half)
				declare half @llvm.maximum.f16(half, half)
				declare <8 x half> @llvm.minimum.v8f16(<8 x half>, <8 x half>)
				declare <8 x half> @llvm.maximum.v8f16(<8 x half>, <8 x half>)

				define half @test_fminimum(half %x, half %y) {
				; CHECK-LABEL: test_fminimum:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovw %xmm0, %eax
				; CHECK-NEXT: movzwl %ax, %eax
				; CHECK-NEXT: cmpl $32768, %eax # imm = 0x8000
				; CHECK-NEXT: sete %al
				; CHECK-NEXT: kmovd %eax, %k1
				; CHECK-NEXT: vmovaps %xmm0, %xmm2
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm2 {%k1}
				; CHECK-NEXT: vcmpunordsh %xmm1, %xmm0, %k2
				; CHECK-NEXT: vmovsh %xmm0, %xmm0, %xmm1 {%k1}
				; CHECK-NEXT: vminsh %xmm1, %xmm2, %xmm0
				; CHECK-NEXT: vmovsh {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm0 {%k2}
				; CHECK-NEXT: retq
				%z = call half @llvm.minimum.f16(half %x, half %y)
				ret half %z
				}

				define <8 x half> @test_fminimum_scalarize(<8 x half> %x, <8 x half> %y) "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" {
				; CHECK-LABEL: test_fminimum_scalarize:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm2 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm3 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vminsh %xmm2, %xmm3, %xmm2
				; CHECK-NEXT: vshufps {{.*#+}} xmm3 = xmm1[3,3,3,3]
				; CHECK-NEXT: vshufps {{.*#+}} xmm4 = xmm0[3,3,3,3]
				; CHECK-NEXT: vminsh %xmm3, %xmm4, %xmm3
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm3 = xmm1[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm4 = xmm0[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vminsh %xmm3, %xmm4, %xmm3
				; CHECK-NEXT: vpermilpd {{.*#+}} xmm4 = xmm1[1,0]
				; CHECK-NEXT: vpermilpd {{.*#+}} xmm5 = xmm0[1,0]
				; CHECK-NEXT: vminsh %xmm4, %xmm5, %xmm4
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1],xmm4[2],xmm3[2],xmm4[3],xmm3[3]
				; CHECK-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
				; CHECK-NEXT: vpsrlq $48, %xmm1, %xmm3
				; CHECK-NEXT: vpsrlq $48, %xmm0, %xmm4
				; CHECK-NEXT: vminsh %xmm3, %xmm4, %xmm3
				; CHECK-NEXT: vmovshdup {{.*#+}} xmm4 = xmm1[1,1,3,3]
				; CHECK-NEXT: vmovshdup {{.*#+}} xmm5 = xmm0[1,1,3,3]
				; CHECK-NEXT: vminsh %xmm4, %xmm5, %xmm4
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1],xmm4[2],xmm3[2],xmm4[3],xmm3[3]
				; CHECK-NEXT: vminsh %xmm1, %xmm0, %xmm4
				; CHECK-NEXT: vpsrld $16, %xmm1, %xmm1
				; CHECK-NEXT: vpsrld $16, %xmm0, %xmm0
				; CHECK-NEXT: vminsh %xmm1, %xmm0, %xmm0
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm4[0],xmm0[0],xmm4[1],xmm0[1],xmm4[2],xmm0[2],xmm4[3],xmm0[3]
				; CHECK-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
				; CHECK-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
				; CHECK-NEXT: retq
				%r = call <8 x half> @llvm.minimum.v8f16(<8 x half> %x, <8 x half> %y)
				ret <8 x half> %r
				}

				define half @test_fminimum_nnan(half %x, half %y) "no-nans-fp-math"="true" {
				; CHECK-LABEL: test_fminimum_nnan:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vfpclasssh $5, %xmm1, %k1
				; CHECK-NEXT: vmovaps %xmm0, %xmm2
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm2 {%k1}
				; CHECK-NEXT: vmovsh %xmm0, %xmm0, %xmm1 {%k1}
				; CHECK-NEXT: vminsh %xmm2, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%1 = tail call half @llvm.minimum.f16(half %x, half %y)
				ret half %1
				}

				define half @test_fminimum_zero(half %x, half %y) {
				; CHECK-LABEL: test_fminimum_zero:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovsh {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2
				; CHECK-NEXT: vcmpunordsh %xmm1, %xmm1, %k1
				; CHECK-NEXT: vminsh {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; CHECK-NEXT: vmovsh %xmm2, %xmm0, %xmm0 {%k1}
				; CHECK-NEXT: retq
				%1 = tail call half @llvm.minimum.f16(half -0.0, half %y)
				ret half %1
				}

				define half @test_fminimum_nsz(half %x, half %y) {
				; CHECK-LABEL: test_fminimum_nsz:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vcmpunordsh %xmm1, %xmm0, %k1
				; CHECK-NEXT: vminsh %xmm1, %xmm0, %xmm0
				; CHECK-NEXT: vmovsh {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm0 {%k1}
				; CHECK-NEXT: retq
				%1 = tail call nsz half @llvm.minimum.f16(half %x, half %y)
				ret half %1
				}

				define half @test_fminimum_combine_cmps(half %x, half %y) {
				; CHECK-LABEL: test_fminimum_combine_cmps:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vdivsh %xmm0, %xmm1, %xmm1
				; CHECK-NEXT: vfpclasssh $5, %xmm0, %k1
				; CHECK-NEXT: vmovaps %xmm1, %xmm2
				; CHECK-NEXT: vmovsh %xmm0, %xmm0, %xmm2 {%k1}
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm0 {%k1}
				; CHECK-NEXT: vminsh %xmm2, %xmm0, %xmm0
				; CHECK-NEXT: retq
				%1 = fdiv nnan half %y, %x
				%2 = tail call half @llvm.minimum.f16(half %x, half %1)
				ret half %2
				}

				define half @test_fmaximum(half %x, half %y) {
				; CHECK-LABEL: test_fmaximum:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovw %xmm0, %eax
				; CHECK-NEXT: testw %ax, %ax
				; CHECK-NEXT: sete %al
				; CHECK-NEXT: kmovd %eax, %k1
				; CHECK-NEXT: vmovaps %xmm0, %xmm2
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm2 {%k1}
				; CHECK-NEXT: vcmpunordsh %xmm1, %xmm0, %k2
				; CHECK-NEXT: vmovsh %xmm0, %xmm0, %xmm1 {%k1}
				; CHECK-NEXT: vmaxsh %xmm1, %xmm2, %xmm0
				; CHECK-NEXT: vmovsh {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm0 {%k2}
				; CHECK-NEXT: retq
				%r = call half @llvm.maximum.f16(half %x, half %y)
				ret half %r
				}

				define <8 x half> @test_fmaximum_scalarize(<8 x half> %x, <8 x half> %y) "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" {
				; CHECK-LABEL: test_fmaximum_scalarize:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm2 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm3 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vmaxsh %xmm2, %xmm3, %xmm2
				; CHECK-NEXT: vshufps {{.*#+}} xmm3 = xmm1[3,3,3,3]
				; CHECK-NEXT: vshufps {{.*#+}} xmm4 = xmm0[3,3,3,3]
				; CHECK-NEXT: vmaxsh %xmm3, %xmm4, %xmm3
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm3 = xmm1[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vpsrldq {{.*#+}} xmm4 = xmm0[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
				; CHECK-NEXT: vmaxsh %xmm3, %xmm4, %xmm3
				; CHECK-NEXT: vpermilpd {{.*#+}} xmm4 = xmm1[1,0]
				; CHECK-NEXT: vpermilpd {{.*#+}} xmm5 = xmm0[1,0]
				; CHECK-NEXT: vmaxsh %xmm4, %xmm5, %xmm4
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1],xmm4[2],xmm3[2],xmm4[3],xmm3[3]
				; CHECK-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
				RKSimon: Please can you add vector test coverage to ensure we scalarize?
				e-kud (author): Yes, I've added them. It also reminded me about several commented-out tests with the intrinsics. Uncommented them as well.
				; CHECK-NEXT: vpsrlq $48, %xmm1, %xmm3
				; CHECK-NEXT: vpsrlq $48, %xmm0, %xmm4
				; CHECK-NEXT: vmaxsh %xmm3, %xmm4, %xmm3
				; CHECK-NEXT: vmovshdup {{.*#+}} xmm4 = xmm1[1,1,3,3]
				; CHECK-NEXT: vmovshdup {{.*#+}} xmm5 = xmm0[1,1,3,3]
				; CHECK-NEXT: vmaxsh %xmm4, %xmm5, %xmm4
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm3 = xmm4[0],xmm3[0],xmm4[1],xmm3[1],xmm4[2],xmm3[2],xmm4[3],xmm3[3]
				; CHECK-NEXT: vmaxsh %xmm1, %xmm0, %xmm4
				; CHECK-NEXT: vpsrld $16, %xmm1, %xmm1
				; CHECK-NEXT: vpsrld $16, %xmm0, %xmm0
				; CHECK-NEXT: vmaxsh %xmm1, %xmm0, %xmm0
				; CHECK-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm4[0],xmm0[0],xmm4[1],xmm0[1],xmm4[2],xmm0[2],xmm4[3],xmm0[3]
				; CHECK-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
				; CHECK-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
				; CHECK-NEXT: retq
				%r = call <8 x half> @llvm.maximum.v8f16(<8 x half> %x, <8 x half> %y)
				ret <8 x half> %r
				}

				define half @test_fmaximum_nnan(half %x, half %y) {
				; CHECK-LABEL: test_fmaximum_nnan:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vaddsh %xmm1, %xmm0, %xmm2
				; CHECK-NEXT: vsubsh %xmm1, %xmm0, %xmm0
				; CHECK-NEXT: vfpclasssh $3, %xmm0, %k1
				; CHECK-NEXT: vmovaps %xmm2, %xmm1
				; CHECK-NEXT: vmovsh %xmm0, %xmm0, %xmm1 {%k1}
				; CHECK-NEXT: vmovsh %xmm2, %xmm0, %xmm0 {%k1}
				; CHECK-NEXT: vmaxsh %xmm1, %xmm0, %xmm0
				; CHECK-NEXT: retq
				%1 = fadd nnan half %x, %y
				%2 = fsub nnan half %x, %y
				%3 = tail call half @llvm.maximum.f16(half %1, half %2)
				ret half %3
				}

				define half @test_fmaximum_zero(half %x, half %y) {
				; CHECK-LABEL: test_fmaximum_zero:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
				; CHECK-NEXT: vmaxsh %xmm0, %xmm1, %xmm0
				; CHECK-NEXT: vmovsh {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2
				; CHECK-NEXT: vcmpunordsh %xmm1, %xmm1, %k1
				; CHECK-NEXT: vmovsh %xmm2, %xmm0, %xmm0 {%k1}
				; CHECK-NEXT: retq
				%1 = tail call half @llvm.maximum.f16(half 0.0, half %y)
				ret half %1
				}

				define half @test_fmaximum_nsz(half %x, half %y) "no-signed-zeros-fp-math"="true" {
				; CHECK-LABEL: test_fmaximum_nsz:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vcmpunordsh %xmm1, %xmm0, %k1
				; CHECK-NEXT: vmaxsh %xmm1, %xmm0, %xmm0
				; CHECK-NEXT: vmovsh {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm0 {%k1}
				; CHECK-NEXT: retq
				%1 = tail call half @llvm.maximum.f16(half %x, half %y)
				ret half %1
				}

				define half @test_fmaximum_combine_cmps(half %x, half %y) {
				; CHECK-LABEL: test_fmaximum_combine_cmps:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vdivsh %xmm0, %xmm1, %xmm1
				; CHECK-NEXT: vfpclasssh $3, %xmm0, %k1
				; CHECK-NEXT: vmovaps %xmm1, %xmm2
				; CHECK-NEXT: vmovsh %xmm0, %xmm0, %xmm2 {%k1}
				; CHECK-NEXT: vmovsh %xmm1, %xmm0, %xmm0 {%k1}
				; CHECK-NEXT: vmaxsh %xmm2, %xmm0, %xmm0
				; CHECK-NEXT: retq
				%1 = fdiv nnan half %y, %x
				%2 = tail call half @llvm.maximum.f16(half %x, half %1)
				ret half %2
				}
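
Reading the `test_fmaximum_combine_cmps` checks above: `vfpclasssh $3` appears to fold the zero test and the NaN test into one classification. Under the Intel VFPCLASS immediate encoding, bit 0 selects QNaN and bit 1 selects +0.0, so `$3` matches either. Because `%1` comes from `fdiv nnan`, only `%x` can be NaN, and placing `%x` in the fall-through position of the max instruction propagates it without a separate unordered compare. The following C++ is a minimal sketch of that reading, not the patch's code; all function names are hypothetical, and `std::isnan` does not distinguish quiet from signaling NaNs.

```cpp
#include <cassert>
#include <cmath>

// Models x86 MAX*: the second source is returned unless a > b, so a NaN
// or an equal-comparing zero in either operand falls through to b.
float x86_max(float a, float b) { return a > b ? a : b; }

// Models VFPCLASS with imm8 = 3: quiet NaN (bit 0) or +0.0 (bit 1).
// Assumption: signaling NaNs are lumped in here, unlike the instruction.
bool fpclass_qnan_or_pos_zero(float v) {
  return std::isnan(v) || (v == 0.0f && !std::signbit(v));
}

// Hypothetical model of the checked sequence; y_no_nan must not be NaN.
float lowered_fmaximum_known_y(float x, float y_no_nan) {
  // One mask drives both masked moves: when set, x becomes the
  // fall-through (second) operand of the max.
  bool swap = fpclass_qnan_or_pos_zero(x);
  float first = swap ? y_no_nan : x;
  float second = swap ? x : y_no_nan;
  return x86_max(first, second);
}

// Small self-check of the NaN and signed-zero cases.
void check_lowered_fmaximum_known_y() {
  assert(std::isnan(lowered_fmaximum_known_y(std::nanf(""), 1.0f)));
  assert(!std::signbit(lowered_fmaximum_known_y(0.0f, -0.0f)));
  assert(!std::signbit(lowered_fmaximum_known_y(-0.0f, 0.0f)));
  assert(lowered_fmaximum_known_y(2.0f, 1.0f) == 2.0f);
}
```

Calling `check_lowered_fmaximum_known_y()` from a `main` exercises the cases that motivate the swap.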

llvm/test/CodeGen/X86/extract-fp.ll

	; CHECK-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]			; CHECK-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
	; CHECK-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; CHECK-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%v = call <2 x double> @llvm.minnum.v2f64(<2 x double> <double 0.0, double 1.0>, <2 x double> %x)			%v = call <2 x double> @llvm.minnum.v2f64(<2 x double> <double 0.0, double 1.0>, <2 x double> %x)
	%r = extractelement <2 x double> %v, i32 1			%r = extractelement <2 x double> %v, i32 1
	ret double %r			ret double %r
	}			}

	;define double @ext_maximum_v4f64(<2 x double> %x) nounwind {			define double @ext_maximum_v4f64(<2 x double> %x) nounwind {
	; %v = call <2 x double> @llvm.maximum.v2f64(<2 x double> %x, <2 x double> <double 42.0, double 43.0>)			; CHECK-LABEL: ext_maximum_v4f64:
	; %r = extractelement <2 x double> %v, i32 1			; CHECK: # %bb.0:
	; ret double %r			; CHECK-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
	;}			; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
				; CHECK-NEXT: maxsd %xmm0, %xmm1
	;define float @ext_minimum_v4f32(<4 x float> %x) nounwind {			; CHECK-NEXT: cmpunordsd %xmm0, %xmm0
	; %v = call <4 x float> @llvm.minimum.v4f32(<4 x float> %x, <4 x float> <float 0.0, float 1.0, float 2.0, float 42.0>)			; CHECK-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; %r = extractelement <4 x float> %v, i32 1			; CHECK-NEXT: andpd %xmm0, %xmm2
	; ret float %r			; CHECK-NEXT: andnpd %xmm1, %xmm0
	;}			; CHECK-NEXT: orpd %xmm2, %xmm0
				; CHECK-NEXT: retq
				%v = call <2 x double> @llvm.maximum.v2f64(<2 x double> %x, <2 x double> <double 42.0, double 43.0>)
				%r = extractelement <2 x double> %v, i32 1
				ret double %r
				}

				define float @ext_minimum_v4f32(<4 x float> %x) nounwind {
				; CHECK-LABEL: ext_minimum_v4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
				; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; CHECK-NEXT: minss %xmm0, %xmm1
				; CHECK-NEXT: cmpunordss %xmm0, %xmm0
				; CHECK-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
				; CHECK-NEXT: andps %xmm0, %xmm2
				; CHECK-NEXT: andnps %xmm1, %xmm0
				; CHECK-NEXT: orps %xmm2, %xmm0
				; CHECK-NEXT: retq
				%v = call <4 x float> @llvm.minimum.v4f32(<4 x float> %x, <4 x float> <float 0.0, float 1.0, float 2.0, float 42.0>)
				%r = extractelement <4 x float> %v, i32 1
				ret float %r
				}

	declare <4 x float> @llvm.maxnum.v4f32(<4 x float>, <4 x float>)			declare <4 x float> @llvm.maxnum.v4f32(<4 x float>, <4 x float>)
	declare <2 x double> @llvm.minnum.v2f64(<2 x double>, <2 x double>)			declare <2 x double> @llvm.minnum.v2f64(<2 x double>, <2 x double>)
				declare <2 x double> @llvm.maximum.v2f64(<2 x double>, <2 x double>)
				declare <4 x float> @llvm.minimum.v4f32(<4 x float>, <4 x float>)

llvm/test/CodeGen/X86/extractelement-fp.ll

	; X86-NEXT: popl %ebp			; X86-NEXT: popl %ebp
	; X86-NEXT: vzeroupper			; X86-NEXT: vzeroupper
	; X86-NEXT: retl			; X86-NEXT: retl
	%v = call <4 x double> @llvm.minnum.v4f64(<4 x double> %x, <4 x double> %y)			%v = call <4 x double> @llvm.minnum.v4f64(<4 x double> %x, <4 x double> %y)
	%r = extractelement <4 x double> %v, i32 0			%r = extractelement <4 x double> %v, i32 0
	ret double %r			ret double %r
	}			}

	;define float @fmaximum_v4f32(<4 x float> %x, <4 x float> %y) nounwind {			define float @fmaximum_v4f32(<4 x float> %x, <4 x float> %y) nounwind {
	; %v = call <4 x float> @llvm.maximum.v4f32(<4 x float> %x, <4 x float> %y)			; X64-LABEL: fmaximum_v4f32:
	; %r = extractelement <4 x float> %v, i32 0			; X64: # %bb.0:
	; ret float %r			; X64-NEXT: vmovd %xmm0, %eax
	;}			; X64-NEXT: testl %eax, %eax
				; X64-NEXT: je .LBB30_1
	;define double @fmaximum_v4f64(<4 x double> %x, <4 x double> %y) nounwind {			; X64-NEXT: # %bb.2:
	; %v = call <4 x double> @llvm.maximum.v4f64(<4 x double> %x, <4 x double> %y)			; X64-NEXT: vmovdqa %xmm1, %xmm2
	; %r = extractelement <4 x double> %v, i32 0			; X64-NEXT: vmovdqa %xmm0, %xmm3
	; ret double %r			; X64-NEXT: jmp .LBB30_3
	;}			; X64-NEXT: .LBB30_1:
				; X64-NEXT: vmovdqa %xmm0, %xmm2
	;define float @fminimum_v4f32(<4 x float> %x, <4 x float> %y) nounwind {			; X64-NEXT: vmovdqa %xmm1, %xmm3
	; %v = call <4 x float> @llvm.minimum.v4f32(<4 x float> %x, <4 x float> %y)			; X64-NEXT: .LBB30_3:
	; %r = extractelement <4 x float> %v, i32 0			; X64-NEXT: vmaxss %xmm2, %xmm3, %xmm2
	; ret float %r			; X64-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
	;}			; X64-NEXT: vbroadcastss {{.*#+}} xmm1 = [NaN,NaN,NaN,NaN]
				; X64-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
	;define double @fminimum_v4f64(<4 x double> %x, <4 x double> %y) nounwind {			; X64-NEXT: retq
	; %v = call <4 x double> @llvm.minimum.v4f64(<4 x double> %x, <4 x double> %y)			;
	; %r = extractelement <4 x double> %v, i32 0			; X86-LABEL: fmaximum_v4f32:
	; ret double %r			; X86: # %bb.0:
	;}			; X86-NEXT: vmovd %xmm0, %eax
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: je .LBB30_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovdqa %xmm1, %xmm2
				; X86-NEXT: vmovdqa %xmm0, %xmm3
				; X86-NEXT: jmp .LBB30_3
				; X86-NEXT: .LBB30_1:
				; X86-NEXT: vmovdqa %xmm0, %xmm2
				; X86-NEXT: vmovdqa %xmm1, %xmm3
				; X86-NEXT: .LBB30_3:
				; X86-NEXT: pushl %eax
				; X86-NEXT: vmaxss %xmm2, %xmm3, %xmm2
				; X86-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
				; X86-NEXT: vbroadcastss {{.*#+}} xmm1 = [NaN,NaN,NaN,NaN]
				; X86-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: retl
				%v = call <4 x float> @llvm.maximum.v4f32(<4 x float> %x, <4 x float> %y)
				%r = extractelement <4 x float> %v, i32 0
				ret float %r
				}

				define double @fmaximum_v4f64(<4 x double> %x, <4 x double> %y) nounwind {
				; X64-LABEL: fmaximum_v4f64:
				; X64: # %bb.0:
				; X64-NEXT: vmovq %xmm0, %rax
				; X64-NEXT: testq %rax, %rax
				; X64-NEXT: je .LBB31_1
				; X64-NEXT: # %bb.2:
				; X64-NEXT: vmovdqa %xmm1, %xmm2
				; X64-NEXT: vmovdqa %xmm0, %xmm3
				; X64-NEXT: jmp .LBB31_3
				; X64-NEXT: .LBB31_1:
				; X64-NEXT: vmovdqa %xmm0, %xmm2
				; X64-NEXT: vmovdqa %xmm1, %xmm3
				; X64-NEXT: .LBB31_3:
				; X64-NEXT: vmaxsd %xmm2, %xmm3, %xmm2
				; X64-NEXT: vcmpunordsd %xmm1, %xmm0, %xmm0
				; X64-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0
				; X64-NEXT: vzeroupper
				; X64-NEXT: retq
				;
				; X86-LABEL: fmaximum_v4f64:
				; X86: # %bb.0:
				; X86-NEXT: vpextrd $1, %xmm0, %eax
				; X86-NEXT: vmovd %xmm0, %ecx
				; X86-NEXT: orl %eax, %ecx
				; X86-NEXT: je .LBB31_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovdqa %xmm1, %xmm2
				; X86-NEXT: vmovdqa %xmm0, %xmm3
				; X86-NEXT: jmp .LBB31_3
				; X86-NEXT: .LBB31_1:
				; X86-NEXT: vmovdqa %xmm0, %xmm2
				; X86-NEXT: vmovdqa %xmm1, %xmm3
				; X86-NEXT: .LBB31_3:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmaxsd %xmm2, %xmm3, %xmm2
				; X86-NEXT: vcmpunordsd %xmm1, %xmm0, %xmm0
				; X86-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm2, %xmm0
				; X86-NEXT: vmovlpd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: vzeroupper
				; X86-NEXT: retl
				%v = call <4 x double> @llvm.maximum.v4f64(<4 x double> %x, <4 x double> %y)
				%r = extractelement <4 x double> %v, i32 0
				ret double %r
				}

				define float @fminimum_v4f32(<4 x float> %x, <4 x float> %y) nounwind {
				; X64-LABEL: fminimum_v4f32:
				; X64: # %bb.0:
				; X64-NEXT: vmovd %xmm0, %eax
				; X64-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; X64-NEXT: je .LBB32_1
				; X64-NEXT: # %bb.2:
				; X64-NEXT: vmovdqa %xmm1, %xmm2
				; X64-NEXT: vmovdqa %xmm0, %xmm3
				; X64-NEXT: jmp .LBB32_3
				; X64-NEXT: .LBB32_1:
				; X64-NEXT: vmovdqa %xmm0, %xmm2
				; X64-NEXT: vmovdqa %xmm1, %xmm3
				; X64-NEXT: .LBB32_3:
				; X64-NEXT: vminss %xmm2, %xmm3, %xmm2
				; X64-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
				; X64-NEXT: vbroadcastss {{.*#+}} xmm1 = [NaN,NaN,NaN,NaN]
				; X64-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
				; X64-NEXT: retq
				;
				; X86-LABEL: fminimum_v4f32:
				; X86: # %bb.0:
				; X86-NEXT: vmovd %xmm0, %eax
				; X86-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; X86-NEXT: je .LBB32_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovdqa %xmm1, %xmm2
				; X86-NEXT: vmovdqa %xmm0, %xmm3
				; X86-NEXT: jmp .LBB32_3
				; X86-NEXT: .LBB32_1:
				; X86-NEXT: vmovdqa %xmm0, %xmm2
				; X86-NEXT: vmovdqa %xmm1, %xmm3
				; X86-NEXT: .LBB32_3:
				; X86-NEXT: pushl %eax
				; X86-NEXT: vminss %xmm2, %xmm3, %xmm2
				; X86-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
				; X86-NEXT: vbroadcastss {{.*#+}} xmm1 = [NaN,NaN,NaN,NaN]
				; X86-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: retl
				%v = call <4 x float> @llvm.minimum.v4f32(<4 x float> %x, <4 x float> %y)
				%r = extractelement <4 x float> %v, i32 0
				ret float %r
				}

				define double @fminimum_v4f64(<4 x double> %x, <4 x double> %y) nounwind {
				; X64-LABEL: fminimum_v4f64:
				; X64: # %bb.0:
				; X64-NEXT: vmovq %xmm0, %rax
				; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; X64-NEXT: cmpq %rcx, %rax
				; X64-NEXT: je .LBB33_1
				; X64-NEXT: # %bb.2:
				; X64-NEXT: vmovdqa %xmm1, %xmm2
				; X64-NEXT: vmovdqa %xmm0, %xmm3
				; X64-NEXT: jmp .LBB33_3
				; X64-NEXT: .LBB33_1:
				; X64-NEXT: vmovdqa %xmm0, %xmm2
				; X64-NEXT: vmovdqa %xmm1, %xmm3
				; X64-NEXT: .LBB33_3:
				; X64-NEXT: vminsd %xmm2, %xmm3, %xmm2
				; X64-NEXT: vcmpunordsd %xmm1, %xmm0, %xmm0
				; X64-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0
				; X64-NEXT: vzeroupper
				; X64-NEXT: retq
				;
				; X86-LABEL: fminimum_v4f64:
				; X86: # %bb.0:
				; X86-NEXT: vmovd %xmm0, %eax
				; X86-NEXT: vpextrd $1, %xmm0, %ecx
				; X86-NEXT: addl $-2147483648, %ecx # imm = 0x80000000
				; X86-NEXT: orl %eax, %ecx
				; X86-NEXT: je .LBB33_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovdqa %xmm1, %xmm2
				; X86-NEXT: vmovdqa %xmm0, %xmm3
				; X86-NEXT: jmp .LBB33_3
				; X86-NEXT: .LBB33_1:
				; X86-NEXT: vmovdqa %xmm0, %xmm2
				; X86-NEXT: vmovdqa %xmm1, %xmm3
				; X86-NEXT: .LBB33_3:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vminsd %xmm2, %xmm3, %xmm2
				; X86-NEXT: vcmpunordsd %xmm1, %xmm0, %xmm0
				; X86-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm2, %xmm0
				; X86-NEXT: vmovlpd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: vzeroupper
				; X86-NEXT: retl
				%v = call <4 x double> @llvm.minimum.v4f64(<4 x double> %x, <4 x double> %y)
				%r = extractelement <4 x double> %v, i32 0
				ret double %r
				}
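
For `minimum`, the checks above key the operand swap on -0.0 rather than +0.0. The X64 sequences compare the raw bits against the sign-bit-only pattern, while the 32-bit `fminimum_v4f64` sequence performs the same test in two halves (`addl $-2147483648`, then `orl`, then `je`). The sketch below shows why the two forms agree; it is illustrative only and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// 64-bit form (X64 checks): -0.0 is the only double whose bits are
// exactly 0x8000000000000000.
bool is_neg_zero_64(double v) {
  std::uint64_t u;
  std::memcpy(&u, &v, sizeof u);
  return u == 0x8000000000000000ULL;
}

// 32-bit form (X86 checks): adding 0x80000000 to the high half wraps to
// zero only when that half is 0x80000000; OR-ing in the low half then
// yields zero only when the low half is zero as well.
bool is_neg_zero_32x2(double v) {
  std::uint64_t u;
  std::memcpy(&u, &v, sizeof u);
  std::uint32_t lo = static_cast<std::uint32_t>(u);
  std::uint32_t hi = static_cast<std::uint32_t>(u >> 32);
  hi += 0x80000000u;  // unsigned wraparound, like addl
  return (lo | hi) == 0;
}

// Small self-check that both forms accept -0.0 and reject +0.0.
void check_neg_zero_tests() {
  assert(is_neg_zero_64(-0.0) && is_neg_zero_32x2(-0.0));
  assert(!is_neg_zero_64(0.0) && !is_neg_zero_32x2(0.0));
}
```

The `fmaximum` variants use the simpler all-zero-bits test (`testl`/`testq`, or `orl` of both halves) because +0.0 has an all-zero encoding.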

	define float @maxps_v4f32(<4 x float> %x, <4 x float> %y) nounwind {			define float @maxps_v4f32(<4 x float> %x, <4 x float> %y) nounwind {
	; X64-LABEL: maxps_v4f32:			; X64-LABEL: maxps_v4f32:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vmaxss %xmm1, %xmm0, %xmm0			; X64-NEXT: vmaxss %xmm1, %xmm0, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: maxps_v4f32:			; X86-LABEL: maxps_v4f32:

llvm/test/CodeGen/X86/fminimum-fmaximum.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 \| FileCheck %s --check-prefixes=SSE2
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefixes=AVX,AVX1
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f \| FileCheck %s --check-prefixes=AVX,AVX512,AVX512F
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq \| FileCheck %s --check-prefixes=AVX,AVX512,AVX512DQ
				pengfei: Guess you missed an AVX512F, i.e. `AVX,AVX512,AVX512F`
				; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefixes=X86

				declare float @llvm.maximum.f32(float, float)
				declare double @llvm.maximum.f64(double, double)
				declare float @llvm.minimum.f32(float, float)
				declare double @llvm.minimum.f64(double, double)
				declare <2 x double> @llvm.minimum.v2f64(<2 x double>, <2 x double>)
				declare <4 x float> @llvm.maximum.v4f32(<4 x float>, <4 x float>)

				;
				; fmaximum
				;

				define float @test_fmaximum(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: testl %eax, %eax
				; SSE2-NEXT: movdqa %xmm0, %xmm3
				; SSE2-NEXT: movdqa %xmm1, %xmm2
				; SSE2-NEXT: je .LBB0_2
				; SSE2-NEXT: # %bb.1:
				; SSE2-NEXT: movdqa %xmm1, %xmm3
				; SSE2-NEXT: movdqa %xmm0, %xmm2
				; SSE2-NEXT: .LBB0_2:
				; SSE2-NEXT: maxss %xmm3, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm3
				; SSE2-NEXT: andnps %xmm2, %xmm3
				; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm0, %xmm1
				; SSE2-NEXT: orps %xmm3, %xmm1
				; SSE2-NEXT: movaps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vmovd %xmm0, %eax
				; AVX1-NEXT: testl %eax, %eax
				; AVX1-NEXT: vmovdqa %xmm0, %xmm2
				; AVX1-NEXT: vmovdqa %xmm1, %xmm3
				; AVX1-NEXT: je .LBB0_2
				; AVX1-NEXT: # %bb.1:
				; AVX1-NEXT: vmovdqa %xmm1, %xmm2
				; AVX1-NEXT: vmovdqa %xmm0, %xmm3
				; AVX1-NEXT: .LBB0_2:
				; AVX1-NEXT: vmaxss %xmm2, %xmm3, %xmm2
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vmovd %xmm0, %eax
				; AVX512-NEXT: testl %eax, %eax
				; AVX512-NEXT: sete %al
				; AVX512-NEXT: kmovw %eax, %k1
				; AVX512-NEXT: vmovdqa %xmm0, %xmm2
				; AVX512-NEXT: vmovss %xmm1, %xmm2, %xmm2 {%k1}
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k2
				; AVX512-NEXT: vmovss %xmm0, %xmm1, %xmm1 {%k1}
				; AVX512-NEXT: vmaxss %xmm1, %xmm2, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k2}
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fmaximum:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vmovd %xmm1, %eax
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: vmovdqa %xmm1, %xmm2
				; X86-NEXT: vmovdqa %xmm0, %xmm3
				; X86-NEXT: je .LBB0_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: vmovdqa %xmm0, %xmm2
				; X86-NEXT: vmovdqa %xmm1, %xmm3
				; X86-NEXT: .LBB0_2:
				; X86-NEXT: vmaxss %xmm2, %xmm3, %xmm2
				; X86-NEXT: vcmpunordss %xmm0, %xmm1, %xmm0
				; X86-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm2, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = tail call float @llvm.maximum.f32(float %x, float %y)
				ret float %1
				}
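
The `test_fmaximum` checks above follow a three-step pattern: test whether `%x` is exactly +0.0 by its bit pattern, order the operands so the preferred zero sits in the fall-through position of `maxss`, then replace the result with a NaN constant when the inputs compare unordered. The C++ below is a rough model of that pattern as read from the assembly, not the code from `X86ISelLowering.cpp`; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Models x86 MAXSS: the second source is returned unless a > b, so NaN
// inputs and equal-comparing zeros both fall through to b.
float x86_maxss(float a, float b) { return a > b ? a : b; }

// Raw bit pattern of a float; +0.0 is the only value whose bits are all zero.
std::uint32_t bits_of(float v) {
  std::uint32_t u;
  std::memcpy(&u, &v, sizeof u);
  return u;
}

// Hypothetical model of the checked sequence for llvm.maximum.f32.
float lowered_fmaximum(float x, float y) {
  // movd + testl + je: if x is +0.0, make x the fall-through operand.
  bool x_is_pos_zero = bits_of(x) == 0;
  float first = x_is_pos_zero ? y : x;
  float second = x_is_pos_zero ? x : y;
  float m = x86_maxss(first, second);
  // cmpunordss + blend: any NaN input selects a NaN constant instead.
  bool unordered = std::isnan(x) || std::isnan(y);
  return unordered ? std::numeric_limits<float>::quiet_NaN() : m;
}

// Small self-check of the signed-zero and NaN cases.
void check_lowered_fmaximum() {
  assert(!std::signbit(lowered_fmaximum(0.0f, -0.0f)));
  assert(!std::signbit(lowered_fmaximum(-0.0f, 0.0f)));
  assert(std::signbit(lowered_fmaximum(-0.0f, -0.0f)));
  assert(std::isnan(lowered_fmaximum(std::nanf(""), 1.0f)));
  assert(lowered_fmaximum(1.0f, 2.0f) == 2.0f);
}
```

The swap matters because `maxss` alone treats +0.0 and -0.0 as equal and simply returns its second source, which would make the result depend on operand order.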

				define <4 x float> @test_fmaximum_scalarize(<4 x float> %x, <4 x float> %y) "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" {
				; SSE2-LABEL: test_fmaximum_scalarize:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movaps %xmm1, %xmm2
				; SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[3,3],xmm1[3,3]
				; SSE2-NEXT: movaps %xmm0, %xmm3
				; SSE2-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,3],xmm0[3,3]
				; SSE2-NEXT: maxss %xmm2, %xmm3
				; SSE2-NEXT: movaps %xmm1, %xmm2
				; SSE2-NEXT: unpckhpd {{.*#+}} xmm2 = xmm2[1],xmm1[1]
				; SSE2-NEXT: movaps %xmm0, %xmm4
				; SSE2-NEXT: unpckhpd {{.*#+}} xmm4 = xmm4[1],xmm0[1]
				; SSE2-NEXT: maxss %xmm2, %xmm4
				; SSE2-NEXT: unpcklps {{.*#+}} xmm4 = xmm4[0],xmm3[0],xmm4[1],xmm3[1]
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: maxss %xmm1, %xmm2
				; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,1,1]
				; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
				; SSE2-NEXT: maxss %xmm1, %xmm0
				; SSE2-NEXT: unpcklps {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
				; SSE2-NEXT: movlhps {{.*#+}} xmm2 = xmm2[0],xmm4[0]
				; SSE2-NEXT: movaps %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				skatkov: Here is what I mentioned in terms of the non-optimal vectorized version, at least on AVX.
				RKSimon: For now we just want x86 to support the intrinsics; vector optimization is better handled as a followup.
				skatkov: It is exactly what I said at the beginning of this discussion: a side question that should not delay landing this patch.
				RKSimon: yup - cheers
				; AVX-LABEL: test_fmaximum_scalarize:
				; AVX: # %bb.0:
				; AVX-NEXT: vmaxss %xmm1, %xmm0, %xmm2
				; AVX-NEXT: vmovshdup {{.*#+}} xmm3 = xmm1[1,1,3,3]
				; AVX-NEXT: vmovshdup {{.*#+}} xmm4 = xmm0[1,1,3,3]
				; AVX-NEXT: vmaxss %xmm3, %xmm4, %xmm3
				; AVX-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[2,3]
				; AVX-NEXT: vpermilpd {{.*#+}} xmm3 = xmm1[1,0]
				; AVX-NEXT: vpermilpd {{.*#+}} xmm4 = xmm0[1,0]
				; AVX-NEXT: vmaxss %xmm3, %xmm4, %xmm3
				; AVX-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]
				; AVX-NEXT: vshufps {{.*#+}} xmm1 = xmm1[3,3,3,3]
				; AVX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
				; AVX-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm2[0,1,2],xmm0[0]
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_scalarize:
				; X86: # %bb.0:
				; X86-NEXT: vmaxss %xmm1, %xmm0, %xmm2
				; X86-NEXT: vmovshdup {{.*#+}} xmm3 = xmm1[1,1,3,3]
				; X86-NEXT: vmovshdup {{.*#+}} xmm4 = xmm0[1,1,3,3]
				; X86-NEXT: vmaxss %xmm3, %xmm4, %xmm3
				; X86-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[2,3]
				; X86-NEXT: vpermilpd {{.*#+}} xmm3 = xmm1[1,0]
				; X86-NEXT: vpermilpd {{.*#+}} xmm4 = xmm0[1,0]
				; X86-NEXT: vmaxss %xmm3, %xmm4, %xmm3
				; X86-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]
				; X86-NEXT: vshufps {{.*#+}} xmm1 = xmm1[3,3,3,3]
				; X86-NEXT: vshufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
				; X86-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; X86-NEXT: vinsertps {{.*#+}} xmm0 = xmm2[0,1,2],xmm0[0]
				; X86-NEXT: retl
				%r = call <4 x float> @llvm.maximum.v4f32(<4 x float> %x, <4 x float> %y)
				ret <4 x float> %r
				}
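
In `test_fmaximum_scalarize` the function attributes rule out NaNs and signed zeros, so each lane reduces to a plain scalar max, and the checks show the vector being split into four `maxss` operations plus shuffles to reassemble the result. A minimal sketch of that per-lane behaviour, with a hypothetical function name:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Per-lane max as the scalarized sequence computes it. Assumes no NaNs
// and no signed-zero distinction, matching the test's function attributes.
std::array<float, 4> scalarized_fmaximum(const std::array<float, 4>& x,
                                         const std::array<float, 4>& y) {
  std::array<float, 4> r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    // One maxss per extracted lane.
    r[i] = x[i] > y[i] ? x[i] : y[i];
  }
  return r;
}

// Small self-check on a mixed input.
void check_scalarized_fmaximum() {
  std::array<float, 4> r =
      scalarized_fmaximum({1.0f, 5.0f, -2.0f, 0.5f}, {2.0f, 4.0f, -3.0f, 0.5f});
  assert(r[0] == 2.0f && r[1] == 5.0f && r[2] == -2.0f && r[3] == 0.5f);
}
```

Under these assumptions a single packed max would presumably give the same lanes, which seems to be the inefficiency the inline comments refer to and defer to a follow-up.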

				define float @test_fmaximum_nan0(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum_nan0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fmaximum_nan0:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_nan0:
				; X86: # %bb.0:
				; X86-NEXT: flds {{\.?LCPI[0-9]+_[0-9]+}}
				; X86-NEXT: retl
				%1 = tail call float @llvm.maximum.f32(float 0x7fff000000000000, float %y)
				ret float %1
				}

				define float @test_fmaximum_nan1(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum_nan1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				pengfei: The test does show anything interesting.
				pengfei: does -> doesn't
				e-kud (author): Probably, yes. I wanted to show that we are able to fold all these checks with constant arguments. But we've already tested folding of zero and nan checks. Original FMAX and FMIN are tested as well. Dropped it.
				;
				; AVX-LABEL: test_fmaximum_nan1:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_nan1:
				; X86: # %bb.0:
				; X86-NEXT: flds {{\.?LCPI[0-9]+_[0-9]+}}
				; X86-NEXT: retl
				%1 = tail call float @llvm.maximum.f32(float %x, float 0x7fff000000000000)
				ret float %1
				}

				define float @test_fmaximum_nnan(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum_nnan:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: addss %xmm1, %xmm0
				; SSE2-NEXT: subss %xmm1, %xmm2
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: testl %eax, %eax
				; SSE2-NEXT: je .LBB4_1
				; SSE2-NEXT: # %bb.2:
				; SSE2-NEXT: maxss %xmm2, %xmm0
				; SSE2-NEXT: retq
				; SSE2-NEXT: .LBB4_1:
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: movaps %xmm2, %xmm0
				; SSE2-NEXT: maxss %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_nnan:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vaddss %xmm1, %xmm0, %xmm2
				; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm1
				; AVX1-NEXT: vmovd %xmm2, %eax
				; AVX1-NEXT: testl %eax, %eax
				; AVX1-NEXT: je .LBB4_1
				; AVX1-NEXT: # %bb.2:
				; AVX1-NEXT: vmaxss %xmm1, %xmm2, %xmm0
				; AVX1-NEXT: retq
				; AVX1-NEXT: .LBB4_1:
				; AVX1-NEXT: vmovaps %xmm2, %xmm0
				; AVX1-NEXT: vmaxss %xmm0, %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512F-LABEL: test_fmaximum_nnan:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vaddss %xmm1, %xmm0, %xmm2
				; AVX512F-NEXT: vsubss %xmm1, %xmm0, %xmm0
				; AVX512F-NEXT: vmovd %xmm2, %eax
				; AVX512F-NEXT: testl %eax, %eax
				; AVX512F-NEXT: sete %al
				; AVX512F-NEXT: kmovw %eax, %k1
				; AVX512F-NEXT: vmovaps %xmm0, %xmm1
				; AVX512F-NEXT: vmovss %xmm2, %xmm1, %xmm1 {%k1}
				; AVX512F-NEXT: vmovss %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512F-NEXT: vmaxss %xmm1, %xmm2, %xmm0
				; AVX512F-NEXT: retq
				;
				; AVX512DQ-LABEL: test_fmaximum_nnan:
				; AVX512DQ: # %bb.0:
				; AVX512DQ-NEXT: vaddss %xmm1, %xmm0, %xmm2
				; AVX512DQ-NEXT: vsubss %xmm1, %xmm0, %xmm0
				; AVX512DQ-NEXT: vfpclassss $3, %xmm0, %k0
				; AVX512DQ-NEXT: kmovw %k0, %k1
				; AVX512DQ-NEXT: vmovaps %xmm2, %xmm1
				; AVX512DQ-NEXT: vmovss %xmm0, %xmm1, %xmm1 {%k1}
				; AVX512DQ-NEXT: vmovss %xmm2, %xmm0, %xmm0 {%k1}
				; AVX512DQ-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; AVX512DQ-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_nnan:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
				; X86-NEXT: vaddss %xmm1, %xmm2, %xmm0
				; X86-NEXT: vsubss %xmm1, %xmm2, %xmm2
				; X86-NEXT: vmovd %xmm0, %eax
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: je .LBB4_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovaps %xmm2, %xmm1
				; X86-NEXT: jmp .LBB4_3
				; X86-NEXT: .LBB4_1:
				; X86-NEXT: vmovaps %xmm0, %xmm1
				; X86-NEXT: vmovaps %xmm2, %xmm0
				; X86-NEXT: .LBB4_3:
				; X86-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = fadd nnan float %x, %y
				%2 = fsub nnan float %x, %y
				%3 = tail call float @llvm.maximum.f32(float %1, float %2)
				ret float %3
				}

				define double @test_fmaximum_zero0(double %x, double %y) {
				; SSE2-LABEL: test_fmaximum_zero0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: cmpunordsd %xmm1, %xmm0
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm0, %xmm2
				; SSE2-NEXT: xorpd %xmm3, %xmm3
				; SSE2-NEXT: maxsd %xmm3, %xmm1
				; SSE2-NEXT: andnpd %xmm1, %xmm0
				; SSE2-NEXT: orpd %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_zero0:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vxorpd %xmm0, %xmm0, %xmm0
				; AVX1-NEXT: vmaxsd %xmm0, %xmm1, %xmm0
				; AVX1-NEXT: vcmpunordsd %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum_zero0:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vxorpd %xmm0, %xmm0, %xmm0
				; AVX512-NEXT: vmaxsd %xmm0, %xmm1, %xmm0
				; AVX512-NEXT: vcmpunordsd %xmm1, %xmm1, %k1
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_zero0:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: vxorpd %xmm1, %xmm1, %xmm1
				; X86-NEXT: vmaxsd %xmm1, %xmm0, %xmm1
				; X86-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm0
				; X86-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1, %xmm0
				; X86-NEXT: vmovlpd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				%1 = tail call double @llvm.maximum.f64(double 0.0, double %y)
				ret double %1
				}

				define double @test_fmaximum_zero1(double %x, double %y) {
				; SSE2-LABEL: test_fmaximum_zero1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm0, %xmm1
				; SSE2-NEXT: cmpunordsd %xmm0, %xmm1
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm1, %xmm2
				; SSE2-NEXT: xorpd %xmm3, %xmm3
				; SSE2-NEXT: maxsd %xmm3, %xmm0
				; SSE2-NEXT: andnpd %xmm0, %xmm1
				; SSE2-NEXT: orpd %xmm2, %xmm1
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_zero1:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1
				; AVX1-NEXT: vmaxsd %xmm1, %xmm0, %xmm1
				; AVX1-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm0
				; AVX1-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum_zero1:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vxorpd %xmm1, %xmm1, %xmm1
				; AVX512-NEXT: vmaxsd %xmm1, %xmm0, %xmm1
				; AVX512-NEXT: vcmpunordsd %xmm0, %xmm0, %k1
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1 {%k1}
				; AVX512-NEXT: vmovapd %xmm1, %xmm0
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_zero1:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: vxorpd %xmm1, %xmm1, %xmm1
				; X86-NEXT: vmaxsd %xmm1, %xmm0, %xmm1
				; X86-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm0
				; X86-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1, %xmm0
				; X86-NEXT: vmovlpd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				%1 = tail call double @llvm.maximum.f64(double %x, double 0.0)
				ret double %1
				}

				define double @test_fmaximum_zero2(double %x, double %y) {
				; SSE2-LABEL: test_fmaximum_zero2:
				; SSE2: # %bb.0:
				; SSE2-NEXT: xorps %xmm0, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fmaximum_zero2:
				; AVX: # %bb.0:
				; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_zero2:
				; X86: # %bb.0:
				; X86-NEXT: fldz
				; X86-NEXT: retl
				%1 = tail call double @llvm.maximum.f64(double 0.0, double -0.0)
				ret double %1
				}
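
The constant-operand tests (`test_fmaximum_nan0`, `test_fmaximum_nan1`, `test_fmaximum_zero2`) fold to a single constant load or register clear. The folded values agree with the semantics the LLVM Language Reference gives for `llvm.maximum`: a NaN operand propagates, and -0.0 is treated as less than +0.0. Below is a small reference model of those semantics, assuming nothing about how LLVM implements the fold; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Reference semantics for llvm.maximum on doubles (IEEE 754-2019 maximum).
double maximum_ref(double a, double b) {
  // Any NaN operand makes the result NaN.
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  // Both zero: -0.0 < +0.0, so return +0.0 if either operand is +0.0.
  if (a == 0.0 && b == 0.0)
    return std::signbit(a) ? b : a;
  return a > b ? a : b;
}

// Mirrors the three folded test cases.
void check_maximum_ref() {
  double nan = std::numeric_limits<double>::quiet_NaN();
  assert(std::isnan(maximum_ref(nan, 1.0)));   // test_fmaximum_nan0
  assert(std::isnan(maximum_ref(1.0, nan)));   // test_fmaximum_nan1
  double z = maximum_ref(0.0, -0.0);           // test_fmaximum_zero2
  assert(z == 0.0 && !std::signbit(z));
}
```

This differs from `llvm.maxnum`, which returns the non-NaN operand when exactly one input is NaN.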

				define float @test_fmaximum_nsz(float %x, float %y) "no-signed-zeros-fp-math"="true" {
				; SSE2-LABEL: test_fmaximum_nsz:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: maxss %xmm1, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: andnps %xmm2, %xmm1
				; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm2, %xmm0
				; SSE2-NEXT: orps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_nsz:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm2
				; AVX1-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm2, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fmaximum_nsz:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k1
				; AVX512-NEXT: vmaxss %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_nsz:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vcmpunordss %xmm0, %xmm1, %xmm2
				; X86-NEXT: vmaxss %xmm0, %xmm1, %xmm0
				; X86-NEXT: vblendvps %xmm2, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = tail call float @llvm.maximum.f32(float %x, float %y)
				ret float %1
				}

				define float @test_fmaximum_combine_cmps(float %x, float %y) {
				; SSE2-LABEL: test_fmaximum_combine_cmps:
				; SSE2: # %bb.0:
				; SSE2-NEXT: divss %xmm0, %xmm1
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: testl %eax, %eax
				; SSE2-NEXT: je .LBB9_1
				; SSE2-NEXT: # %bb.2:
				; SSE2-NEXT: movaps %xmm1, %xmm2
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: jmp .LBB9_3
				; SSE2-NEXT: .LBB9_1:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: .LBB9_3:
				; SSE2-NEXT: maxss %xmm2, %xmm1
				; SSE2-NEXT: cmpunordss %xmm0, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: andnps %xmm1, %xmm2
				; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm0, %xmm1
				; SSE2-NEXT: orps %xmm2, %xmm1
				; SSE2-NEXT: movaps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fmaximum_combine_cmps:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; AVX1-NEXT: vmovd %xmm0, %eax
				; AVX1-NEXT: testl %eax, %eax
				; AVX1-NEXT: je .LBB9_1
				; AVX1-NEXT: # %bb.2:
				; AVX1-NEXT: vmovaps %xmm1, %xmm2
				; AVX1-NEXT: vmovaps %xmm0, %xmm1
				; AVX1-NEXT: jmp .LBB9_3
				; AVX1-NEXT: .LBB9_1:
				; AVX1-NEXT: vmovaps %xmm0, %xmm2
				; AVX1-NEXT: .LBB9_3:
				; AVX1-NEXT: vmaxss %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512F-LABEL: test_fmaximum_combine_cmps:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; AVX512F-NEXT: vmovd %xmm0, %eax
				; AVX512F-NEXT: testl %eax, %eax
				; AVX512F-NEXT: sete %al
				; AVX512F-NEXT: kmovw %eax, %k1
				; AVX512F-NEXT: vmovaps %xmm1, %xmm2
				; AVX512F-NEXT: vmovss %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512F-NEXT: vcmpunordss %xmm0, %xmm0, %k2
				; AVX512F-NEXT: vmovss %xmm1, %xmm0, %xmm0 {%k1}
				; AVX512F-NEXT: vmaxss %xmm2, %xmm0, %xmm0
				; AVX512F-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k2}
				; AVX512F-NEXT: retq
				;
				; AVX512DQ-LABEL: test_fmaximum_combine_cmps:
				; AVX512DQ: # %bb.0:
				; AVX512DQ-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; AVX512DQ-NEXT: vfpclassss $3, %xmm0, %k0
				; AVX512DQ-NEXT: kmovw %k0, %k1
				; AVX512DQ-NEXT: vmovaps %xmm1, %xmm2
				; AVX512DQ-NEXT: vmovss %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512DQ-NEXT: vmovss %xmm1, %xmm0, %xmm0 {%k1}
				; AVX512DQ-NEXT: vmaxss %xmm2, %xmm0, %xmm0
				; AVX512DQ-NEXT: retq
				;
				; X86-LABEL: test_fmaximum_combine_cmps:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; X86-NEXT: vmovd %xmm0, %eax
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: je .LBB9_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovaps %xmm1, %xmm2
				; X86-NEXT: vmovaps %xmm0, %xmm1
				; X86-NEXT: jmp .LBB9_3
				; X86-NEXT: .LBB9_1:
				; X86-NEXT: vmovaps %xmm0, %xmm2
				; X86-NEXT: .LBB9_3:
				; X86-NEXT: vmaxss %xmm2, %xmm1, %xmm1
				; X86-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
				; X86-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = fdiv nnan float %y, %x
				%2 = tail call float @llvm.maximum.f32(float %x, float %1)
				ret float %2
				}
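Note on the AVX512DQ check lines for test_fmaximum_combine_cmps above: they contain no separate unordered compare. One plausible reading is that a single `vfpclassss $3` test on `%x` (quiet NaN or +0.0) drives the operand swap, and `vmaxss` then returns its second source operand whenever the greater-than compare fails, so a quiet-NaN `%x` propagates without an extra blend. The C++ sketch below models only that reading. The helper names `x86_maxss`, `is_qnan_or_pos_zero` and `maximum_combined_model` are hypothetical and are not taken from X86ISelLowering.cpp; the model assumes the second argument is never NaN (as `fdiv nnan` promises) and that `%x` is not a signaling NaN.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Models x86 MAXSS: yields b unless a > b, so a NaN operand or a pair of
// equal zeros always returns the second operand b.
static float x86_maxss(float a, float b) { return a > b ? a : b; }

// Models VFPCLASSSS with imm8 = 3 (bit 0: quiet NaN, bit 1: +0.0).
static bool is_qnan_or_pos_zero(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  const bool pos_zero = (bits == 0u);
  const bool qnan = std::isnan(v) && (bits & 0x00400000u) != 0u;
  return pos_zero || qnan;
}

// Hypothetical model of the swap-then-max sequence.
// Assumption: y is never NaN and x is never a signaling NaN.
float maximum_combined_model(float x, float y) {
  float first = x, second = y;
  // If x is a quiet NaN or +0.0, move it to the slot MAXSS falls back to.
  if (is_qnan_or_pos_zero(x)) {
    first = y;
    second = x;
  }
  return x86_maxss(first, second);
}
```

Under those assumptions the model returns +0.0 for both (+0.0, -0.0) and (-0.0, +0.0), and NaN whenever x is a quiet NaN.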

				;
				; fminimum
				;

				define float @test_fminimum(float %x, float %y) {
				; SSE2-LABEL: test_fminimum:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; SSE2-NEXT: movdqa %xmm0, %xmm3
				; SSE2-NEXT: movdqa %xmm1, %xmm2
				; SSE2-NEXT: je .LBB10_2
				; SSE2-NEXT: # %bb.1:
				; SSE2-NEXT: movdqa %xmm1, %xmm3
				; SSE2-NEXT: movdqa %xmm0, %xmm2
				; SSE2-NEXT: .LBB10_2:
				; SSE2-NEXT: minss %xmm3, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm3
				; SSE2-NEXT: andnps %xmm2, %xmm3
				; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm0, %xmm1
				; SSE2-NEXT: orps %xmm3, %xmm1
				; SSE2-NEXT: movaps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vmovd %xmm0, %eax
				; AVX1-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX1-NEXT: vmovdqa %xmm0, %xmm2
				; AVX1-NEXT: vmovdqa %xmm1, %xmm3
				; AVX1-NEXT: je .LBB10_2
				; AVX1-NEXT: # %bb.1:
				; AVX1-NEXT: vmovdqa %xmm1, %xmm2
				; AVX1-NEXT: vmovdqa %xmm0, %xmm3
				; AVX1-NEXT: .LBB10_2:
				; AVX1-NEXT: vminss %xmm2, %xmm3, %xmm2
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vmovd %xmm0, %eax
				; AVX512-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX512-NEXT: sete %al
				; AVX512-NEXT: kmovw %eax, %k1
				; AVX512-NEXT: vmovdqa %xmm0, %xmm2
				; AVX512-NEXT: vmovss %xmm1, %xmm2, %xmm2 {%k1}
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k2
				; AVX512-NEXT: vmovss %xmm0, %xmm1, %xmm1 {%k1}
				; AVX512-NEXT: vminss %xmm1, %xmm2, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k2}
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fminimum:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vmovd %xmm1, %eax
				; X86-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; X86-NEXT: vmovdqa %xmm1, %xmm2
				; X86-NEXT: vmovdqa %xmm0, %xmm3
				; X86-NEXT: je .LBB10_2
				; X86-NEXT: # %bb.1:
				; X86-NEXT: vmovdqa %xmm0, %xmm2
				; X86-NEXT: vmovdqa %xmm1, %xmm3
				; X86-NEXT: .LBB10_2:
				; X86-NEXT: vminss %xmm2, %xmm3, %xmm2
				; X86-NEXT: vcmpunordss %xmm0, %xmm1, %xmm0
				; X86-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm2, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = tail call float @llvm.minimum.f32(float %x, float %y)
				ret float %1
				}
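Note on the check lines for test_fminimum above: the sequence appears to do three things. It compares the bit pattern of `%x` with 0x80000000 (that is, -0.0), orders the operands so that a -0.0 `%x` lands in the slot `minss` falls back to, and finally blends in a constant when the inputs compare unordered. The C++ sketch below models that reading of the generated code. The names `x86_minss`, `is_neg_zero` and `minimum_model` are hypothetical, and treating the constant-pool value as a quiet NaN is an assumption, since the check lines only show a load from a constant-pool label.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Models x86 MINSS: yields b unless a < b, so a NaN operand or a pair of
// equal zeros always returns the second operand b.
static float x86_minss(float a, float b) { return a < b ? a : b; }

// True only for the exact bit pattern of -0.0f (0x80000000).
static bool is_neg_zero(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits == 0x80000000u;
}

// Hypothetical model of the swap, min, and NaN-blend sequence.
float minimum_model(float x, float y) {
  float first = x, second = y;
  // A -0.0 x goes into the fallback slot so that min(-0.0, +0.0) is -0.0.
  if (is_neg_zero(x)) {
    first = y;
    second = x;
  }
  float result = x86_minss(first, second);
  // Unordered inputs: replace the result with a NaN (assumed quiet).
  if (std::isnan(x) || std::isnan(y))
    result = std::numeric_limits<float>::quiet_NaN();
  return result;
}
```

With either zero ordering the model returns -0.0, and any NaN input yields NaN, which matches the documented semantics of `llvm.minimum`.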

				define <2 x double> @test_fminimum_scalarize(<2 x double> %x, <2 x double> %y) "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" {
				; SSE2-LABEL: test_fminimum_scalarize:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm0, %xmm2
				; SSE2-NEXT: minsd %xmm1, %xmm2
				; SSE2-NEXT: unpckhpd {{.*#+}} xmm1 = xmm1[1,1]
				; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
				; SSE2-NEXT: minsd %xmm1, %xmm0
				; SSE2-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm0[0]
				; SSE2-NEXT: movapd %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fminimum_scalarize:
				; AVX: # %bb.0:
				; AVX-NEXT: vminsd %xmm1, %xmm0, %xmm2
				; AVX-NEXT: vpermilpd {{.*#+}} xmm1 = xmm1[1,0]
				; AVX-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
				; AVX-NEXT: vminsd %xmm1, %xmm0, %xmm0
				; AVX-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fminimum_scalarize:
				; X86: # %bb.0:
				; X86-NEXT: vminsd %xmm1, %xmm0, %xmm2
				; X86-NEXT: vpermilpd {{.*#+}} xmm1 = xmm1[1,0]
				; X86-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
				; X86-NEXT: vminsd %xmm1, %xmm0, %xmm0
				; X86-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
				; X86-NEXT: retl
				%r = call <2 x double> @llvm.minimum.v2f64(<2 x double> %x, <2 x double> %y)
				ret <2 x double> %r
				}

				define float @test_fminimum_nan0(float %x, float %y) {
				; SSE2-LABEL: test_fminimum_nan0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fminimum_nan0:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fminimum_nan0:
				; X86: # %bb.0:
				; X86-NEXT: flds {{\.?LCPI[0-9]+_[0-9]+}}
				; X86-NEXT: retl
				%1 = tail call float @llvm.minimum.f32(float 0x7fff000000000000, float %y)
				ret float %1
				}

				define float @test_fminimum_nan1(float %x, float %y) {
				; SSE2-LABEL: test_fminimum_nan1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fminimum_nan1:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fminimum_nan1:
				; X86: # %bb.0:
				; X86-NEXT: flds {{\.?LCPI[0-9]+_[0-9]+}}
				; X86-NEXT: retl
				%1 = tail call float @llvm.minimum.f32(float %x, float 0x7fff000000000000)
				ret float %1
				}

				define double @test_fminimum_nnan(double %x, double %y) "no-nans-fp-math"="true" {
				RKSimon: add nounwind attribute to get rid of the .cfi noise
				e-kud: I've allowed myself to add `nounwind` in `half.ll` since I've touched it. I think the attribute was missed.
				; SSE2-LABEL: test_fminimum_nnan:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movq %xmm0, %rax
				; SSE2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; SSE2-NEXT: cmpq %rcx, %rax
				; SSE2-NEXT: je .LBB14_1
				; SSE2-NEXT: # %bb.2:
				; SSE2-NEXT: minsd %xmm1, %xmm0
				; SSE2-NEXT: retq
				; SSE2-NEXT: .LBB14_1:
				; SSE2-NEXT: movdqa %xmm0, %xmm2
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: minsd %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_nnan:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vmovq %xmm0, %rax
				; AVX1-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; AVX1-NEXT: cmpq %rcx, %rax
				; AVX1-NEXT: je .LBB14_1
				; AVX1-NEXT: # %bb.2:
				; AVX1-NEXT: vminsd %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: retq
				; AVX1-NEXT: .LBB14_1:
				; AVX1-NEXT: vmovdqa %xmm0, %xmm2
				; AVX1-NEXT: vminsd %xmm2, %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512F-LABEL: test_fminimum_nnan:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vmovq %xmm0, %rax
				; AVX512F-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
				; AVX512F-NEXT: cmpq %rcx, %rax
				; AVX512F-NEXT: sete %al
				; AVX512F-NEXT: kmovw %eax, %k1
				; AVX512F-NEXT: vmovapd %xmm1, %xmm2
				; AVX512F-NEXT: vmovsd %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512F-NEXT: vmovsd %xmm1, %xmm0, %xmm0 {%k1}
				; AVX512F-NEXT: vminsd %xmm2, %xmm0, %xmm0
				; AVX512F-NEXT: retq
				;
				; AVX512DQ-LABEL: test_fminimum_nnan:
				; AVX512DQ: # %bb.0:
				; AVX512DQ-NEXT: vfpclasssd $5, %xmm1, %k0
				; AVX512DQ-NEXT: kmovw %k0, %k1
				; AVX512DQ-NEXT: vmovapd %xmm0, %xmm2
				; AVX512DQ-NEXT: vmovsd %xmm1, %xmm2, %xmm2 {%k1}
				; AVX512DQ-NEXT: vmovsd %xmm0, %xmm1, %xmm1 {%k1}
				; AVX512DQ-NEXT: vminsd %xmm2, %xmm1, %xmm0
				; AVX512DQ-NEXT: retq
				;
				; X86-LABEL: test_fminimum_nnan:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
				; X86-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
				; X86-NEXT: vmovd %xmm2, %eax
				; X86-NEXT: vpextrd $1, %xmm2, %ecx
				; X86-NEXT: addl $-2147483648, %ecx # imm = 0x80000000
				; X86-NEXT: orl %eax, %ecx
				; X86-NEXT: je .LBB14_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovapd %xmm1, %xmm2
				; X86-NEXT: jmp .LBB14_3
				; X86-NEXT: .LBB14_1:
				; X86-NEXT: vmovapd %xmm0, %xmm2
				; X86-NEXT: vmovapd %xmm1, %xmm0
				; X86-NEXT: .LBB14_3:
				; X86-NEXT: vminsd %xmm2, %xmm0, %xmm0
				; X86-NEXT: vmovsd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				%1 = tail call double @llvm.minimum.f64(double %x, double %y)
				ret double %1
				}

				define double @test_fminimum_zero0(double %x, double %y) {
				; SSE2-LABEL: test_fminimum_zero0:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: cmpunordsd %xmm1, %xmm0
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm0, %xmm2
				; SSE2-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; SSE2-NEXT: andnpd %xmm1, %xmm0
				; SSE2-NEXT: orpd %xmm2, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_zero0:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordsd %xmm1, %xmm1, %xmm0
				; AVX1-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
				; AVX1-NEXT: vblendvpd %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum_zero0:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordsd %xmm1, %xmm1, %k1
				; AVX512-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fminimum_zero0:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm1
				; X86-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: vmovlpd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				%1 = tail call double @llvm.minimum.f64(double -0.0, double %y)
				ret double %1
				}

				define double @test_fminimum_zero1(double %x, double %y) {
				; SSE2-LABEL: test_fminimum_zero1:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movapd %xmm0, %xmm1
				; SSE2-NEXT: cmpunordsd %xmm0, %xmm1
				; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
				; SSE2-NEXT: andpd %xmm1, %xmm2
				; SSE2-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
				; SSE2-NEXT: andnpd %xmm0, %xmm1
				; SSE2-NEXT: orpd %xmm2, %xmm1
				; SSE2-NEXT: movapd %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_zero1:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm1
				; AVX1-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum_zero1:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordsd %xmm0, %xmm0, %k1
				; AVX512-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX512-NEXT: vmovsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fminimum_zero1:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: vcmpunordsd %xmm0, %xmm0, %xmm1
				; X86-NEXT: vminsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: vmovlpd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				%1 = tail call double @llvm.minimum.f64(double %x, double -0.0)
				ret double %1
				}

				define double @test_fminimum_zero2(double %x, double %y) {
				; SSE2-LABEL: test_fminimum_zero2:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
				; SSE2-NEXT: retq
				;
				; AVX-LABEL: test_fminimum_zero2:
				; AVX: # %bb.0:
				; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; AVX-NEXT: retq
				;
				; X86-LABEL: test_fminimum_zero2:
				; X86: # %bb.0:
				; X86-NEXT: fldz
				; X86-NEXT: fchs
				; X86-NEXT: retl
				%1 = tail call double @llvm.minimum.f64(double -0.0, double 0.0)
				ret double %1
				}

				define float @test_fminimum_nsz(float %x, float %y) {
				; SSE2-LABEL: test_fminimum_nsz:
				; SSE2: # %bb.0:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: minss %xmm1, %xmm2
				; SSE2-NEXT: cmpunordss %xmm1, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: andnps %xmm2, %xmm1
				; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm2, %xmm0
				; SSE2-NEXT: orps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_nsz:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vcmpunordss %xmm1, %xmm0, %xmm2
				; AVX1-NEXT: vminss %xmm1, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm2, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512-LABEL: test_fminimum_nsz:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vcmpunordss %xmm1, %xmm0, %k1
				; AVX512-NEXT: vminss %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k1}
				; AVX512-NEXT: retq
				;
				; X86-LABEL: test_fminimum_nsz:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vcmpunordss %xmm0, %xmm1, %xmm2
				; X86-NEXT: vminss %xmm0, %xmm1, %xmm0
				; X86-NEXT: vblendvps %xmm2, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = tail call nsz float @llvm.minimum.f32(float %x, float %y)
				ret float %1
				}

				define float @test_fminimum_combine_cmps(float %x, float %y) {
				; SSE2-LABEL: test_fminimum_combine_cmps:
				RKSimon: please can you add vector test coverage to ensure we scalarize?
				; SSE2: # %bb.0:
				; SSE2-NEXT: divss %xmm0, %xmm1
				; SSE2-NEXT: movd %xmm0, %eax
				; SSE2-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; SSE2-NEXT: je .LBB19_1
				; SSE2-NEXT: # %bb.2:
				; SSE2-NEXT: movaps %xmm1, %xmm2
				; SSE2-NEXT: movaps %xmm0, %xmm1
				; SSE2-NEXT: jmp .LBB19_3
				; SSE2-NEXT: .LBB19_1:
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: .LBB19_3:
				; SSE2-NEXT: minss %xmm2, %xmm1
				; SSE2-NEXT: cmpunordss %xmm0, %xmm0
				; SSE2-NEXT: movaps %xmm0, %xmm2
				; SSE2-NEXT: andnps %xmm1, %xmm2
				; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; SSE2-NEXT: andps %xmm0, %xmm1
				; SSE2-NEXT: orps %xmm2, %xmm1
				; SSE2-NEXT: movaps %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX1-LABEL: test_fminimum_combine_cmps:
				; AVX1: # %bb.0:
				; AVX1-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; AVX1-NEXT: vmovd %xmm0, %eax
				; AVX1-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX1-NEXT: je .LBB19_1
				; AVX1-NEXT: # %bb.2:
				; AVX1-NEXT: vmovaps %xmm1, %xmm2
				; AVX1-NEXT: vmovaps %xmm0, %xmm1
				; AVX1-NEXT: jmp .LBB19_3
				; AVX1-NEXT: .LBB19_1:
				; AVX1-NEXT: vmovaps %xmm0, %xmm2
				; AVX1-NEXT: .LBB19_3:
				; AVX1-NEXT: vminss %xmm2, %xmm1, %xmm1
				; AVX1-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
				; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
				; AVX1-NEXT: retq
				;
				; AVX512F-LABEL: test_fminimum_combine_cmps:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; AVX512F-NEXT: vmovd %xmm0, %eax
				; AVX512F-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; AVX512F-NEXT: sete %al
				; AVX512F-NEXT: kmovw %eax, %k1
				; AVX512F-NEXT: vmovaps %xmm1, %xmm2
				; AVX512F-NEXT: vmovss %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512F-NEXT: vcmpunordss %xmm0, %xmm0, %k2
				; AVX512F-NEXT: vmovss %xmm1, %xmm0, %xmm0 {%k1}
				; AVX512F-NEXT: vminss %xmm2, %xmm0, %xmm0
				; AVX512F-NEXT: vmovss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 {%k2}
				; AVX512F-NEXT: retq
				;
				; AVX512DQ-LABEL: test_fminimum_combine_cmps:
				; AVX512DQ: # %bb.0:
				; AVX512DQ-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; AVX512DQ-NEXT: vfpclassss $5, %xmm0, %k0
				; AVX512DQ-NEXT: kmovw %k0, %k1
				; AVX512DQ-NEXT: vmovaps %xmm1, %xmm2
				; AVX512DQ-NEXT: vmovss %xmm0, %xmm2, %xmm2 {%k1}
				; AVX512DQ-NEXT: vmovss %xmm1, %xmm0, %xmm0 {%k1}
				; AVX512DQ-NEXT: vminss %xmm2, %xmm0, %xmm0
				; AVX512DQ-NEXT: retq
				;
				; X86-LABEL: test_fminimum_combine_cmps:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vdivss %xmm0, %xmm1, %xmm1
				; X86-NEXT: vmovd %xmm0, %eax
				; X86-NEXT: cmpl $-2147483648, %eax # imm = 0x80000000
				; X86-NEXT: je .LBB19_1
				; X86-NEXT: # %bb.2:
				; X86-NEXT: vmovaps %xmm1, %xmm2
				; X86-NEXT: vmovaps %xmm0, %xmm1
				; X86-NEXT: jmp .LBB19_3
				; X86-NEXT: .LBB19_1:
				; X86-NEXT: vmovaps %xmm0, %xmm2
				; X86-NEXT: .LBB19_3:
				; X86-NEXT: vminss %xmm2, %xmm1, %xmm1
				; X86-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
				; X86-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				%1 = fdiv nnan float %y, %x
				%2 = tail call float @llvm.minimum.f32(float %x, float %1)
				ret float %2
				}

llvm/test/CodeGen/X86/half.ll

[...]
 ; CHECK-I686-NEXT: pinsrw $0, %eax, %xmm0
 ; CHECK-I686-NEXT: addl $12, %esp
 ; CHECK-I686-NEXT: retl
 %2 = fcmp une half %0, 0xH0000
 %3 = uitofp i1 %2 to half
 ret half %3
 }

 define dso_local void @brcond(half %0) {
RKSimon: pre-commit the nounwind change to keep it separate from this patch - you shouldn't really need dso_local either
e-kud: I haven't got commit access yet, here it is https://reviews.llvm.org/D149114
 ; CHECK-LIBCALL-LABEL: brcond:
 ; CHECK-LIBCALL: # %bb.0: # %entry
 ; CHECK-LIBCALL-NEXT: pushq %rax
 ; CHECK-LIBCALL-NEXT: .cfi_def_cfa_offset 16
 ; CHECK-LIBCALL-NEXT: callq __extendhfsf2@PLT
 ; CHECK-LIBCALL-NEXT: xorps %xmm1, %xmm1
 ; CHECK-LIBCALL-NEXT: ucomiss %xmm1, %xmm0
 ; CHECK-LIBCALL-NEXT: setp %al
[...]
 ; CHECK-I686-NEXT: retl
 ret <8 x half> %2
 }

 declare half @llvm.minnum.f16(half, half)

 define half @pr61271(half %0, half %1) #0 {
 ; CHECK-LIBCALL-LABEL: pr61271:
 ; CHECK-LIBCALL: # %bb.0:
-; CHECK-LIBCALL-NEXT: subq $40, %rsp
+; CHECK-LIBCALL-NEXT: pushq %rax
 ; CHECK-LIBCALL-NEXT: movss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
 ; CHECK-LIBCALL-NEXT: movaps %xmm1, %xmm0
 ; CHECK-LIBCALL-NEXT: callq __extendhfsf2@PLT
-; CHECK-LIBCALL-NEXT: movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-LIBCALL-NEXT: movss %xmm0, (%rsp) # 4-byte Spill
 ; CHECK-LIBCALL-NEXT: movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; CHECK-LIBCALL-NEXT: # xmm0 = mem[0],zero,zero,zero
 ; CHECK-LIBCALL-NEXT: callq __extendhfsf2@PLT
-; CHECK-LIBCALL-NEXT: movaps %xmm0, %xmm1
-; CHECK-LIBCALL-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm2 # 16-byte Reload
-; CHECK-LIBCALL-NEXT: cmpltss %xmm2, %xmm1
-; CHECK-LIBCALL-NEXT: andps %xmm1, %xmm0
-; CHECK-LIBCALL-NEXT: andnps %xmm2, %xmm1
-; CHECK-LIBCALL-NEXT: orps %xmm1, %xmm0
+; CHECK-LIBCALL-NEXT: minss (%rsp), %xmm0 # 4-byte Folded Reload
 ; CHECK-LIBCALL-NEXT: callq __truncsfhf2@PLT
-; CHECK-LIBCALL-NEXT: addq $40, %rsp
+; CHECK-LIBCALL-NEXT: popq %rax
 ; CHECK-LIBCALL-NEXT: retq
 ;
 ; BWON-F16C-LABEL: pr61271:
 ; BWON-F16C: # %bb.0:
 ; BWON-F16C-NEXT: vpextrw $0, %xmm0, %eax
 ; BWON-F16C-NEXT: vpextrw $0, %xmm1, %ecx
 ; BWON-F16C-NEXT: movzwl %cx, %ecx
 ; BWON-F16C-NEXT: vmovd %ecx, %xmm0
 ; BWON-F16C-NEXT: vcvtph2ps %xmm0, %xmm0
 ; BWON-F16C-NEXT: movzwl %ax, %eax
 ; BWON-F16C-NEXT: vmovd %eax, %xmm1
 ; BWON-F16C-NEXT: vcvtph2ps %xmm1, %xmm1
-; BWON-F16C-NEXT: vcmpltss %xmm0, %xmm1, %xmm2
-; BWON-F16C-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0
+; BWON-F16C-NEXT: vminss %xmm0, %xmm1, %xmm0
 ; BWON-F16C-NEXT: vcvtps2ph $4, %xmm0, %xmm0
 ; BWON-F16C-NEXT: vmovd %xmm0, %eax
 ; BWON-F16C-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
 ; BWON-F16C-NEXT: retq
 ;
 ; CHECK-I686-LABEL: pr61271:
 ; CHECK-I686: # %bb.0:
 ; CHECK-I686-NEXT: subl $44, %esp
 ; CHECK-I686-NEXT: pinsrw $0, {{[0-9]+}}(%esp), %xmm0
 ; CHECK-I686-NEXT: movdqa %xmm0, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; CHECK-I686-NEXT: pinsrw $0, {{[0-9]+}}(%esp), %xmm0
 ; CHECK-I686-NEXT: pextrw $0, %xmm0, %eax
 ; CHECK-I686-NEXT: movw %ax, (%esp)
 ; CHECK-I686-NEXT: calll __extendhfsf2
 ; CHECK-I686-NEXT: movdqa {{[-0-9]+}}(%e{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-I686-NEXT: pextrw $0, %xmm0, %eax
 ; CHECK-I686-NEXT: movw %ax, (%esp)
 ; CHECK-I686-NEXT: fstps {{[0-9]+}}(%esp)
 ; CHECK-I686-NEXT: calll __extendhfsf2
 ; CHECK-I686-NEXT: fstps {{[0-9]+}}(%esp)
 ; CHECK-I686-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; CHECK-I686-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-I686-NEXT: movaps %xmm1, %xmm2
-; CHECK-I686-NEXT: cmpltss %xmm0, %xmm2
-; CHECK-I686-NEXT: andps %xmm2, %xmm1
-; CHECK-I686-NEXT: andnps %xmm0, %xmm2
-; CHECK-I686-NEXT: orps %xmm1, %xmm2
-; CHECK-I686-NEXT: movss %xmm2, (%esp)
+; CHECK-I686-NEXT: minss {{[0-9]+}}(%esp), %xmm0
+; CHECK-I686-NEXT: movss %xmm0, (%esp)
 ; CHECK-I686-NEXT: calll __truncsfhf2
 ; CHECK-I686-NEXT: addl $44, %esp
 ; CHECK-I686-NEXT: retl
 %3 = call fast half @llvm.minnum.f16(half %0, half %1)
 ret half %3
 }

 declare <8 x half> @llvm.maxnum.v8f16(<8 x half>, <8 x half>)
[...]

Revision Contents

Diff 513836
