This is an archive of the discontinued LLVM Phabricator instance.

Differential D137655

Expand fminimum/fmaximum into fminnum/fmaxnum + NaN check
Needs Review · Public

Authored by gflegar on Nov 8 2022, 9:51 AM.

Details

Reviewers

csigg
efriedma

Summary

We do not have an instruction for this in PTX prior to SM 8.0, so we are
expanding it. However, LLVM defines no generic expansion for this op, so
this patch defines a custom expansion for the NVPTX backend instead (the
same approach does not really work at the LLVM level because of the
fminnum/fmaxnum semantics for -0.0 / +0.0).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gflegar created this revision. Nov 8 2022, 9:51 AM

Herald added a project: Restricted Project. Nov 8 2022, 9:51 AM

Herald added subscribers: mattd, gchakrabarti, asavonic, hiraditya.

gflegar requested review of this revision. Nov 8 2022, 9:51 AM

Herald added a project: Restricted Project. Nov 8 2022, 9:51 AM

Herald added subscribers: llvm-commits, jholewinski.

gflegar added a reviewer: csigg. Nov 8 2022, 9:53 AM

Doesn't this handle signed zero incorrectly?

Harbormaster completed remote builds in B196738: Diff 474036. Nov 8 2022, 10:36 AM

In D137655#3915463, @nikic wrote:

Doesn't this handle signed zero incorrectly?

I believe it handles signed zero correctly:

For FMINIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0
Tmp1 = +0.0, Tmp2 = -0.0  =>  Tmp1 ULT Tmp2 == False  =>  Tmp3 = Tmp2 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0

For FMAXIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 UGT Tmp2 == False  =>  Tmp3 = Tmp2 = +0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = +0.0
Tmp1 = +0.0, Tmp2 = -0.0  =>  Tmp1 UGT Tmp2 == True   =>  Tmp3 = Tmp1 = +0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = +0.0

In D137655#3916906, @gflegar wrote:
For FMINIMUM:
Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0

Isn't -0.0 ULT 0.0 false, because negative zero and positive zero are equal?
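
To make the question concrete, here is a minimal standalone C++ model (not LLVM code) of the compare-and-select expansion from the truth table. `ult` and `select_min` are hypothetical helpers: `ult` mirrors the `ULT` condition (unordered, or less than), and NaN handling is deliberately left out. Since IEEE 754 comparisons treat -0.0 and +0.0 as equal, `ult(-0.0, +0.0)` is false and the select falls through to the second operand.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the ULT predicate: true if either operand is NaN
// (unordered) or if a < b. Note that -0.0 < +0.0 is false in IEEE 754.
bool ult(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a < b;
}

// Compare-and-select model: Tmp3 = (Tmp1 ULT Tmp2) ? Tmp1 : Tmp2.
// NaN handling is omitted; only the signed-zero case is of interest here.
double select_min(double tmp1, double tmp2) {
  return ult(tmp1, tmp2) ? tmp1 : tmp2;
}
```

With this model, `select_min(-0.0, +0.0)` yields +0.0 while `select_min(+0.0, -0.0)` yields -0.0, so the result depends on operand order instead of always being -0.0 as fminimum requires.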

In D137655#3916933, @nikic wrote:
Isn't -0.0 ULT 0.0 false, because negative zero and positive zero are equal?

Right, I checked the standard now. For comparisons they're considered equal, but for maximum / minimum there's a special exception under which -0.0 is treated as less than +0.0. Why would anyone define this so inconsistently ... -.-
The problem for us here is performance. Implementing it correctly via comparisons would take more instructions (2 to compare + select, at least 2 for zero handling, at least 2 for NaNs). I'm mostly concerned with the PTX backend, and there the correct and efficient way to expand this would be to use minnum (a builtin instruction) followed by a NaN check. If either operand is NaN, we return our own NaN constant (if I'm reading the standard correctly, it only requires us to return a quiet NaN, which doesn't have to be the same NaN as either operand).
I'll try doing that instead.
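
A scalar sketch of the minnum + NaN check idea, assuming `std::fmin` as a stand-in for FMINNUM (C's `fmin`, like FMINNUM, returns the other operand when exactly one operand is a quiet NaN). The function name is hypothetical. Note that `std::fmin` does not promise any ordering between -0.0 and +0.0, so, like the expansion itself, this only matches fminimum up to the signed-zero case.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of the proposed expansion (not the in-tree code):
//   Tmp3   = fminnum(a, b)               // drops a single NaN operand
//   result = isunordered(a, b) ? qNaN : Tmp3
// std::fmin stands in for FMINNUM here.
double minimum_via_minnum(double a, double b) {
  double tmp3 = std::fmin(a, b);
  // SETUO: true when at least one operand is NaN; return our own quiet NaN.
  if (std::isunordered(a, b))
    return std::numeric_limits<double>::quiet_NaN();
  return tmp3;
}
```

The fmaximum case is symmetric, with `std::fmax` in place of `std::fmin`.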

Change lowering to minnum/maxnum + NaN check

Also update the failing ARM test so that it passes.
I'm not an expert on ARM, but looking at ARMISelLowering.cpp, it
specifies exactly under which conditions the minimum / maximum instructions
are available. Thus, ARM likely had the same silent failure
that the PTX side had, and the test was likely wrong (since the checks
were auto-generated).

@nikic signed-zero handling should be fixed now.

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

nikic added a reviewer: efriedma. Nov 9 2022, 7:35 AM

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. That said, I would still argue for landing this, as it is correct modulo -0/+0, which is far preferable to the current state, where we have a silent failure (and produce invalid code) for backends that attempt to expand the op (such as NVPTX).

Harbormaster completed remote builds in B196902: Diff 474267. Nov 9 2022, 8:57 AM

In D137655#3917468, @gflegar wrote:

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. Though I would still argue for submitting this, as it is correct modulo -0/+0, which is far preferable to the current state where we have a silent failure (and produce invalid code) for the backends that attempt to expand the op (like in NVPTX).

That sounds like an unrelated bug. If an operation is Expand, but we don't support expanding it, shouldn't that result in an isel failure?

In D137655#3917621, @nikic wrote:

That sounds like an unrelated bug. If an operation is Expand, but we don't support expanding it, shouldn't that result in an isel failure?

Yes, there is an orthogonal bug where this is a silent failure rather than a hard error. However, even once that bug is fixed, we would still fail (just earlier); this change fixes that.

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the PTX min/max instructions, which implement the 2018 semantics. (I do agree, though, that the intermediate code is not correct.)

Limit expansion only to NVPTX backend

There it ends up handling all the cases correctly, due to the expanded
semantics of the min/max PTX instructions for +/-0.0.

I think this is the best we can do short of adding a new op. We only do the expansion for the NVPTX backend and don't support it elsewhere. In the NVPTX backend, the intermediate code is still semantically incorrect for +/-0.0, but since FMINNUM/FMAXNUM lower to PTX min/max, which implement the 2018 semantics of those ops, the final PTX is correct.

Once we have an op in LLVM that represents the 2018 semantics, we can lower it to that instead, to make the intermediate code semantically correct as well.
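
The argument above can be modeled in a few lines of standalone C++. `ptx_min_model` is a hypothetical stand-in for the PTX `min` instruction, written from this thread's description (a single NaN operand is ignored and -0.0 orders below +0.0), not from the PTX ISA manual. Composing it with the patch's NaN check gives fminimum behavior on signed zeros as well, which is why the final PTX comes out correct even though the generic FMINNUM node does not promise that ordering.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of PTX min as described in this thread: a single NaN
// operand is ignored, and -0.0 is treated as less than +0.0.
double ptx_min_model(double a, double b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  if (a == b)  // also true for -0.0 == +0.0; prefer the negative zero
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

// FMINNUM lowered to the model above, followed by the NaN check from the patch.
double minimum_expanded(double a, double b) {
  double tmp3 = ptx_min_model(a, b);
  return std::isunordered(a, b) ? std::numeric_limits<double>::quiet_NaN()
                                : tmp3;
}
```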

gflegar retitled this revision from "Expand fminimum and fmaximum into a pair of selects" to "Expand fminimum/fmaximum into fminnum/fmaxnum + NaN check". Nov 10 2022, 6:49 AM

gflegar edited the summary of this revision.

gflegar added a child revision: D137786: Lower arith.min/max to llvm.intr.minimum/maximum. Nov 10 2022, 7:03 AM

Harbormaster completed remote builds in B197074: Diff 474535. Nov 10 2022, 7:47 AM

In D137655#3918964, @gflegar wrote:

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which defines the 2018 semantics. (I do agree though that the intermediate code is not correct.)

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
Lines 2227–2228 (on Diff #474535): Should go through APFloat.
Lines 2227–2228 (on Diff #474535): Can also move this to generic code and check which of the variants are legal.
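
For context on the APFloat comment: the expansion builds its NaN from a C++ `double` regardless of `VT`, whereas a quiet NaN's encoding is format-specific (all exponent bits set plus the top mantissa bit). This standalone sketch (a hypothetical helper, not LLVM's `APFloat`) computes the canonical positive quiet NaN for a binary format from its field widths. In-tree, `APFloat::getQNaN` with the semantics matching `VT` would be the natural way to get such a constant.

```cpp
#include <cassert>
#include <cstdint>

// Canonical positive quiet NaN for a binary IEEE 754 format:
// sign = 0, exponent = all ones, mantissa = only its most significant bit set.
// exp_bits / mant_bits are field widths (mant_bits excludes the implicit bit).
std::uint64_t canonical_qnan_bits(unsigned exp_bits, unsigned mant_bits) {
  std::uint64_t exponent = ((std::uint64_t{1} << exp_bits) - 1) << mant_bits;
  std::uint64_t quiet_bit = std::uint64_t{1} << (mant_bits - 1);
  return exponent | quiet_bit;
}
```

For `float` (8 exponent bits, 23 mantissa bits) this gives 0x7fc00000, which matches the `0x7fc00000 @ float NaN` constant in the updated ARM checks.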

Formatting fixes

In D137655#3919674, @arsenm wrote:

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM

It's even less OK to fail altogether (which is what is happening without this patch). And we're not talking about the new semantics of FMINIMUM/FMAXIMUM, but about the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

Harbormaster completed remote builds in B197094: Diff 474565. Nov 10 2022, 10:01 AM

We do not have an instruction for this in PTX prior to SM 8.0,

I assume that we're talking about the min.NaN.*/max.NaN.* instruction variants that appeared in PTX 7.0 on sm80+.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-max
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-max

Looks like we do not properly constrain instruction availability in llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td#L940
There are no predicates on sm80+ or ptx70 and we sort of rely on custom lowering, and not always correctly as things stand:
https://godbolt.org/z/G8vYb5ajT

This patch should help generate correct instructions for fp64.

Still, I think we need to fix instruction definitions to correctly reflect their availability.

On a side note, we may want to add some min/max correctness tests to the CUDA tests in the LLVM test-suite. Considering that we have different lowering on different GPUs, we do want to make sure that we consistently get the results we expect across different GPUs and CUDA versions. We currently do not have any sm80 GPUs on the CUDA buildbots, but we'll get them eventually.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
Line 615 (on Diff #474565): I think this should be refactored into a more generic `GetGPUAction(sm, ptx, ifAvailableAction, FallbackAction)`. This would make it clear which actions we take and why. `GetMinMax` action just says 'magic'.
llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll
Line 65: What exactly do we end up generating here? If both `setp` and `min` are inputs for `selp`, then `-DAG` should be removed from `selp`.
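
A rough sketch of how the suggested `GetGPUAction` helper might look, under loudly stated assumptions: `Action` and `Subtarget` are simplified, hypothetical stand-ins for `TargetLoweringBase::LegalizeAction` and the NVPTX subtarget, and the signature adds the subtarget being queried to the `(sm, ptx, ifAvailableAction, FallbackAction)` parameters from the comment. The sm_80 / PTX 7.0 thresholds come from the min.NaN/max.NaN discussion above.

```cpp
#include <cassert>

// Hypothetical stand-ins for LLVM's LegalizeAction and the NVPTX subtarget.
enum class Action { Legal, Expand, Custom };

struct Subtarget {
  unsigned SmVersion;   // e.g. 80 for sm_80
  unsigned PtxVersion;  // e.g. 70 for PTX 7.0
};

// Returns IfAvailable when the subtarget meets both the minimum SM and PTX
// versions an instruction requires, and Fallback otherwise.
Action GetGPUAction(const Subtarget &ST, unsigned MinSm, unsigned MinPtx,
                    Action IfAvailable, Action Fallback) {
  return (ST.SmVersion >= MinSm && ST.PtxVersion >= MinPtx) ? IfAvailable
                                                            : Fallback;
}
```

For example, `GetGPUAction(ST, 80, 70, Action::Legal, Action::Expand)` would mark the NaN-propagating min/max as legal only on sm_80+ with PTX 7.0+ and request expansion otherwise, which makes the reason for each action visible at the call site.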

In D137655#3919749, @gflegar wrote:

It's even less OK to fail altogether (which is what is happening without this patch).

Hard disagree

And we're not talking about the new semantics of FMINIMUM/FMAXIMUM, but about the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

The 2019 final spec does not have minnum or maxnum; they were removed and replaced with minimum and maximum, which have specified signed zero behavior. Unless there was a draft revision I missed, there was never a defined minnum with specified -0 behavior. It would be helpful to define minnum/maxnum variants with -0 specified as ordered less than +0.

In D137655#3919929, @arsenm wrote:

The 2019 final spec does not have minnum or maxnum; they were removed and replaced with minimum and maximum which have specified signed zero behavior. Unless there was a draft revision I missed, there was never a defined minnum with specified -0 behavior. It would be helpful to define minnum/maxnum variants with specified -0 ordered less than +0

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.
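
A standalone sketch of the two IEEE 754-2019 operations as described here, ignoring signaling-NaN and payload details: both order -0.0 below +0.0 and differ only in NaN handling (minimum propagates a NaN, minimumNumber returns the numeric operand). The function names are ours, not LLVM's.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Shared ordering for non-NaN inputs: -0.0 is treated as less than +0.0.
double min_ordered_zeros(double a, double b) {
  if (a == b)  // covers -0.0 == +0.0
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

// IEEE 754-2019 minimum: any NaN operand produces a quiet NaN.
double ieee_minimum(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  return min_ordered_zeros(a, b);
}

// IEEE 754-2019 minimumNumber: a NaN operand is ignored in favor of the number.
double ieee_minimum_number(double a, double b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return min_ordered_zeros(a, b);
}
```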

In D137655#3921056, @nikic wrote:

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.

OK, yes I was confused by the name change. "minimum" is basically the same with the specified signed zero behavior. Regardless, we should have another pair of min/max with the defined signed zero handling.

In D137655#3922642, @arsenm wrote:

OK, yes I was confused by the name change. "minimum" is basically the same with the specified signed zero behavior. Regardless, we should have another pair of min/max with the defined signed zero handling

Alternatively, since I believe we don't actually have any users that don't specify the correct signed zero handling, we could just redefine FMINNUM_IEEE/FMAXNUM_IEEE to have the new behavior.

ThomasRaoux added a subscriber: ThomasRaoux. Jan 6 2023, 7:32 AM

kiranchandramohan mentioned this in D158200: [flang] Fixed simplification for FP maxval. Aug 21 2023, 1:27 PM

Revision Contents

Path                                              Size
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp     12 lines
llvm/test/CodeGen/ARM/fminmax-folds.ll            64 lines
llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll      66 lines

Diff 474267

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

[3,191 lines not shown; context: case ISD::UMAX: {]
     break;
   }
   case ISD::FMINNUM:
   case ISD::FMAXNUM: {
     if (SDValue Expanded = TLI.expandFMINNUM_FMAXNUM(Node, DAG))
       Results.push_back(Expanded);
     break;
   }
+  case ISD::FMINIMUM:
+  case ISD::FMAXIMUM: {
+    EVT VT = Node->getValueType(0);
+    ISD::NodeType NT =
+        Node->getOpcode() == ISD::FMINIMUM ? ISD::FMINNUM : ISD::FMAXNUM;
+    Tmp1 = Node->getOperand(0);
+    Tmp2 = Node->getOperand(1);
+    Tmp3 = DAG.getNode(NT, dl, VT, {Tmp1, Tmp2});
+    Tmp4 = DAG.getConstantFP(std::numeric_limits<double>::quiet_NaN(), dl, VT);
+    Results.push_back(DAG.getSelectCC(dl, Tmp1, Tmp2, Tmp4, Tmp3, ISD::SETUO));
+    break;
+  }
   case ISD::FSIN:
   case ISD::FCOS: {
     EVT VT = Node->getValueType(0);
     // Turn fsin / fcos into ISD::FSINCOS node if there are a pair of fsin /
     // fcos which share the same operand and both are used.
     if ((TLI.isOperationLegalOrCustom(ISD::FSINCOS, VT) ||
          isSinCosLibcallAvailable(Node, TLI))
         && useSinCos(Node)) {
[1,921 lines not shown]

llvm/test/CodeGen/ARM/fminmax-folds.ll

[71 lines not shown; context: ; CHECK-NEXT: bx lr]
   ret float %r
 }

 define float @test_maximum_const_inf(float %x) {
 ; CHECK-LABEL: test_maximum_const_inf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI6_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmax.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI6_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vmaxnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI6_0:
 ; CHECK-NEXT: .long 0x7f800000 @ float +Inf
+; CHECK-NEXT: .LCPI6_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call float @llvm.maximum.f32(float %x, float 0x7ff0000000000000)
   ret float %r
 }

 define float @test_minimum_const_inf(float %x) {
 ; CHECK-LABEL: test_minimum_const_inf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: bx lr
[35 lines not shown; context: ; CHECK-NEXT: bx lr]
   ret float %r
 }

 define float @test_minimum_const_neg_inf(float %x) {
 ; CHECK-LABEL: test_minimum_const_neg_inf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI11_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmin.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI11_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vminnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI11_0:
 ; CHECK-NEXT: .long 0xff800000 @ float -Inf
+; CHECK-NEXT: .LCPI11_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call float @llvm.minimum.f32(float %x, float 0xfff0000000000000)
   ret float %r
 }

 define float @test_minnum_const_inf_nnan(float %x) {
 ; CHECK-LABEL: test_minnum_const_inf_nnan:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: bx lr
[179 lines not shown; context: ; CHECK-NEXT: .long 0x7f7fffff @ float 3.40282347E+38]
   ret float %r
 }

 define float @test_maximum_const_max(float %x) {
 ; CHECK-LABEL: test_maximum_const_max:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI30_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmax.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI30_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vmaxnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI30_0:
 ; CHECK-NEXT: .long 0x7f7fffff @ float 3.40282347E+38
+; CHECK-NEXT: .LCPI30_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call float @llvm.maximum.f32(float %x, float 0x47efffffe0000000)
   ret float %r
 }

 define float @test_minimum_const_max(float %x) {
 ; CHECK-LABEL: test_minimum_const_max:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI31_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmin.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI31_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vminnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI31_0:
 ; CHECK-NEXT: .long 0x7f7fffff @ float 3.40282347E+38
+; CHECK-NEXT: .LCPI31_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call float @llvm.minimum.f32(float %x, float 0x47efffffe0000000)
   ret float %r
 }

 define float @test_minnum_const_neg_max(float %x) {
 ; CHECK-LABEL: test_minnum_const_neg_max:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI32_0
[25 lines not shown; context: ; CHECK-NEXT: .long 0xff7fffff @ float -3.40282347E+38]
   ret float %r
 }

 define float @test_maximum_const_neg_max(float %x) {
 ; CHECK-LABEL: test_maximum_const_neg_max:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI34_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmax.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI34_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vmaxnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI34_0:
 ; CHECK-NEXT: .long 0xff7fffff @ float -3.40282347E+38
+; CHECK-NEXT: .LCPI34_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call float @llvm.maximum.f32(float %x, float 0xc7efffffe0000000)
   ret float %r
 }

 define float @test_minimum_const_neg_max(float %x) {
 ; CHECK-LABEL: test_minimum_const_neg_max:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI35_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmin.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI35_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vminnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI35_0:
 ; CHECK-NEXT: .long 0xff7fffff @ float -3.40282347E+38
+; CHECK-NEXT: .LCPI35_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call float @llvm.minimum.f32(float %x, float 0xc7efffffe0000000)
   ret float %r
 }

 define float @test_minnum_const_max_ninf(float %x) {
 ; CHECK-LABEL: test_minnum_const_max_ninf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI36_0
[19 lines not shown; context: ; CHECK-NEXT: bx lr]
   ret float %r
 }

 define float @test_maximum_const_max_ninf(float %x) {
 ; CHECK-LABEL: test_maximum_const_max_ninf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI38_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmax.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI38_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vmaxnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI38_0:
 ; CHECK-NEXT: .long 0x7f7fffff @ float 3.40282347E+38
+; CHECK-NEXT: .LCPI38_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call ninf float @llvm.maximum.f32(float %x, float 0x47efffffe0000000)
   ret float %r
 }

 define float @test_minimum_const_max_ninf(float %x) {
 ; CHECK-LABEL: test_minimum_const_max_ninf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: bx lr
[34 lines not shown; context: ; CHECK-NEXT: bx lr]
   ret float %r
 }

 define float @test_minimum_const_neg_max_ninf(float %x) {
 ; CHECK-LABEL: test_minimum_const_neg_max_ninf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: vldr s0, .LCPI43_0
 ; CHECK-NEXT: vmov s2, r0
-; CHECK-NEXT: vmin.f32 d0, d1, d0
+; CHECK-NEXT: vldr s4, .LCPI43_1
+; CHECK-NEXT: vcmp.f32 s2, s0
+; CHECK-NEXT: vminnm.f32 s6, s2, s0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vselvs.f32 s0, s4, s6
 ; CHECK-NEXT: vmov r0, s0
 ; CHECK-NEXT: bx lr
 ; CHECK-NEXT: .p2align 2
 ; CHECK-NEXT: @ %bb.1:
 ; CHECK-NEXT: .LCPI43_0:
 ; CHECK-NEXT: .long 0xff7fffff @ float -3.40282347E+38
+; CHECK-NEXT: .LCPI43_1:
+; CHECK-NEXT: .long 0x7fc00000 @ float NaN
   %r = call ninf float @llvm.minimum.f32(float %x, float 0xc7efffffe0000000)
   ret float %r
 }

 define float @test_minnum_const_max_nnan_ninf(float %x) {
 ; CHECK-LABEL: test_minnum_const_max_nnan_ninf:
 ; CHECK: @ %bb.0:
 ; CHECK-NEXT: bx lr
[65 lines not shown]

llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll

	; RUN: llc < %s -march=nvptx \| FileCheck %s --check-prefixes=CHECK,CHECK-NONAN			; RUN: llc < %s -march=nvptx \| FileCheck %s --check-prefixes=CHECK,CHECK-NONAN
	; RUN: llc < %s -march=nvptx -mcpu=sm_80 \| FileCheck %s --check-prefixes=CHECK,CHECK-NAN			; RUN: llc < %s -march=nvptx -mcpu=sm_80 \| FileCheck %s --check-prefixes=CHECK,CHECK-NAN
	; RUN: %if ptxas %{ llc < %s -march=nvptx \| %ptxas-verify %}			; RUN: %if ptxas %{ llc < %s -march=nvptx \| %ptxas-verify %}
	; RUN: %if ptxas-11.0 %{ llc < %s -march=nvptx -mcpu=sm_80 \| %ptxas-verify -arch=sm_80 %}			; RUN: %if ptxas-11.0 %{ llc < %s -march=nvptx -mcpu=sm_80 \| %ptxas-verify -arch=sm_80 %}

	; ---- minimum ----			; ---- minimum ----

				declare half @llvm.minimum.f16(half %a, half %b)
				declare float @llvm.minimum.f32(float %a, float %b)
				declare double @llvm.minimum.f64(double %a, double %b)

	; CHECK-LABEL: minimum_half			; CHECK-LABEL: minimum_half
	define half @minimum_half(half %a) #0 {			define half @minimum_half(half %a) #0 {
	; CHECK-NONAN: setp			; CHECK-NONAN: setp
	; CHECK-NONAN: selp.b16			; CHECK-NONAN: selp.b16
	; CHECK-NAN: min.NaN.f16			; CHECK-NAN: min.NaN.f16
	%p = fcmp ult half %a, 0.0			%p = fcmp ult half %a, 0.0
	%x = select i1 %p, half %a, half 0.0			%x = select i1 %p, half %a, half 0.0
	ret half %x			ret half %x
	}			}

				; CHECK-LABEL: minimum_intr_half
				define half @minimum_intr_half(half %a, half %b) #0 {
				; CHECK-NONAN-DAG: min.f32
				; CHECK-NONAN-DAG: setp.nan.f32
				; CHECK-NONAN-DAG: selp.b16
				; CHECK-NAN: min.NaN.f16
				%x = call half @llvm.minimum.f16(half %a, half %b)
				ret half %x
				}

	; CHECK-LABEL: minimum_float			; CHECK-LABEL: minimum_float
	define float @minimum_float(float %a) #0 {			define float @minimum_float(float %a) #0 {
	; CHECK-NONAN: setp			; CHECK-NONAN: setp
	; CHECK-NONAN: selp.f32			; CHECK-NONAN: selp.f32
	; CHECK-NAN: min.NaN.f32			; CHECK-NAN: min.NaN.f32
	%p = fcmp ult float %a, 0.0			%p = fcmp ult float %a, 0.0
	%x = select i1 %p, float %a, float 0.0			%x = select i1 %p, float %a, float 0.0
	ret float %x			ret float %x
	}			}

				; CHECK-LABEL: minimum_intr_float
				define float @minimum_intr_float(float %a, float %b) #0 {
				; CHECK-NONAN-DAG: min.f32
				; CHECK-NONAN-DAG: setp.nan.f32
				; CHECK-NONAN-DAG: selp.f32
				; CHECK-NAN: min.NaN.f32
				%x = call float @llvm.minimum.f32(float %a, float %b)
				ret float %x
				}

	; CHECK-LABEL: minimum_double
	define double @minimum_double(double %a) #0 {
	; CHECK: setp
	; CHECK: selp.f64
	%p = fcmp ult double %a, 0.0
	%x = select i1 %p, double %a, double 0.0
	ret double %x
	}

				; CHECK-LABEL: minimum_intr_double
				define double @minimum_intr_double(double %a, double %b) #0 {
				; CHECK-DAG: min.f64
				; CHECK-DAG: setp.nan.f64
				; CHECK-DAG: selp.f64
				tra: What exactly do we end up generating here? If both `setp` and `min` are inputs for `selp`, then `-DAG` should be removed from `selp`.
				%x = call double @llvm.minimum.f64(double %a, double %b)
				ret double %x
				}

	; CHECK-LABEL: minimum_v2half
	define <2 x half> @minimum_v2half(<2 x half> %a) #0 {
	; CHECK-NONAN-DAG: setp
	; CHECK-NONAN-DAG: setp
	; CHECK-NONAN-DAG: selp.b16
	; CHECK-NONAN-DAG: selp.b16
	; CHECK-NAN: min.NaN.f16x2
	%p = fcmp ult <2 x half> %a, zeroinitializer
	%x = select <2 x i1> %p, <2 x half> %a, <2 x half> zeroinitializer
	ret <2 x half> %x
	}

	; ---- maximum ----

				declare half @llvm.maximum.f16(half %a, half %b)
				declare float @llvm.maximum.f32(float %a, float %b)
				declare double @llvm.maximum.f64(double %a, double %b)

	; CHECK-LABEL: maximum_half
	define half @maximum_half(half %a) #0 {
	; CHECK-NONAN: setp
	; CHECK-NONAN: selp.b16
	; CHECK-NAN: max.NaN.f16
	%p = fcmp ugt half %a, 0.0
	%x = select i1 %p, half %a, half 0.0
	ret half %x
	}

				; CHECK-LABEL: maximum_intr_half
				define half @maximum_intr_half(half %a, half %b) #0 {
				; CHECK-NONAN-DAG: max.f32
				; CHECK-NONAN-DAG: setp.nan.f32
				; CHECK-NONAN-DAG: selp.b16
				; CHECK-NAN: max.NaN.f16
				%x = call half @llvm.maximum.f16(half %a, half %b)
				ret half %x
				}

	; CHECK-LABEL: maximum_float
	define float @maximum_float(float %a) #0 {
	; CHECK-NONAN: setp
	; CHECK-NONAN: selp.f32
	; CHECK-NAN: max.NaN.f32
	%p = fcmp ugt float %a, 0.0
	%x = select i1 %p, float %a, float 0.0
	ret float %x
	}

				; CHECK-LABEL: maximum_intr_float
				define float @maximum_intr_float(float %a, float %b) #0 {
				; CHECK-NONAN-DAG: max.f32
				; CHECK-NONAN-DAG: setp.nan.f32
				; CHECK-NONAN-DAG: selp.f32
				; CHECK-NAN: max.NaN.f32
				%x = call float @llvm.maximum.f32(float %a, float %b)
				ret float %x
				}

	; CHECK-LABEL: maximum_double
	define double @maximum_double(double %a) #0 {
	; CHECK: setp
	; CHECK: selp.f64
	%p = fcmp ugt double %a, 0.0
	%x = select i1 %p, double %a, double 0.0
	ret double %x
	}

				; CHECK-LABEL: maximum_intr_double
				define double @maximum_intr_double(double %a, double %b) #0 {
				; CHECK-DAG: max.f64
				; CHECK-DAG: setp.nan.f64
				; CHECK-DAG: selp.f64
				%x = call double @llvm.maximum.f64(double %a, double %b)
				ret double %x
				}

	; CHECK-LABEL: maximum_v2half
	define <2 x half> @maximum_v2half(<2 x half> %a) #0 {
	; CHECK-NONAN-DAG: setp
	; CHECK-NONAN-DAG: setp
	; CHECK-NONAN-DAG: selp.b16
	; CHECK-NONAN-DAG: selp.b16
	; CHECK-NAN: max.NaN.f16x2
	%p = fcmp ugt <2 x half> %a, zeroinitializer
	%x = select <2 x i1> %p, <2 x half> %a, <2 x half> zeroinitializer
	ret <2 x half> %x
	}
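The `*_intr_*` tests above check only which PTX instructions are emitted (a `min`/`max`, a `setp.nan` and a `selp`). They do not check the values those instructions produce. For reference, here is a minimal C sketch of the semantics the LLVM LangRef gives for `llvm.minimum`/`llvm.maximum`:

- if either operand is NaN, the result is NaN;
- otherwise the result is the lesser (greater) operand;
- -0.0 is treated as less than +0.0.

The helpers `ref_minimum` and `ref_maximum` are hypothetical and not part of this patch. The NaN-check-then-select shape loosely mirrors the `setp.nan`/`selp` pair. The sketch does not establish how PTX `min`/`max` themselves order signed zeros.

```c
#include <math.h>

/* Hypothetical reference model of llvm.minimum (not the patch's lowering):
   NaN if either input is NaN, otherwise the lesser value, with -0.0 < +0.0. */
static double ref_minimum(double a, double b) {
    if (isnan(a) || isnan(b))
        return NAN;                 /* NaN propagates (cf. setp.nan + selp) */
    if (a == b)                     /* true for -0.0 == +0.0 as well */
        return signbit(a) ? a : b;  /* prefer the negative zero */
    return a < b ? a : b;           /* ordinary minimum */
}

/* Hypothetical reference model of llvm.maximum: same NaN rule,
   otherwise the greater value, with +0.0 > -0.0. */
static double ref_maximum(double a, double b) {
    if (isnan(a) || isnan(b))
        return NAN;
    if (a == b)
        return signbit(a) ? b : a;  /* prefer the positive zero */
    return a > b ? a : b;           /* ordinary maximum */
}
```

With this model, `ref_minimum(0.0, -0.0)` returns -0.0 and `ref_maximum(-0.0, 0.0)` returns +0.0. Signed-zero inputs like these need extra care in a lowering built on `fminnum`/`fmaxnum`. C's `fmin`/`fmax` and `llvm.minnum`/`llvm.maxnum` may return either zero when the operands compare equal.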