This is an archive of the discontinued LLVM Phabricator instance.

Differential D137655

Expand fminimum/fmaximum into fminnum/fmaxnum + NaN check
Needs Review · Public

Authored by gflegar on Nov 8 2022, 9:51 AM.

Details

Reviewers

csigg
efriedma

Summary

We do not have an instruction for this in PTX prior to SM 8.0, so we are
expanding it. However, there is no expansion defined for this op in LLVM, so
define a custom expansion for the NVPTX backend instead (the same thing does
not really work on LLVM level due to fminnum/fmaxnum semantics for
-0.0 / +0.0).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gflegar created this revision. Nov 8 2022, 9:51 AM

Herald added a project: Restricted Project. Nov 8 2022, 9:51 AM

Herald added subscribers: mattd, gchakrabarti, asavonic, hiraditya.

gflegar requested review of this revision. Nov 8 2022, 9:51 AM

Herald added a project: Restricted Project. Nov 8 2022, 9:51 AM

Herald added subscribers: llvm-commits, jholewinski.

gflegar added a reviewer: csigg. Nov 8 2022, 9:53 AM

Doesn't this handle signed zero incorrectly?

Harbormaster completed remote builds in B196738: Diff 474036. Nov 8 2022, 10:36 AM

In D137655#3915463, @nikic wrote:

Doesn't this handle signed zero incorrectly?

I believe it is:

For FMINIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0
Tmp1 = +0.0, Tmp2 = -0.0  =>  Tmp1 ULT Tmp2 == False  =>  Tmp3 = Tmp2 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0

For FMAXIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 UGT Tmp2 == False  =>  Tmp3 = Tmp2 = +0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = +0.0
Tmp1 = +0.0, Tmp2 = -0.0  =>  Tmp1 UGT Tmp2 == True   =>  Tmp3 = Tmp1 = +0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = +0.0

In D137655#3916906, @gflegar wrote:
In D137655#3915463, @nikic wrote:

Doesn't this handle signed zero incorrectly?

I believe it is:

For FMINIMUM:
Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0

Isn't -0.0 ULT 0.0 false, because negative zero and positive zero are equal?
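The point about the comparison can be checked with a small scalar model. The sketch below is only an illustration in plain C++ and not the patch's code; `select_min_ult` is a hypothetical helper that mimics "select on unordered-or-less-than". Because -0.0 and +0.0 compare equal, the predicate is false for both operand orders and the select always returns the second operand, so the sign of the result depends on operand order.

```cpp
#include <cmath>

// Hypothetical model of "Tmp3 = (Tmp1 ULT Tmp2) ? Tmp1 : Tmp2".
// ULT is "unordered or less than", written here as !(a >= b).
double select_min_ult(double a, double b) {
  bool ult = !(a >= b);  // false when a == b, including -0.0 vs +0.0
  return ult ? a : b;
}

// Returns true if the result of the select is a negative zero (or any
// negative value); used only to inspect the sign of the selected zero.
bool result_is_negative(double a, double b) {
  return std::signbit(select_min_ult(a, b));
}
```

With this model, `select_min_ult(-0.0, +0.0)` yields +0.0 while `select_min_ult(+0.0, -0.0)` yields -0.0, which is why the first row of the FMINIMUM walk-through above does not hold.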

In D137655#3916933, @nikic wrote:
In D137655#3916906, @gflegar wrote:
In D137655#3915463, @nikic wrote:

Doesn't this handle signed zero incorrectly?

I believe it is:

For FMINIMUM:
Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0
Isn't -0.0 ULT 0.0 false, because negative zero and positive zero are equal?

Right, I checked the standard now. For comparisons they're considered equal, but for maximum / minimum there's a special exception that they're considered as -0.0 < 0.0. Why would anyone define this so inconsistently ... -.-
The problem for us here is performance. We would need more instructions to implement it correctly via comparisons (2 to compare + select, at least 2 for 0 handling, at least 2 for NaNs). I'm mostly concerned with the PTX backend, and for it the correct and efficient way to expand this would be to use minnum (builtin instruction), and then a NaN check. If that is true, we generate our own NaN constant (if I'm reading the standard correctly it just requires us to return a quiet NaN, which doesn't have to be the same NaN as any of the operands).
I'll try doing that instead.
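As a rough scalar illustration of the approach described here (a non-propagating minimum followed by a NaN check that substitutes a freshly generated quiet NaN), consider the sketch below. It is written in plain C++ rather than against SelectionDAG, `minimum_via_fmin` is a hypothetical name, and `std::fmin` only stands in for minnum: the C and C++ standards do not require `std::fmin` to order -0.0 before +0.0, so this model says nothing about signed zeros.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar model: minnum-style operation plus a NaN check.
double minimum_via_fmin(double a, double b) {
  // Non-propagating minimum: if exactly one operand is NaN, the other
  // operand is returned.
  double non_propagating = std::fmin(a, b);
  // "Unordered" check: true if either operand is NaN.
  bool unordered = std::isnan(a) || std::isnan(b);
  // Return a generated quiet NaN rather than one of the operands.
  return unordered ? std::numeric_limits<double>::quiet_NaN()
                   : non_propagating;
}
```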

Change lowering to minnum/maxnum + NaN check

Also update the failing arm test to pass.
I'm not an expert on ARM, but looking at ARMISelLowering.cpp, it does
specify exactly under which conditions minimum / maximum instructions
are available. Thus, they also likely had the same silent failure
that the PTX side had, and the test was likely wrong (since the checks
were auto-generated).

@nikic 0 handling should be fixed now

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

nikic added a reviewer: efriedma. Nov 9 2022, 7:35 AM

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. Though I would still argue for submitting this, as it is correct modulo -0/+0, which is far preferable to the current state where we have a silent failure (and produce invalid code) for the backends that attempt to expand the op (like in NVPTX).

Harbormaster completed remote builds in B196902: Diff 474267. Nov 9 2022, 8:57 AM

In D137655#3917468, @gflegar wrote:

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. Though I would still argue for submitting this, as it is correct modulo -0/+0, which is far preferable to the current state where we have a silent failure (and produce invalid code) for the backends that attempt to expand the op (like in NVPTX).

That sounds like an unrelated bug. If an operation is Expand, but we don't support expanding it, shouldn't that result in an isel failure?

In D137655#3917621, @nikic wrote:

In D137655#3917468, @gflegar wrote:

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. Though I would still argue for submitting this, as it is correct modulo -0/+0, which is far preferable to the current state where we have a silent failure (and produce invalid code) for the backends that attempt to expand the op (like in NVPTX).

That sounds like an unrelated bug. If an operation is Expand, but we don't support expanding it, shouldn't that result in an isel failure?

Yes, there is an orthogonal bug where this is a silent failure rather than a real one. However, even if that bug is fixed, we would still fail (just earlier), which is fixed by this change.

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which define the 2018 semantics. (I do agree though that the intermediate code is not correct.)

Limit expansion only to NVPTX backend

There it ends up handling all the cases correctly, due to the expanded
semantics of the min/max PTX instructions for +/-0.0.

I think this is the best we can do short of adding a new op. We only do the expansion for the NVPTX backend, and don't support it otherwise. In the NVPTX backend, the intermediate code still ends up being semantically incorrect for +/-0.0, but since FMINNUM/FMAXNUM lower to PTX min/max, which do implement the 2018 semantics of those ops, the final PTX ends up being correct.

Once we have an op in LLVM that represents the 2018 semantics, we can lower it to that instead, to make the intermediate code semantically correct as well.

gflegar retitled this revision from Expand fminimum and fmaximum into a pair of selects to Expand fminimum/fmaximum into fminnum/fmaxnum + NaN check. Nov 10 2022, 6:49 AM

gflegar edited the summary of this revision.

gflegar added a child revision: D137786: Lower arith.min/max to llvm.intr.minimum/maximum. Nov 10 2022, 7:03 AM

Harbormaster completed remote builds in B197074: Diff 474535. Nov 10 2022, 7:47 AM

In D137655#3918964, @gflegar wrote:

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which define the 2018 semantics. (I do agree though that the intermediate code is not correct.)

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
2227–2228	Should go through APFloat
2227–2228	Can also move this to generic code and check which of the variants are legal

Formatting fixes

In D137655#3919674, @arsenm wrote:

In D137655#3918964, @gflegar wrote:

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which define the 2018 semantics. (I do agree though that the intermediate code is not correct.)

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM

It's even less OK to fail altogether (which is what is happening without this patch). And we're not talking about the new semantics for FMINIMUM/FMAXIMUM, but for the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

Harbormaster completed remote builds in B197094: Diff 474565. Nov 10 2022, 10:01 AM

We do not have an instruction for this in PTX prior to SM 8.0,

I assume that we're talking about the min.NaN.*/max.NaN.* instruction variants that appeared in PTX 7.0 on sm_80+.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-max
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-max

Looks like we do not properly constrain instruction availability in llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td#L940
There are no predicates on sm80+ or ptx70 and we sort of rely on custom lowering, and not always correctly as things stand:
https://godbolt.org/z/G8vYb5ajT

This patch should help generate correct instructions for fp64.

Still, I think we need to fix instruction definitions to correctly reflect their availability.

On a side note, we may want to add some min/max correctness tests to CUDA tests in llvm test-suite. Considering that we have different lowering on different GPUs, we do want to make sure that we actually do consistently get the results we expect across different GPUs and CUDA versions. We currently do not have any sm80 GPUs on cuda buildbots, but we'll get them eventually.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
615	I think this should be refactored into a more generic `GetGPUAction(sm, ptx, ifAvailableAction, FallbackAction)`. This would make it clear which actions we take and why. `GetMinMax` action just says 'magic'.
llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll
65	What exactly do we end up generating here? If both `setp` and `min` are inputs for `selp`, then `-DAG` should be removed from `selp`.
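One possible shape for the `GetGPUAction(sm, ptx, ifAvailableAction, FallbackAction)` helper suggested above is sketched below. This is a standalone illustration only: the `LegalizeAction` enum and the `SubtargetStub` struct are stand-ins defined for this sketch (in the backend the real types would be the LLVM `LegalizeAction` enum and the NVPTX subtarget), and the parameter order is an assumption.

```cpp
// Stand-in for LLVM's LegalizeAction enum; defined here only so the
// sketch is self-contained.
enum LegalizeAction { Legal, Promote, Expand, Custom };

// Hypothetical stub carrying the two values the decision depends on.
struct SubtargetStub {
  unsigned SmVersion;   // e.g. 80 for sm_80
  unsigned PTXVersion;  // e.g. 70 for PTX 7.0
};

// Returns IfAvailable when the target meets both the minimum SM version
// and the minimum PTX version, and Fallback otherwise.
LegalizeAction GetGPUAction(const SubtargetStub &STI, unsigned MinSm,
                            unsigned MinPTX, LegalizeAction IfAvailable,
                            LegalizeAction Fallback) {
  bool Available = STI.SmVersion >= MinSm && STI.PTXVersion >= MinPTX;
  return Available ? IfAvailable : Fallback;
}
```

With a helper of this kind, the call site states the requirement explicitly, for example `GetGPUAction(STI, 80, 70, Legal, Custom)` for the min.NaN/max.NaN variants.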

In D137655#3919749, @gflegar wrote:

In D137655#3919674, @arsenm wrote:

In D137655#3918964, @gflegar wrote:

In D137655#3917359, @nikic wrote:

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which define the 2018 semantics. (I do agree though that the intermediate code is not correct.)

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM

It's even less OK to fail altogether (which is what is happening without this patch).

Hard disagree

And we're not talking about the new semantics for FMINIMUM/FMAXIMUM, but for the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

The 2019 final spec does not have minnum or maxnum; they were removed and replaced with minimum and maximum which have specified signed zero behavior. Unless there was a draft revision I missed, there was never a defined minnum with specified -0 behavior. It would be helpful to define minnum/maxnum variants with specified -0 ordered less than +0

In D137655#3919929, @arsenm wrote:

And we're not talking about the new semantics for FMINIMUM/FMAXIMUM, but for the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

The 2019 final spec does not have minnum or maxnum; they were removed and replaced with minimum and maximum which have specified signed zero behavior. Unless there was a draft revision I missed, there was never a defined minnum with specified -0 behavior. It would be helpful to define minnum/maxnum variants with specified -0 ordered less than +0

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.
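To make the distinction drawn here concrete, the following is a hedged scalar sketch of the two operations as this comment describes them: `minimum` propagates NaN, `minimumNumber` prefers the non-NaN operand, and both treat -0.0 as less than +0.0. The function names are hypothetical, the code is plain C++ rather than LLVM code, and signaling-NaN behaviour is not modelled.

```cpp
#include <cmath>
#include <limits>

// Picks the smaller of two non-NaN values, ordering -0.0 before +0.0.
static double min_ordered_zeros(double a, double b) {
  if (a == b)                        // also true for -0.0 vs +0.0
    return std::signbit(a) ? a : b;  // prefer the negative zero
  return a < b ? a : b;
}

// NaN-propagating minimum: any NaN operand yields a quiet NaN.
double minimum_2019(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  return min_ordered_zeros(a, b);
}

// Number-preferring minimum: a single NaN operand is ignored; the result
// is NaN only if both operands are NaN.
double minimum_number_2019(double a, double b) {
  if (std::isnan(a))
    return std::isnan(b) ? std::numeric_limits<double>::quiet_NaN() : b;
  if (std::isnan(b))
    return a;
  return min_ordered_zeros(a, b);
}
```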

In D137655#3921056, @nikic wrote:

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.

OK, yes I was confused by the name change. "minimum" is basically the same with the specified signed zero behavior. Regardless, we should have another pair of min/max with the defined signed zero handling

In D137655#3922642, @arsenm wrote:

In D137655#3921056, @nikic wrote:

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.

OK, yes I was confused by the name change. "minimum" is basically the same with the specified signed zero behavior. Regardless, we should have another pair of min/max with the defined signed zero handling

Alternatively, since I believe we don't actually have any users that don't specify the correct signed zero handling, we could just redefine FMINNUM_IEEE/FMAXNUM_IEEE to have the new behavior.

ThomasRaoux added a subscriber: ThomasRaoux. Jan 6 2023, 7:32 AM

kiranchandramohan mentioned this in D158200: [flang] Fixed simplification for FP maxval. Aug 21 2023, 1:27 PM

Revision Contents

Path                                            Size
llvm/lib/Target/NVPTX/NVPTXISelLowering.h       2 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp     34 lines
llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll    66 lines

Diff 474565

llvm/lib/Target/NVPTX/NVPTXISelLowering.h

[578 lines not shown] private:
   SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;

   SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFROUND32(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;

+  SDValue LowerFMINIMUM_FMAXIMUM(SDValue Op, SelectionDAG &DAG) const;
+
   SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerLOADi1(SDValue Op, SelectionDAG &DAG) const;

   SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSTOREi1(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSTOREVector(SDValue Op, SelectionDAG &DAG) const;

   SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;
[14 lines not shown]

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

[606 lines not shown] NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   for (const auto &Op :
        {ISD::FDIV, ISD::FREM, ISD::FSQRT, ISD::FSIN, ISD::FCOS, ISD::FABS}) {
     setOperationAction(Op, MVT::f16, Promote);
     setOperationAction(Op, MVT::f32, Legal);
     setOperationAction(Op, MVT::f64, Legal);
     setOperationAction(Op, MVT::v2f16, Expand);
   }
   // max.f16, max.f16x2 and max.NaN are supported on sm_80+.
   auto GetMinMaxAction = [&](LegalizeAction NotSm80Action) {
       [Inline comment, tra: I think this should be refactored into a more generic `GetGPUAction(sm, ptx, ifAvailableAction, FallbackAction)`. This would make it clear which actions we take and why. `GetMinMax` action just says 'magic'.]
     bool IsAtLeastSm80 = STI.getSmVersion() >= 80 && STI.getPTXVersion() >= 70;
     return IsAtLeastSm80 ? Legal : NotSm80Action;
   };
   for (const auto &Op : {ISD::FMINNUM, ISD::FMAXNUM}) {
     setFP16OperationAction(Op, MVT::f16, GetMinMaxAction(Promote), Promote);
     setOperationAction(Op, MVT::f32, Legal);
     setOperationAction(Op, MVT::f64, Legal);
     setFP16OperationAction(Op, MVT::v2f16, GetMinMaxAction(Expand), Expand);
   }
   for (const auto &Op : {ISD::FMINIMUM, ISD::FMAXIMUM}) {
-    setFP16OperationAction(Op, MVT::f16, GetMinMaxAction(Expand), Expand);
-    setOperationAction(Op, MVT::f32, GetMinMaxAction(Expand));
-    setFP16OperationAction(Op, MVT::v2f16, GetMinMaxAction(Expand), Expand);
+    setFP16OperationAction(Op, MVT::f16, GetMinMaxAction(Custom), Custom);
+    setOperationAction(Op, MVT::f32, GetMinMaxAction(Custom));
+    setOperationAction(Op, MVT::f64, Custom);
+    setFP16OperationAction(Op, MVT::v2f16, GetMinMaxAction(Custom), Custom);
   }

   // No FEXP2, FLOG2. The PTX ex2 and log2 functions are always approximate.
   // No FPOW or FREM in PTX.

   // Now deduce the information based on the above mentioned
   // actions
   computeRegisterProperties(STI.getRegisterInfo());
[1,560 lines not shown] SDValue NVPTXTargetLowering::LowerFROUND64(SDValue Op,

   // RoundedA = abs(A) > 0x1.0p52 ? A : RoundedA;
   SDValue IsLarge =
       DAG.getSetCC(SL, SetCCVT, AbsA, DAG.getConstantFP(pow(2.0, 52.0), SL, VT),
                    ISD::SETOGT);
   return DAG.getNode(ISD::SELECT, SL, VT, IsLarge, A, RoundedA);
 }

+// Lower FMINIMUM / FMAXIMUM for SM < 8.0. We use FMINNUM / FMAXNUM followed by
+// a NaN check to handle NaNs correctly.
+//
+// Technically, FMINNUM/FMAXNUM do not handle the -0.0 / +0.0 case correctly,
+// since they define them according to the IEEE 754-2008 semantics (it's
+// undefined which one is returned). However, the PTX min/max instructions to
+// which FMINNUM and FMAXNUM are lowered to conform to the IEEE 754-2019
+// semantics (-0.0 < +0.0), thus the lowering ends up working out correctly.
+//
+// TODO: Replace FMINNUM/FMAXNUM with ops that conform to IEEE 754-2019 standard
+// once those are available in LLVM.
+SDValue NVPTXTargetLowering::LowerFMINIMUM_FMAXIMUM(SDValue Op,
+                                                    SelectionDAG &DAG) const {
+  EVT VT = Op.getValueType();
+  ISD::NodeType NT =
+      Op.getOpcode() == ISD::FMINIMUM ? ISD::FMINNUM : ISD::FMAXNUM;
+  SDValue LHS = Op.getOperand(0);
+  SDValue RHS = Op.getOperand(1);
+  SDLoc SL(Op);
+
+  SDValue NonPropagatingResult = DAG.getNode(NT, SL, VT, {LHS, RHS});
+  SDValue NaN =
+      DAG.getConstantFP(std::numeric_limits<double>::quiet_NaN(), SL, VT);
     [Inline comment, arsenm: Should go through APFloat]
     [Inline comment, arsenm: Can also move this to generic code and check which of the variants are legal]
+  return DAG.getSelectCC(SL, LHS, RHS, NaN, NonPropagatingResult, ISD::SETUO);
+}
+
 SDValue
 NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
   switch (Op.getOpcode()) {
   case ISD::RETURNADDR:
     return SDValue();
   case ISD::FRAMEADDR:
     return SDValue();
[17 lines not shown] case ISD::SHL_PARTS:
     return LowerShiftLeftParts(Op, DAG);
   case ISD::SRA_PARTS:
   case ISD::SRL_PARTS:
     return LowerShiftRightParts(Op, DAG);
   case ISD::SELECT:
     return LowerSelect(Op, DAG);
   case ISD::FROUND:
     return LowerFROUND(Op, DAG);
+  case ISD::FMINIMUM:
+  case ISD::FMAXIMUM:
+    return LowerFMINIMUM_FMAXIMUM(Op, DAG);
   default:
     llvm_unreachable("Custom lowering not defined for operation");
   }
 }

 SDValue NVPTXTargetLowering::LowerSelect(SDValue Op, SelectionDAG &DAG) const {
   SDValue Op0 = Op->getOperand(0);
   SDValue Op1 = Op->getOperand(1);
[3,025 lines not shown]

llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll

 ; RUN: llc < %s -march=nvptx | FileCheck %s --check-prefixes=CHECK,CHECK-NONAN
 ; RUN: llc < %s -march=nvptx -mcpu=sm_80 | FileCheck %s --check-prefixes=CHECK,CHECK-NAN
 ; RUN: %if ptxas %{ llc < %s -march=nvptx | %ptxas-verify %}
 ; RUN: %if ptxas-11.0 %{ llc < %s -march=nvptx -mcpu=sm_80 | %ptxas-verify -arch=sm_80 %}

 ; ---- minimum ----

+declare half @llvm.minimum.f16(half %a, half %b)
+declare float @llvm.minimum.f32(float %a, float %b)
+declare double @llvm.minimum.f64(double %a, double %b)
+
 ; CHECK-LABEL: minimum_half
 define half @minimum_half(half %a) #0 {
 ; CHECK-NONAN: setp
 ; CHECK-NONAN: selp.b16
 ; CHECK-NAN: min.NaN.f16
   %p = fcmp ult half %a, 0.0
   %x = select i1 %p, half %a, half 0.0
   ret half %x
 }

+; CHECK-LABEL: minimum_intr_half
+define half @minimum_intr_half(half %a, half %b) #0 {
+; CHECK-NONAN-DAG: min.f32
+; CHECK-NONAN-DAG: setp.nan.f32
+; CHECK-NONAN-DAG: selp.b16
+; CHECK-NAN: min.NaN.f16
+  %x = call half @llvm.minimum.f16(half %a, half %b)
+  ret half %x
+}
+
 ; CHECK-LABEL: minimum_float
 define float @minimum_float(float %a) #0 {
 ; CHECK-NONAN: setp
 ; CHECK-NONAN: selp.f32
 ; CHECK-NAN: min.NaN.f32
   %p = fcmp ult float %a, 0.0
   %x = select i1 %p, float %a, float 0.0
   ret float %x
 }

+; CHECK-LABEL: minimum_intr_float
+define float @minimum_intr_float(float %a, float %b) #0 {
+; CHECK-NONAN-DAG: min.f32
+; CHECK-NONAN-DAG: setp.nan.f32
+; CHECK-NONAN-DAG: selp.f32
+; CHECK-NAN: min.NaN.f32
+  %x = call float @llvm.minimum.f32(float %a, float %b)
+  ret float %x
+}
+
 ; CHECK-LABEL: minimum_double
 define double @minimum_double(double %a) #0 {
 ; CHECK: setp
 ; CHECK: selp.f64
   %p = fcmp ult double %a, 0.0
   %x = select i1 %p, double %a, double 0.0
   ret double %x
 }

+; CHECK-LABEL: minimum_intr_double
+define double @minimum_intr_double(double %a, double %b) #0 {
+; CHECK-DAG: min.f64
+; CHECK-DAG: setp.nan.f64
+; CHECK-DAG: selp.f64
     [Inline comment, tra: What exactly do we end up generating here? If both `setp` and `min` are inputs for `selp`, then `-DAG` should be removed from `selp`.]
+  %x = call double @llvm.minimum.f64(double %a, double %b)
+  ret double %x
+}
+
 ; CHECK-LABEL: minimum_v2half
 define <2 x half> @minimum_v2half(<2 x half> %a) #0 {
 ; CHECK-NONAN-DAG: setp
 ; CHECK-NONAN-DAG: setp
 ; CHECK-NONAN-DAG: selp.b16
 ; CHECK-NONAN-DAG: selp.b16
 ; CHECK-NAN: min.NaN.f16x2
   %p = fcmp ult <2 x half> %a, zeroinitializer
   %x = select <2 x i1> %p, <2 x half> %a, <2 x half> zeroinitializer
   ret <2 x half> %x
 }

 ; ---- maximum ----

+declare half @llvm.maximum.f16(half %a, half %b)
+declare float @llvm.maximum.f32(float %a, float %b)
+declare double @llvm.maximum.f64(double %a, double %b)
+
 ; CHECK-LABEL: maximum_half
 define half @maximum_half(half %a) #0 {
 ; CHECK-NONAN: setp
 ; CHECK-NONAN: selp.b16
 ; CHECK-NAN: max.NaN.f16
   %p = fcmp ugt half %a, 0.0
   %x = select i1 %p, half %a, half 0.0
   ret half %x
 }

+; CHECK-LABEL: maximum_intr_half
+define half @maximum_intr_half(half %a, half %b) #0 {
+; CHECK-NONAN-DAG: max.f32
+; CHECK-NONAN-DAG: setp.nan.f32
+; CHECK-NONAN-DAG: selp.b16
+; CHECK-NAN: max.NaN.f16
+  %x = call half @llvm.maximum.f16(half %a, half %b)
+  ret half %x
+}
+
 ; CHECK-LABEL: maximum_float
 define float @maximum_float(float %a) #0 {
 ; CHECK-NONAN: setp
 ; CHECK-NONAN: selp.f32
 ; CHECK-NAN: max.NaN.f32
   %p = fcmp ugt float %a, 0.0
   %x = select i1 %p, float %a, float 0.0
   ret float %x
 }

+; CHECK-LABEL: maximum_intr_float
+define float @maximum_intr_float(float %a, float %b) #0 {
+; CHECK-NONAN-DAG: max.f32
+; CHECK-NONAN-DAG: setp.nan.f32
+; CHECK-NONAN-DAG: selp.f32
+; CHECK-NAN: max.NaN.f32
+  %x = call float @llvm.maximum.f32(float %a, float %b)
+  ret float %x
+}
+
 ; CHECK-LABEL: maximum_double
 define double @maximum_double(double %a) #0 {
 ; CHECK: setp
 ; CHECK: selp.f64
   %p = fcmp ugt double %a, 0.0
   %x = select i1 %p, double %a, double 0.0
   ret double %x
 }

+; CHECK-LABEL: maximum_intr_double
+define double @maximum_intr_double(double %a, double %b) #0 {
+; CHECK-DAG: max.f64
+; CHECK-DAG: setp.nan.f64
+; CHECK-DAG: selp.f64
+  %x = call double @llvm.maximum.f64(double %a, double %b)
+  ret double %x
+}
+
 ; CHECK-LABEL: maximum_v2half
 define <2 x half> @maximum_v2half(<2 x half> %a) #0 {
 ; CHECK-NONAN-DAG: setp
 ; CHECK-NONAN-DAG: setp
 ; CHECK-NONAN-DAG: selp.b16
 ; CHECK-NONAN-DAG: selp.b16
 ; CHECK-NAN: max.NaN.f16x2
   %p = fcmp ugt <2 x half> %a, zeroinitializer
   %x = select <2 x i1> %p, <2 x half> %a, <2 x half> zeroinitializer
   ret <2 x half> %x
 }