Page MenuHomePhabricator

Expand fminimum/fmaximum into fminnum/fmaxnum + NaN check
Needs ReviewPublic

Authored by gflegar on Nov 8 2022, 9:51 AM.

Details

Reviewers
csigg
efriedma
Summary

We do not have an instruction for this in PTX prior to SM 8.0, so we are
expanding it. However, there is no expansion defined for this op in LLVM, so
define a custom expansion for the NVPTX backend instead (the same thing does
not really work on LLVM level due to fminnum/fmaxnum semantics for
-0.0 / +0.0).

Diff Detail

Event Timeline

gflegar created this revision.Nov 8 2022, 9:51 AM
gflegar requested review of this revision.Nov 8 2022, 9:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 8 2022, 9:51 AM
nikic added a subscriber: nikic.Nov 8 2022, 10:06 AM

Doesn't this handle signed zero incorrectly?

Doesn't this handle signed zero incorrectly?

I believe it is:

For FMINIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0
Tmp1 = +0.0, Tmp2 = -0.0  =>  Tmp1 ULT Tmp2 == False  =>  Tmp3 = Tmp2 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0

For FMAXIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 UGT Tmp2 == False  =>  Tmp3 = Tmp2 = +0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = +0.0
Tmp1 = +0.0, Tmp2 = -0.0  =>  Tmp1 UGT Tmp2 == True   =>  Tmp3 = Tmp1 = +0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = +0.0
nikic added a comment.Nov 9 2022, 2:28 AM

Doesn't this handle signed zero incorrectly?

I believe it is:

For FMINIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0

Isn't -0.0 ULT 0.0 false, because negative zero and positive zero are equal?

Doesn't this handle signed zero incorrectly?

I believe it is:

For FMINIMUM:

Tmp1 = -0.0, Tmp2 = +0.0  =>  Tmp1 ULT Tmp2 == True   =>  Tmp3 = Tmp1 = -0.0;     Tmp2 UO Tmp2 = False  =>  Result = Tmp3 = -0.0

Isn't -0.0 ULT 0.0 false, because negative zero and positive zero are equal?

Right, I checked the standard now. For comparisons they're considered equal, but for maximum / minimum there's a special exception that they're considered as -0.0 < 0.0. Why would anyone define this so inconsistently ... -.-
The problem for us here is performance. We would need more instructions to implement it correctly via comparisons (2 to compare + select, at least 2 for 0 handling, at least 2 for NaNs). I'm mostly concerned with the PTX backend, and for it the correct and efficient way to expand this would be to use minnum (builtin instruction), and then a NaN check. If that is true, we generate our own NaN constant (if I'm reading the standard correctly it just requires us to return a quiet NaN, which doesn't have to be the same NaN as any of the operands).
I'll try doing that instead.

gflegar updated this revision to Diff 474267.Nov 9 2022, 7:20 AM

Change lowering to minnum/maxnum + NaN check

Also update the failing arm test to pass.
I'm not an expert on ARM, but looking at ARMISelLowering.cpp, it does
specify exactly under which conditions minimum / maximum instructions
are available. Thus, they also likely had the same silent failure
that the PTX side had, and the test was likely wrong (since the checks
were auto-generated).

@nikic 0 handling should be fixed now

nikic added a comment.Nov 9 2022, 7:32 AM

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. Though I would still argue for submitting this, as it is correct modulo -0/+0, which is far preferable to the current state where we have a silent failure (and produce invalid code) for the backends that attempt to expand the op (like in NVPTX).

nikic added a comment.Nov 9 2022, 9:23 AM

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. Though I would still argue for submitting this, as it is correct modulo -0/+0, which is far preferable to the current state where we have a silent failure (and produce invalid code) for the backends that attempt to expand the op (like in NVPTX).

That sounds like an unrelated bug. If an operation is Expand, but we don't support expanding it, shouldn't that result in an isel failure?

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Unfortunately, I don't have the bandwidth to chase this rabbit hole further, especially since our use case is insensitive to what happens for -0 and +0. I can add a TODO comment to fix this. Though I would still argue for submitting this, as it is correct modulo -0/+0, which is far preferable to the current state where we have a silent failure (and produce invalid code) for the backends that attempt to expand the op (like in NVPTX).

That sounds like an unrelated bug. If an operation is Expand, but we don't support expanding it, shouldn't that result in an isel failure?

Yes, there is an orthogonal bug where this is a silent, and not a real failure. However, even if that bug is fixed, we would still fail (just earlier), which is fixed by this change.

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which defines the 2018 semantics. (I do agree though that the intermediate code is not correct.)

gflegar updated this revision to Diff 474535.Nov 10 2022, 6:43 AM

Limit expansion only to NVPTX backend

There it ends up handling all the cases correctly, due to the expanded
semantics of the min/max PTX instructions for +/-0.0.

gflegar added a comment.EditedNov 10 2022, 6:48 AM

I think this is the best we can do short of adding a new op. We only do the expansion for the NVPTX backed, and don't support it otherwise. In the NVPTX backed, the intermediate code still ends up being semantically incorrect for +/-0.0, but since FMINNUM/FMAXNUM lower to PTX min/max, which do implement the 2018 semantics of those ops, the final PTX ends up being correct.

Once we have an op in LLVM that represents the 2018 semantics, we can lower it to that instead, to make the intermediate code semantically correct as well.

gflegar retitled this revision from Expand fminimum and fmaximum into a pair of selects to Expand fminimum/fmaximum into fminnum/fmaxnum + NaN check.Nov 10 2022, 6:49 AM
gflegar edited the summary of this revision. (Show Details)
arsenm added a subscriber: arsenm.Nov 10 2022, 8:40 AM

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which defines the 2018 semantics. (I do agree though that the intermediate code is not correct.)

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
2227–2228

Should go through APFloat

2227–2228

Can also move this to generic code and check which of the variants are legal

gflegar updated this revision to Diff 474565.Nov 10 2022, 9:10 AM

Formatting fixes

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which defines the 2018 semantics. (I do agree though that the intermediate code is not correct.)

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM

It's even less OK to fail altogether (which is what is happening without this patch). And we're not talking about the new semantics for FMINIMUM/FMAXIMUM, but for the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

tra added a subscriber: tra.Nov 10 2022, 10:33 AM

We do not have an instruction for this in PTX prior to SM 8.0,

I assume that we're talking about min.nan.*/man.nan.* instruction variants that appeared in PTX7.0 on sm80+.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-max
docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-max

Looks like we do not properly constrain instruction availability in llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td#L940
There are no predicates on sm80+ or ptx70 and we sort of rely on custom lowering, and not always correctly as things stand:
https://godbolt.org/z/G8vYb5ajT

This patch should help generating correct instructions for fp64.

Still, I think we need to fix instruction definitions to correctly reflect their availability.

On a side note, we may want to add some min/max correctness tests to CUDA tests in llvm test-suite. Considering that we have different lowering on different GPUs, we do want to make sure that we actually do consistently get the results we expect across different GPUs and CUDA versions. We currently do not have any sm80 GPUs on cuda buildbots, but we'll get them eventually.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
615

I think this should be refactored into a more generic GetGPUAction(sm, ptx, ifAvailableAction, FallbackAction).

This would make it clear which actions we take and why. GetMinMax action just says 'magic'.

llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll
65

What exactly do we end up generating here?

If both setp and minthe are inputs for selp then -DAG should be removed from selp

I don't think this is right either. LLVM defines minnum according to the old semantics, which don't specify an order between zeroes. We'd need a separate ISD opcode for minnum according to 2018 semantics.

Interestingly though, for the NVPTX backend this will end up producing the correct code, since minnum is lowered to the min/max PTX instructions, which defines the 2018 semantics. (I do agree though that the intermediate code is not correct.)

It's not OK to have wrong intermediate code. We do have the "new" semantic opcodes already in FMINIMUM/FMAXIMUM

It's even less OK to fail altogether (which is what is happening without this patch).

Hard disagree

And we're not talking about the new semantics for FMINIMUM/FMAXIMUM, but for the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

The 2019 final spec does not have minnum or maxnum; they were removed and replaced with minimum and maximum which have specified signed zero behavior. Unless there was a draft revision I missed, there was never a defined minnum with specified -0 behavior. It would be helpful to define minnum/maxnum variants with specified -0 ordered less than +0

nikic added a comment.Nov 11 2022, 2:20 AM

And we're not talking about the new semantics for FMINIMUM/FMAXIMUM, but for the new semantics of FMINNUM/FMAXNUM (the +/-0 handling changed).

The 2019 final spec does not have minnum or maxnum; they were removed and replaced with minimum and maximum which have specified signed zero behavior. Unless there was a draft revision I missed, there was never a defined minnum with specified -0 behavior. It would be helpful to define minnum/maxnum variants with specified -0 ordered less than +0

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.

OK, yes I was confused by the name change. "minimum" is basically the same with the specified signed zero behavior. Regardless, we should have another pair of min/max with the defined signed zero handling

I don't have access to the 2019 spec, but as far as I know it specifies both minimum and minimumNumber, where minimumNumber is 2008 minnum with a) specified signed zero behavior and b) fixed sNaN behavior (i.e. the FMINNUM rather than FMINNUM_IEEE behavior). That's what I meant by the 2019 minnum semantics.

OK, yes I was confused by the name change. "minimum" is basically the same with the specified signed zero behavior. Regardless, we should have another pair of min/max with the defined signed zero handling

Alternatively, since I believe we don't actually have any users that don't specify the correct signed zero handling, we could just redefine FMINNUM_IEEE/FMAXNUM_IEEE to have the new behavior.