This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] fix miscompiles caused by using ValueTracking matchSelectPattern to create FMINIMUM/FMAXIMUM
ClosedPublic

Authored by spatel on Feb 1 2023, 1:06 PM.

Details

Summary

ValueTracking attempts to match compare+select patterns to FP min/max operations, but it was created before the newer IEEE-754-2019 minimum/maximum ops were defined. Ie, matchSelectPattern() does not account for the -0.0/+0.0 behavior that is specified in the newer standard.

FMINIMUM/FMAXIMUM nodes were created to map to the newer standard:

/// FMINIMUM/FMAXIMUM - NaN-propagating minimum/maximum that also treat -0.0
/// as less than 0.0. While FMINNUM_IEEE/FMAXNUM_IEEE follow IEEE 754-2008
/// semantics, FMINIMUM/FMAXIMUM follow IEEE 754-2018 draft semantics.

We could adjust ValueTracking to deal with signed zero, but it seems like a moot point given the divergent NaN behavior discussed in D143056, so just delete this possibility to avoid bugs when converting IR to SDAG.

Diff Detail

Event Timeline

spatel created this revision.Feb 1 2023, 1:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2023, 1:06 PM
spatel requested review of this revision.Feb 1 2023, 1:06 PM
samparker accepted this revision.Feb 2 2023, 2:09 AM

Thanks for doing this.

It looks like it could cause some unexpected performance hiccups though... So, for the targets that would return the input NaN, do you think it would be worth implementing a target API for ValueTracking so that it could make a decision, or is this something that have to be sunk into the DAGBuilder? Are there any other uses of matchSelectPattern that could also be making a bogus decision?

This revision is now accepted and ready to land.Feb 2 2023, 2:09 AM
arsenm added a comment.Feb 2 2023, 2:56 AM

Can you preserve the match with nsz?

I think we need to just rename FMINNUM to FMIN, and have explicit FMINNUM_IEEE2008 (with snan handling and unspecified -0 vs. 0, or whatever version these were introduced), and FMINNUM_IEEE2019 (with specified +0 vs. 0)

Can you preserve the match with nsz?

I'm not sure I follow why signalling is important here? I've noticed that we'll lower FMINNUM to FMINIMUM, in TargetLowering::expandFMINNUM_FMAXNUM, in the absence of any NaNs - but that also doesn't appear to respect the +/-0 issue raised here.

arsenm added a comment.Feb 2 2023, 5:30 AM

Can you preserve the match with nsz?

I'm not sure I follow why signalling is important here?

I didn’t say it was, but changing from select to a min introduces quieting which wouldn’t happen before

spatel added a comment.Feb 2 2023, 7:09 AM

It looks like it could cause some unexpected performance hiccups though... So, for the targets that would return the input NaN, do you think it would be worth implementing a target API for ValueTracking so that it could make a decision, or is this something that have to be sunk into the DAGBuilder? Are there any other uses of matchSelectPattern that could also be making a bogus decision?

I agree that this seems likely to cause some perf regressions based on the test diffs. If we can find a way to avoid those, that works for me, but it's such a mess, I figured we should just remove this hunk of wrong code as a first step.

Let me try to summarize where we stand:

  1. The select pattern in IR guarantees that we would return a NaN input bitwise exactly. The FMINIMUM/FMAXIMUM intrinsics/nodes don't guarantee that, and as noted in D143056, the targets that implement this operation tend to return a canonical NaN rather than propagate a NaN payload. Given that, changing semantics in IR or SDAG to accommodate this transform does not seem viable.
  1. The select pattern in IR differs in its treatment of -0.0, so we have potential for real numeric miscompiles as shown in the test diffs here. It seems likely that we will add even more MIN/MAX node variations to match target behavior, but I'm not sure what that behavior is yet, so I don't know what a good solution will be. As an example of an existing codegen semantic variation of min/max, note that x86 has target-specific FMAXC/FMINC nodes.
  1. We could look for 'nsz' and 'nnan' on the IR to make the transform sound, but it requires a questionable combination of FMF on the fcmp and select: https://alive2.llvm.org/ce/z/vjV9AC . This might get better with the nofpclass attribute proposed in D139902.
  1. I don't think we need to worry about SNaN with this transform because we're not using strict math -- strict variants of the MIN/MAX nodes already exist -- but I'm also not sure how that path works yet. I was ignoring that complication for now. There is an attempt to clarify SNaN behavior with a LangRef edit in D143074 in case that part wasn't clear enough.
spatel added a comment.Feb 2 2023, 7:54 AM

So, for the targets that would return the input NaN, do you think it would be worth implementing a target API for ValueTracking so that it could make a decision, or is this something that have to be sunk into the DAGBuilder?

Can't rule out transforms/analysis in IR...but currently, it doesn't look good. It seems more likely that we would defer to codegen (DAGCombiner or target-specific isel).

Are there any other uses of matchSelectPattern that could also be making a bogus decision?

I did an audit of trunk LLVM for SPF_MINNUM and related API, and this looks like the only potential misuse.

This revision was landed with ongoing or failed builds.Feb 3 2023, 6:59 AM
This revision was automatically updated to reflect the committed changes.
uabelho added a subscriber: uabelho.Feb 6 2023, 5:46 AM

Hi,

Would the rewrite be legal if the fcmp was "fast" or would that also lead to miscompiles?
So e.g. if we have

%cmp = fcmp fast ogt float %a, %b
%cond = select i1 %cmp, float %a, float %b

?

spatel added a comment.Feb 6 2023, 6:31 AM

Hi,

Would the rewrite be legal if the fcmp was "fast" or would that also lead to miscompiles?
So e.g. if we have

%cmp = fcmp fast ogt float %a, %b
%cond = select i1 %cmp, float %a, float %b

?

To make this transform sound, the inputs must be neither -0.0 nor NaN. But "nsz" doesn't guarantee that - it just means the sign of zero is "insignificant". That is already implied by the definition of fcmp, so "nsz" on fcmp is redundant/meaningless - "nsz" needs to be on the select. We have the opposite problem with "nnan" on the select. By definition, the select filters out a possible poison value (NaN), so a "nnan" on the select doesn't give us the required pre-condition that neither input is NaN.

That's why we have this strange proof:
https://alive2.llvm.org/ce/z/vjV9AC