When in a mode that flushes denormals, we don't want to transform FSUB(-0.0,X) -> FNEG(X). The former is an arithmetic operation that will flush a denormal input to 0. The latter is a bitwise operation that only flips the sign bit, so a denormal input passes through unflushed.
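To make the difference concrete, here is a minimal software model, not actual target behavior. It assumes a flush mode that turns a denormal input into a zero of the same sign; the helper names (`flushInput`, `fsubNegZero`, `fnegBits`, `bitsOf`) are hypothetical and exist only for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: model input flushing in software. A denormal becomes a zero
// with the same sign; every other value is left untouched.
static float flushInput(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Model of FSUB(-0.0, X) in a flushing mode: the operand is flushed first,
// then the subtraction is performed.
static float fsubNegZero(float X) { return -0.0f - flushInput(X); }

// Model of FNEG(X): flip the sign bit and nothing else.
static float fnegBits(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  Bits ^= 0x80000000u;
  std::memcpy(&X, &Bits, sizeof(Bits));
  return X;
}

// Raw bit pattern of a float, for bitwise comparison of results.
static std::uint32_t bitsOf(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits;
}
```

Under this model, a denormal input gives a signed zero from the FSUB form but a negated denormal from the FNEG form, while normal inputs give bitwise-identical results from both.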
Marked as [WIP] since the logic is a little weird. Hoping @arsenm and others can offer some guidance...
- Notice that we still perform the transformation when in DenormalMode::IEEE. This is counter-intuitive. IEEE-754 is what specifies that these operations are distinct, but only with regard to side-effects, not denormal flushing. LLVM optimizations do not preserve side-effects, and both operations produce bitwise identical results when we're not flushing denormals, so I think this is the correct thing to do.
That said, there's also the problem of this transform changing the sign of a NaN in DenormalMode::IEEE. Do we want to take that into consideration? E.g. an FSUB(-0.0, NaN) should produce a canonical NaN with the same payload, while FNEG(NaN) produces -NaN. If I'm not mistaken, IEEE-754 doesn't specify the sign of a NaN result, only that it be a canonical NaN.
- Also notice that we still perform the transformation when in DenormalMode::Invalid. I believe that Invalid is actually a flush-to-zero mode. However, I think it makes sense to leave the default mode unchanged with respect to disabling this transform. There could be a very small (and hard to measure) performance penalty for using a proper FSUB on some targets.
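The policy in the two points above can be sketched as a small predicate. This is an illustrative stand-in, not the code in this patch: the enum below is a hypothetical local type, and the `PreserveSign` and `PositiveZero` cases are assumed to be the flushing modes in which the fold is disabled.

```cpp
// Hypothetical stand-in for the denormal mode kinds discussed above;
// this is not the LLVM type itself.
enum class DenormalModeKind { Invalid, IEEE, PreserveSign, PositiveZero };

// Returns true if FSUB(-0.0, X) may be folded to FNEG(X) under the policy
// described above.
constexpr bool canFoldFSubToFNeg(DenormalModeKind Mode) {
  switch (Mode) {
  case DenormalModeKind::IEEE:    // No flushing: results are bitwise identical.
  case DenormalModeKind::Invalid: // Default mode: behavior left unchanged.
    return true;
  case DenormalModeKind::PreserveSign: // Assumed flushing mode: keep the FSUB.
  case DenormalModeKind::PositiveZero: // Assumed flushing mode: keep the FSUB.
    return false;
  }
  return false; // Unreachable for valid enum values.
}
```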
Thoughts about any of this?
AMDGPU basically already has this, but it requires a depth argument similar to computeKnownBits.