Currently there are optimizations for the fadd instruction that do not fire for a constrained fadd. Add some of these optimizations.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4922 | It requires adding support to the IR matchers like m_FSub(), and those are used elsewhere. Which implies testing in places in addition to here. So I'm saving that for a subsequent patch. Small steps. |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | Can I put this in a different patch? We have the same issue for constrained and non-constrained cases, and we don't have a good way to distinguish them -- nor should we. Most of the required code is present in APFloat, but not all. Would it be OK for me to add the needed bits to APFloat and use them from InstSimplify in a different ticket? |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | I am not sure I understand what you are going to put into another patch. The transformation: fadd X, -0 ==> X is valid only if X != SNaN, otherwise we have: fadd SNaN, -0 ==> QNaN It does not depend on whether FP environment is default or not, the code before your changes was already incorrect. Such transformation is valid if the operation has flag nnan, but not in general case. Code in APFloat hardly can help, as constant folding is made previously, if X is a constant here, it means it cannot be folded. |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | IIUC, you are saying SNaN vs. QNaN is more than just a part of the exception state. But that's not how I interpret the current LangRef: ...and that's why *none* of the transforms here are intentionally SNaN preserving/clearing. If we are going to change the behavior of the default FP env, the LangRef must be updated to make that clear. I don't see the motivation yet. |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | That's true for default FP environment. A difference between SNaN and QNaN is that operations on SNaN raise invalid exception. In default FP environment exceptions are ignored so SNaN and QNaN behave identically. But this code works in non-default FP environment as well. fadd here designates both regular IR node used in default environment as well as its constrained counterpart. So SNaN here must be handled more carefully. |
We could fold in the "strict" exception behavior cases if we had a matcher for a QNaN. But instructions still wouldn't be removed since "strict" makes them be !isInstructionTriviallyDead(). A TODO note is probably sufficient should a m_QNaN() be added in the future.
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | We won't see a NaN here if 'nnan' is specified since simplifyFPOp() will have already turned it to poison and returned. So I think the check for FMF.noNaNs() should be removed since it can't happen and is therefore misleading. We won't fold in the 'strict' exception case because we check for it and reject it. I suspect the real objection is that we're not turning a SNaN into a QNaN here. But with the constant folding that Serge recently added we won't even be here in the "ignore" or "maytrap" cases at all. There won't be a "fadd NaN, -0" case except in the "strict" exception case which we decline to fold. Aside from the unneeded check for FMF.noNaNs() I don't see a problem here. We aren't returning the wrong result here or below. And I do now have code for the APFloat and ConstantFP classes to make it easy for simplifyFPOp() to convert SNaN->QNaN. That's what I was planning on submitting in a different ticket. |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | This seems to be getting fuzzy when the exception state is "MayTrap", so we probably need to clarify that definition in the LangRef. Is there a regression test below that shows where you think this patch is wrong? I think the fold is correct with MayTrap (assuming the rounding mode is known suitable) because "passes are not required to preserve all exceptions that are implied by the original code"; presumably some other operation is eventually going to generate the invalid exception when it sees the SNaN? |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | Actually, the check for FMF.noNaNs() is required to catch non-constants. We can still fold "fadd X, -0" if we know X is not a NaN. We still have the "trivially dead" issue where the instruction would be hanging around needlessly, but still... |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | Rereading @spatel's comment: what about the "X is a variable that happens to have the value of an SNaN" case: it's true that we'll be removing an instruction that would have converted the SNaN to a QNaN. So the removal of the instruction would be observable. I don't see a way around that without eliminating the ability to optimize at all. |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | I don't think you can make this transformation in the "maytrap" case. It would be ok to optimize an SNaN to a QNaN, but you can't eliminate an instruction that would raise the exception while performing the same conversion. Consider this example: double foo(double x, double y) { #pragma clang fp exceptions(maytrap) double temp; feclearexcept(FE_ALL_EXCEPT); temp = x + -0.0; // Check the x = SNaN case if (fetestexcept(FE_INEXACT)) return -1.0; temp = temp + y; // Check the y = SNaN case if (fetestexcept(FE_INEXACT)) return -2.0; return temp; } Eliminating the fadd x, -0 causes the exception to be raised in the wrong place. |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4922 | Actually, the check for the default FP environment isn't needed. No matcher support has been done yet. The matchers won't match so there's no need for the guard. |
Update for review comments. Avoid changes that move the location of traps unless we know they won't ever trap due to fast math flags.
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4908 | The transformation fadd X, -0 ==> X is wrong in the two cases:
It is not correct to produce SNaN instead of QNaN. This result might be stored somewhere and could further be processed by other code, which operates in strict mode. Such transformation would cause that code to trap. Behavior of the code with this transformation and without it would be different. So this transformation is safe only if FMF.noNans(). The proposed condition is: if ((!RoundingModeMayBe(Rounding, RoundingMode::TowardNegative) || FMF.noSignedZeros()) && FMF.noNaNs()) | |
4922 | According to https://en.wikipedia.org/wiki/Signed_zero: "x-x=x+(-x)=+0 (for any finite x, -0 when rounding toward negative)" So the check for rounding mode is also necessary, unless FMF.noSignedZero(). |
Changing the way the existing transforms check for FMF.noNaNs() sounds like a different ticket that needs to be done before this one can progress. Unless I misunderstood?
I think the optimization fadd X, -0 ==> X in general case (not FMF.noNaNs()) is incorrect. The obtained values must be identical to what would produce hardware, otherwise it is not an optimization. If you think that such change deserves a separate patch, no problem. It seems to me that this patch could establish correct folding even if it changes non-constrained operation as well.
I really don't think we should be changing how LLVM handles the ebIgnore cases. LLVM currently treats SNaN and QNaN as the same except where it doesn't, and I believe we should keep that behavior unchanged when ignoring exceptions. At the very least we shouldn't be disabling any currently enabled optimizations when ignoring exceptions. @lebedev.ri or @jcranmer ?
Our closest approximation to a strict IEEE-754 mode currently is with ebStrict / "fpexcept.strict" usage, but that isn't documented as being strictly IEEE-754 compliant. If someone wants to go through and add a strict -754 mode at some point then they can. But that's a different project.
It is actually a bug. None of the operation in IEEE-754 compatible implementation may produce SNaN. If some code that would produce QNaN in runtime produces SNaN due to constant folding, such optimization is invalid. A user might use the expression x + 0.0 just to convert SNaN to QNaN, it won't work in the case of such optimization.
Our closest approximation to a strict IEEE-754 mode currently is with ebStrict / "fpexcept.strict" usage, but that isn't documented as being strictly IEEE-754 compliant. If someone wants to go through and add a strict -754 mode at some point then they can. But that's a different project.
Most hardware FP implementations adhere IEEE-754. As LLVM IR is by design target-independent, non-IEEE-754 implementation hardly is interesting. Also, ebIgnore is only an optimization hint, it should not change the semantics of the add operation.
I agree.
It is actually a bug. None of the operation in IEEE-754 compatible implementation may produce SNaN. If some code that would produce QNaN in runtime produces SNaN due to constant folding, such optimization is invalid. A user might use the expression x + 0.0 just to convert SNaN to QNaN, it won't work in the case of such optimization.
Our closest approximation to a strict IEEE-754 mode currently is with ebStrict / "fpexcept.strict" usage, but that isn't documented as being strictly IEEE-754 compliant. If someone wants to go through and add a strict -754 mode at some point then they can. But that's a different project.
Most hardware FP implementations adhere IEEE-754. As LLVM IR is by design target-independent, non-IEEE-754 implementation hardly is interesting. Also, ebIgnore is only an optimization hint, it should not change the semantics of the add operation.
fadd X, -0 ==> X is *NOT* a miscompile, at least given the current LLVM IR semantics: https://alive2.llvm.org/ce/z/TuTiSQ
I would personally strongly suggest to not reason about semantics via hand-waving, but to actually model them in alive2, if it isn't already.
Honestly, i'm quite worried that this is repeating the same approach as in isnan threads.
Some might interpret it as being dismissive/intentionally ignoring documented semantics.
Here is the sample program: https://godbolt.org/z/ssYs6ez91
Hardware converts SNaN + 0 into QNaN.
Could you please write a little longer arguments?
Your point being? Pick one of:
- the optimization is incorrect as per the llvm langref
- the optimization is correct as per the llvm langref, which is itself incorrect
- ???
Which one is it?
This optimization is incorrect because it violates IEEE-754, which states (6.2):
Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN.
As a result it is inconsistent with the main hardware implementations, including at least X86, RISC-V and ARM. If it matters I can provide the exact references.
As for llvm langref, I think the following statement is cited here: https://llvm.org/docs/LangRef.html#floating-point-environment:
The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.
This sentence is about exceptions but not the results of operations. So actually nothing in the langref justifies this transformation.
This is cherry-picking the example to try to prove your point; gcc does not behave as you are suggesting in general. Here is a counter-example based on the first transform (fadd X, -0.0) affected by this patch:
https://godbolt.org/z/qj9Meavnd
As for llvm langref, I think the following statement is cited here: https://llvm.org/docs/LangRef.html#floating-point-environment:
The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.This sentence is about exceptions but not the results of operations. So actually nothing in the langref justifies this transformation.
The wording could be made more explicit, but as I suggested in my comment from Aug 6 in this review: the default LLVM FP env ignores exceptions, so citing the IEEE-754 clause that begins with "under default exception handling" to describe SNAN->QNAN behavior is not applicable.
But this has gone out-of-bounds for this patch review - this was just trying to nail down the behavior in a subset of a strict FP env. Please start an llvm-dev discussion if you want to change the LangRef to formalize the behavior you are proposing for default FP.
@kpn - might be worth re-starting this patch as "X + -0.0" only, so we can make progress?
In this case no add instruction is generated, so it does not demonstrate hardware behavior. It is (incorrect) constant folding. ICC does not do such transformation. Could you please file a bug in GCC bug tracker?
As for llvm langref, I think the following statement is cited here: https://llvm.org/docs/LangRef.html#floating-point-environment:
The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.This sentence is about exceptions but not the results of operations. So actually nothing in the langref justifies this transformation.
The wording could be made more explicit, but as I suggested in my comment from Aug 6 in this review: the default LLVM FP env ignores exceptions, so citing the IEEE-754 clause that begins with "under default exception handling" to describe SNAN->QNAN behavior is not applicable.
This is peculiarities of IEEE-754 language. Exception handling in this context is what to do if exceptional situation occurs. In 7.1 there is a description of the default exception handling:
This clause also specifies default non-stop exception handling for exception signals, which is to deliver a default result, continue execution, and raise the corresponding status flag (except in the case of exact underflow, see 7.5).
This is exactly the way the exceptions are handled in the default FP environment. Exception must be handled in some way because hardware always signals them if exceptional situation occurs.
As for llvm langref, I think the following statement is cited here: https://llvm.org/docs/LangRef.html#floating-point-environment:
The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.This sentence is about exceptions but not the results of operations. So actually nothing in the langref justifies this transformation.
The wording could be made more explicit, but as I suggested in my comment from Aug 6 in this review: the default LLVM FP env ignores exceptions, so citing the IEEE-754 clause that begins with "under default exception handling" to describe SNAN->QNAN behavior is not applicable.
This is peculiarities of IEEE-754 language. Exception handling in this context is what to do if exceptional situation occurs. In 7.1 there is a description of the default exception handling:
This clause also specifies default non-stop exception handling for exception signals, which is to deliver a default result, continue execution, and raise the corresponding status flag (except in the case of exact underflow, see 7.5).This is exactly the way the exceptions are handled in the default FP environment. Exception must be handled in some way because hardware always signals them if exceptional situation occurs.
I agree with @sepavloff's interpretation of IEEE-754. It's a bit jarring switching between -754 and the LangRef or Unix because they use the same words but with different meanings.
However, my understanding is that strict IEEE-754 compliance is _not_ a goal of the LLVM project. And having different behavior than the hardware in the case of NaN handling in the non-trapping case is generally accepted because of the large performance boost in total. LLVM implements its own IR language and NaN handling (non-trapping case) is one of those times where the IR language does not promise to behave like the hardware.
I'll do what @spatel asked and narrow the changes down to get this moving.
Apologies to this patch review for being off-topic again, but the answer is no.
I think you know that's not a standalone folding bug. It's intentional behavior in both compilers which you can easily show with a number of similar examples...
https://godbolt.org/z/fT1j8YPK6
I tried opening a bug against GCC (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102549) and got recommendation to use -fsignaling-nans. There is no such option in clang but gcc manual says that it implies ftrapping-math and the latter is a synonym of -ffp-exception-behavior=strict in clang.
So, you are right. Ignoring peculiarities of SNaN is by design in GCC and IEEE-754 conformant behavior is only possible in strict FP mode. Conformant behavior of GCC for X + 0.0 looks accidental. As this behavior opens optimization possibilities it is reasonable to use it.
Thank you for the discussion!
Ouch, this is a long thread.
If I understand correctly, an important missing feature in Alive2 is support for multiple rounding modes. I didn't implement that so far mostly because I was under the impression that rounding modes was still a working in progress thing. Is this true or is it mostly settled in stone?
I would appreciate if someone could point me out to the functions that can change the rounding mode (e.g., a function call may change the rounding mode, right? is the rounding mode global per thread or per process?). If I get to understand how this thing works I'm happy to implement it in Alive2 :)
The support of non-default rounding modes is still not as mature as it should be.
I would appreciate if someone could point me out to the functions that can change the rounding mode (e.g., a function call may change the rounding mode, right? is the rounding mode global per thread or per process?). If I get to understand how this thing works I'm happy to implement it in Alive2 :)
There are two kinds of rounding mode, static and dynamic. Availability of them depends on the target. The dynamic rounding mode is more universal, most targets support it. This kind of rounding mode is set by a write to a special hardware register. If IR it is done by the intrinsic llvm.set_rounding. Another intrinsic, 'llvm.flt_rounds` can be used to read the value of dynamic rounding mode. The dynamic rounding mode is defined for each thread. Operations that use dynamic rounding mode are represented in IR by constrained intrinsics like:
%add = call float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.dynamic", metadata !"fpexcept.strict")
Support of the dynamic rounding mode is more or less ready.
The static rounding mode is available on some targets, for example on RISC-V. In this case rounding mode is encoded as a part of processor instruction, it is not kept in a separate register. In IR the static rounding mode is represented by special metadata operands like:
%add = call float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.upward", metadata !"fpexcept.strict")
Support of the static rounding more is still weak, mainly because the support of constrained intrinsics for such targets is limited.
Everything that can be tested is now pushed in other tickets. I'll keep this around as a reminder to come back when m_FSub() is updated, but it doesn't need any action from anyone right now.
clang-tidy: warning: invalid case style for function 'RoundingModeMayBe' [readability-identifier-naming]
not useful