Page MenuHomePhabricator

Fix -f[no-]reciprocal-math -ffast-math interaction, including LTO
AbandonedPublic

Authored by wristow on Nov 15 2016, 3:22 PM.

Details

Summary

There are some inconsistencies in the handling of fast-math-flags that are
(a) preventing the selective disabling of individual fast-math features and
(b) keeping some features from working properly with LTO.
The proposed change here fixes an immediate problem where reciprocal-math
isn't handled correctly. This is one small step in handling these issues
that happen for some of the other fast-math-flags as well.

Here is a simple test-case that illustrates the problem for the
reciprocal-math situation.

extern void use(float x, float y);

void test(float a, float b, float c)
{
  float q1 = a / c;
  float q2 = b / c;
  use(q1, q2);
}

Without -ffast-math, two divisions will be done, and with -ffast-math only
one division will happen, since this will be transformed into:

float tmp = 1.0f / c;
float q1 = a * tmp;
float q2 = b * tmp;
use(q1, q2);

The bug is that with -ffast-math -fno-reciprocal-math, this
reciprocal-transformation is not suppressed.


tl;dr

The situation is that passing -ffast-math on the command-line, results in
passing the following 6 lower-level flags to cc1:

-menable-no-infs
-menable-no-nans
-fno-signed-zeros
-freciprocal-math
-fno-trapping-math
-ffp-contract=fast

and also still passing -ffast-math itself to cc1 (the act of passing
-ffast-math to cc1 results in the macro __FAST_MATH__ being defined).

These low level flags can be disabled individually. As an aside, when
-ffast-math is used, and a certain subset of the above are not disabled,
then the switch:

-menable-unsafe-fp-math

is also passed to cc1.

Ultimately, even when -fno-reciprocal-math is passed on the command-line,
the fact that -ffast-math is still passed to cc1 ends up setting the flag
UnsafeAlgebra in LLVM, which ends up over-riding the user's request to
suppress the reciprocal-math transformation. The code-change here is a
fairly simple one to deal with this.

Prior to this change, the reciprocal transformations are enabled when
either the fast or arcp IR-level flags are on. The philosophy of the
approach taken here is to only enable reciprocal transformations when
arcp is on. To put it another way, rather than an "umbrella" flag such
as fast being checked in the back-end (along with an individual flag like
arcp), it seems to me that just checking the individual flag expresses
the need more cleanly. Any fast-math-related transformation that doesn't
have an individual flag (e.g., re-association currently doesn't), should
eventually have an individual flag defined for it, and then that individual
flag should be checked. In the end, if -ffast-math sets the 6
lower-level flags described above, then the equivalent setting of the
individual flags should be equivalent. That is, ultimately, the following
2 user-commands should produce the same code:

clang -c -O2 -ffast-math foo.c
clang -c -O2 -D__FAST_MATH__ -fno-honor-infinities -fno-honor-nans -fno-signed-zeros -freciprocal-math -fno-trapping-math -ffp-contract=fast foo.c

and this proposed change is a small step in that direction.

This is my first venture into this area of LLVM, and I may be
misunderstanding some of the bigger-picture aspects. Maybe the approach of
controlling the reciprocal transformation strictly by arcp (rather than
arcp OR fast) is counter to some other expectations. In which case,
I'll be happy to learn more about how this is intended to work.

Related to this being my first venture into this area, although this fixes
the immediate problem, even with it there are some lurking issues (possibly
in Clang rather than LLVM, or maybe in both, but I'm not sure).
Specifically, if the above test-case (that computes the two quotients q1
and q2) is compiled with -ffast-math producing a .ll file, then there
is no indication in the .ll file that the reciprocal transformation is
enabled. That is, the arcp flag is not on the division instructions
(although the fast flag is, as expected -- but in this model I'm
describing, the fast flag is not to be checked for this). Continuing to
process that .ll file through to an assembly file shows two divisions
happening, confirming that (incorrectly) the reciprocal-transformation
does not happen. For example:

$ clang -S -o test_via_ll.ll -emit-llvm -O2 -ffast-math test.c
$ llc -o test_via_ll.s test_via_ll.ll  # via .ll: -ffast-math did not get the job done
$ grep div test_via_ll.s
        divss   %xmm2, %xmm0
        divss   %xmm2, %xmm1
$

Whereas doing the same compilation via a .bc file does honor the request
for doing the reciprocal transformation:

$ clang -c -o test_via_bc.bc -emit-llvm -O2 -ffast-math test.c
$ llc -o test_via_bc.s test_via_bc.bc
$ grep div test_via_bc.s               # via .bc: -ffast-math did get the job done
        divss   %xmm2, %xmm3
$

To be clear, without this proposed change, the approach via the .ll file
does correctly do the reciprocal transformation (as does the .bc approach).
And with my proposed change, the .bc approach continues to work (and the
bug that's the whole point of this patch is fixed), but the .ll approach shown
above fails. Manually adding the arcp flag to test_via_ll.s does result
in the reciprocal transformation being done with the updated compiler, as
expected.

I don't think this .ll issue causes any problems when going through normal
compilations (that is, when producing object files or bitcode files), but
it clearly is trouble when producing .ll files.

Diff Detail

Event Timeline

wristow updated this revision to Diff 78085.Nov 15 2016, 3:22 PM
wristow retitled this revision from to Fix -f[no-]reciprocal-math -ffast-math interaction, including LTO.
wristow updated this object.
wristow added reviewers: spatel, hfinkel.
wristow added a subscriber: llvm-commits.

This is related to a discussion in PR27372 (although not the original issue of that PR).

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

If I understand what you wrote correctly, the correct target of your attention would be Clang instead of LLVM.

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

hfinkel edited edge metadata.Nov 15 2016, 3:57 PM

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

We might reconsider having 'fast' imply all of the other flags. I believe this was a suboptimal design choice, and so long as someone is willing to do the work to separate out the various semantic requirements, we should allow that work to proceed. We should discuss this on llvm-dev first, however.

We might reconsider having 'fast' imply all of the other flags. I believe this was a suboptimal design choice, and so long as someone is willing to do the work to separate out the various semantic requirements, we should allow that work to proceed.

I have to admit I felt "funny" about changing the semantics of whether fast should imply all the other flags. Reconsidering that design choice is what I was implicitly suggesting when I said:

To put it another way, rather than an "umbrella" flag such
as fast being checked in the back-end (along with an individual flag like
arcp), it seems to me that just checking the individual flag expresses
the need more cleanly. Any fast-math-related transformation that doesn't
have an individual flag (e.g., re-association currently doesn't), should
eventually have an individual flag defined for it, and then that individual
flag should be checked.

Regarding:

We should discuss this on llvm-dev first, however.

Sounds good. I'll start a discussion on llvm-dev.

We should discuss this on llvm-dev first, however.

Sounds good. I'll start a discussion on llvm-dev.

http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html

I think it would be a lot better to have sub-options to fast-math in a similar way to how target features are handled (via -mattr).

I think it would be a lot better to have sub-options to fast-math in a similar way to how target features are handled (via -mattr).

Let me clarify---not to fix this issue here, but in general. It would make setting/clearing of the individual options much clearer.

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

This is the behavior I'd expect clang to have.

I think it would be a lot better to have sub-options to fast-math in a similar way to how target features are handled (via -mattr).

Let me clarify---not to fix this issue here, but in general. It would make setting/clearing of the individual options much clearer.

Can you give a straw man example of how such options would look like on the command line?

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

This is the behavior I'd expect clang to have.

OK, thanks. There's more discussion over on the mailing list, so I'll continue over there.

Let me clarify---not to fix this issue here, but in general. It would make setting/clearing of the individual options much clearer.

Can you give a straw man example of how such options would look like on the command line?

Sure. For example, something like
-ffast-math=+noinf,-nonan -ffast-math=+recip
would enable "no infinities", disable "no NaNs", and enable the use of reciprocals. Each such occurrence of -ffast-math would behave as if it was combined with all the preceding ones, i.e. the above would be equivalent to -ffast-math=+noinf,-nonan,+recip.

There could also be something like -ffast-math=none and -ffast-math=all to disable/enable all available settings respectively.

While the existing options could be handled meaningfully, this scheme has the benefit of being less ambiguous to the user.

The IBM XLC compiler has something similar for -qstrict.

Neat! Thanks Krzysztof.

wristow abandoned this revision.May 21 2018, 3:12 PM

Abandoning this very old proposed patch. This patch was about some fundamental issues with the "umbrella" aspect of the FMF fast. This was intended to be a small step in solving those issues. That umbrella aspect of fast was fixed by by Sanjay (https://reviews.llvm.org/D39304). With that groundwork done, there have been a handful of additional improvements. With all those improvements, the issue of this patch is no longer a problem.