There are some inconsistencies in the handling of fast-math-flags that are
(a) preventing the selective disabling of individual fast-math features and
(b) keeping some features from working properly with LTO.
The proposed change here fixes an immediate problem where reciprocal-math
isn't handled correctly. This is one small step in handling these issues
that happen for some of the other fast-math-flags as well.
Here is a simple test-case that illustrates the problem for the
reciprocal-math situation.
extern void use(float x, float y); void test(float a, float b, float c) { float q1 = a / c; float q2 = b / c; use(q1, q2); }
Without -ffast-math, two divisions will be done, and with -ffast-math only
one division will happen, since this will be transformed into:
float tmp = 1.0f / c; float q1 = a * tmp; float q2 = b * tmp; use(q1, q2);
The bug is that with -ffast-math -fno-reciprocal-math, this
reciprocal-transformation is not suppressed.
tl;dr
The situation is that passing -ffast-math on the command-line, results in
passing the following 6 lower-level flags to cc1:
-menable-no-infs -menable-no-nans -fno-signed-zeros -freciprocal-math -fno-trapping-math -ffp-contract=fast
and also still passing -ffast-math itself to cc1 (the act of passing
-ffast-math to cc1 results in the macro __FAST_MATH__ being defined).
These low level flags can be disabled individually. As an aside, when
-ffast-math is used, and a certain subset of the above are not disabled,
then the switch:
-menable-unsafe-fp-math
is also passed to cc1.
Ultimately, even when -fno-reciprocal-math is passed on the command-line,
the fact that -ffast-math is still passed to cc1 ends up setting the flag
UnsafeAlgebra in LLVM, which ends up over-riding the user's request to
suppress the reciprocal-math transformation. The code-change here is a
fairly simple one to deal with this.
Prior to this change, the reciprocal transformations are enabled when
either the fast or arcp IR-level flags are on. The philosophy of the
approach taken here is to only enable reciprocal transformations when
arcp is on. To put it another way, rather than an "umbrella" flag such
as fast being checked in the back-end (along with an individual flag like
arcp), it seems to me that just checking the individual flag expresses
the need more cleanly. Any fast-math-related transformation that doesn't
have an individual flag (e.g., re-association currently doesn't), should
eventually have an individual flag defined for it, and then that individual
flag should be checked. In the end, if -ffast-math sets the 6
lower-level flags described above, then the equivalent setting of the
individual flags should be equivalent. That is, ultimately, the following
2 user-commands should produce the same code:
clang -c -O2 -ffast-math foo.c clang -c -O2 -D__FAST_MATH__ -fno-honor-infinities -fno-honor-nans -fno-signed-zeros -freciprocal-math -fno-trapping-math -ffp-contract=fast foo.c
and this proposed change is a small step in that direction.
This is my first venture into this area of LLVM, and I may be
misunderstanding some of the bigger-picture aspects. Maybe the approach of
controlling the reciprocal transformation strictly by arcp (rather than
arcp OR fast) is counter to some other expectations. In which case,
I'll be happy to learn more about how this is intended to work.
Related to this being my first venture into this area, although this fixes
the immediate problem, even with it there are some lurking issues (possibly
in Clang rather than LLVM, or maybe in both, but I'm not sure).
Specifically, if the above test-case (that computes the two quotients q1
and q2) is compiled with -ffast-math producing a .ll file, then there
is no indication in the .ll file that the reciprocal transformation is
enabled. That is, the arcp flag is not on the division instructions
(although the fast flag is, as expected -- but in this model I'm
describing, the fast flag is not to be checked for this). Continuing to
process that .ll file through to an assembly file shows two divisions
happening, confirming that (incorrectly) the reciprocal-transformation
does not happen. For example:
$ clang -S -o test_via_ll.ll -emit-llvm -O2 -ffast-math test.c $ llc -o test_via_ll.s test_via_ll.ll # via .ll: -ffast-math did not get the job done $ grep div test_via_ll.s divss %xmm2, %xmm0 divss %xmm2, %xmm1 $
Whereas doing the same compilation via a .bc file does honor the request
for doing the reciprocal transformation:
$ clang -c -o test_via_bc.bc -emit-llvm -O2 -ffast-math test.c $ llc -o test_via_bc.s test_via_bc.bc $ grep div test_via_bc.s # via .bc: -ffast-math did get the job done divss %xmm2, %xmm3 $
To be clear, without this proposed change, the approach via the .ll file
does correctly do the reciprocal transformation (as does the .bc approach).
And with my proposed change, the .bc approach continues to work (and the
bug that's the whole point of this patch is fixed), but the .ll approach shown
above fails. Manually adding the arcp flag to test_via_ll.s does result
in the reciprocal transformation being done with the updated compiler, as
expected.
I don't think this .ll issue causes any problems when going through normal
compilations (that is, when producing object files or bitcode files), but
it clearly is trouble when producing .ll files.