This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/Driver/ToolChains/
-
Driver/
-
ToolChains/
11/13
Clang.cpp
-
test/Driver/
-
Driver/
3/5
fast-math.c
-
llvm/
-
lib/CodeGen/SelectionDAG/
-
CodeGen/
-
SelectionDAG/
-
DAGCombiner.cpp
-
test/CodeGen/
-
CodeGen/
-
PowerPC/
1/2
fmf-propagation.ll
-
X86/
-
fp-contract.ll

Differential D72675

[Clang][Driver] Fix -ffast-math/-ffp-contract interaction
Needs ReviewPublic

Authored by wristow on Jan 13 2020, 7:19 PM.

Download Raw Diff

Details

Reviewers

spatel
mcberg2017
hfinkel
efriedma
scanon
arsenm
echristo
RKSimon
andrew.w.kaylor
uweigand
kpn
cameron.mcinally
yaxunl

Summary

Fused Multiply Add (FMA) was not always being disabled when the switch -ffp-contract=off was used. More specifically, FMA is enabled when -ffp-contract=fast is used, and it also is enabled implicitly with -ffast-math. The combination:

-ffast-math -ffp-contract=off

is intended to leave most of fast-math enabled (for example, leave reassociation, reciprocal transformations, etc.) enabled, but disable the use of FMA. However, FMA was incorrectly left enabled with the above switch combination. This commit fixes this, allowing users to enable most of the fast-math optimizations, while disabling the FMA feature.

Diff Detail

Event Timeline

wristow created this revision.Jan 13 2020, 7:19 PM

Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2020, 7:19 PM

Herald added subscribers: jsji, hiraditya, nemanjai. · View Herald Transcript

hfinkel added a subscriber: hfinkel.Jan 13 2020, 7:56 PM

hfinkel added inline comments.

clang/lib/Driver/ToolChains/Clang.cpp
2721	I think that you just need && !FPContract.equals("off") or && !(FPContract.equals("off") \|\| FPContract.equals("on")) of which I think the latter. fp-contract=no is also not a fast-math-compatible setting in that regard, right?
2745	So now we pass it twice, because we also pass it here?
2763	You want the check here, I think, and not below so that if parts of fast-math are disabled `__FAST_MATH__` is not set. That seems to be what the comment/code currently do, although what does GCC do in this regard?

Herald added a subscriber: • wuzish. · View Herald TranscriptJan 13 2020, 7:56 PM

Thanks for the quick feedback @hfinkel

clang/lib/Driver/ToolChains/Clang.cpp
2721	Eliminating the `FPContractDisabled` variable, and instead checking that the value isn't "off" (or maybe isn't ("off" or "on")), sounds good. I'm not 100% sure of whether just "off" or both "off" and "on" ought to be checked. More on that in the discussion about `__FAST_MATH__` in one of your other comments.
2745	Whoops. This one was here originally, so I'll leave it here, and delete the one I needlessly added, above.
2763	I had originally included the check with the set of conditions of line 2763, but then the `warn_drv_overriding_flag_option` warning (below) was missed in the case of `-ffp-model=fast -ffp-contract=off`. So I think the check does need to remain inside the above conditional. Regarding the `__FAST_MATH__` aspect and whether `ffp-contract=on` also ought to suppress the `-ffast-math` emission (and what GCC does), this is somewhat fuzzy for me. After getting your comments, I'm leaning toward including both the "off" and "on" in the check (rather than only disabling it in the case of "off", as the current patch does). But I want to at least note my observations here, to make the current situation with both Clang and GCC clear, in case others want to chime in. Somewhat surprisingly to me, GCC's behavior wrt `__FAST_MATH__` seems buggy in general, so it's not a good base to compare against. For example, the following set of switches result in the `__FAST_MATH__` macro being defined (although I expected they would have disabled it): -ffast-math -fno-associative-math -fno-reciprocal-math -frounding-math (I've just tested GCC 7.4.0.) I could have sworn that in the past that for GCC I had seen this sort of disabling of part of the "fast-math set" result in the disabling of the def of `__FAST_MATH__`. As you'd expect with Clang (from the comment in our code code), the above switches do suppress the definition of `__FAST_MATH__`. That said, GCC does suppress the `__FAST_MATH__` def in some situations. E.g., both GCC and Clang suppress it with the following switches: -ffast-math -fmath-errno -ffast-math -ftrapping-math -ffast-math -fsigned-zeros In any case, for the `-ffp-contract` switch, none of the settings suppress the def of `__FAST_MATH__` with GCC. That is, all of the following result in `__FAST_MATH__` being defined: -ffast-math -ffp-contract=fast -ffast-math -ffp-contract=on -ffast-math -ffp-contract=off I think that's buggy (but hard to know if it's `-ffp-contract` specific, or part of the general buggyness of GCC and `__FAST_MATH__`). Clang (prior to the change described here) also defines `__FAST_MATH__` in all of these `-ffp-contract` situations. But I think that's just part of the bug I'm trying to address here. Bottom line: The intended semantics are somewhat fuzzy to me. I'd say: -ffast-math -ffp-contract=fast // should leave __FAST_MATH__ defined -ffast-math -ffp-contract=off // should not define __FAST_MATH__ -ffast-math -ffp-contract=on // unsure ???? For that last one ("unsure ???"), in the current patch I was leaving `__FAST_MATH__` defined. But I'm now leaning toward also suppressing the `__FAST_MATH__` def in that case. Unless anyone disagrees, I'll include that change when I update the patch.

Addressed comments from @hfinkel .

wristow marked 2 inline comments as done.Jan 14 2020, 4:54 PM

wristow marked an inline comment as done.Jan 14 2020, 4:58 PM

mcberg2017 added a reviewer: hfinkel.Jan 14 2020, 5:08 PM

spatel added inline comments.Jan 15 2020, 11:41 AM

llvm/test/CodeGen/PowerPC/fmf-propagation.ll
201–261	I was ok with the reasoning up to here, but this example lost me. Why does 'contract' alone allow splitting an fma? Is 'contract' relevant on anything besides fadd (with an fmul operand)?

wristow marked an inline comment as done.Jan 15 2020, 5:32 PM

wristow added inline comments.

llvm/test/CodeGen/PowerPC/fmf-propagation.ll
201–261	I have to admit I'm handwaving here. I don't really understand why 'contract' alone allows this simpliciation: `fma(X, 7.0, X * 42.0) --> X * 49.0` to happen. But either that's correct, or it's a separate bug (and I waved my hands about explaining more detail). The short story is that even without my proposed change, having only 'contract' on the FMA intrinsic results in this being simplified to `X * 49` (and also having only 'reassoc' did). The original comment: `; This is the minimum FMF needed for this transform - the FMA allows reassociation.` is incomplete/misleading. A more complete comment (relevant before my change) would have been: `; This is the minimum FMF needed for this transform - either 'reassoc' or 'contract' applied to the FMA intrinsic allows reassociation.` And with my change, the result is that 'reassoc' is no longer relevant. But since I don't really understand why it should be simplified with only `contract`, I should put a TODO comment in. That is, to me it seems like both 'contract' and 'reassoc' should be needed for this simplification to happen (whereas previously it worked with either of them individually). So maybe change this comment to: ; This is the minimum FMF needed for this transform - applying 'contract' to the FMA intrinsic allows reassociation. ; TODO: It seems 'contract' and 'reassoc' combined should be needed for this transform. Why does it work with only 'contract'?

We crossed that bridge internally at Apple a while ago, meaning I have some code debt for cleaning up open source for fma formation that uses contract and reassoc differently than we do today, both together and separately, case by case.

In D72675#1823227, @mcberg2017 wrote:

We crossed that bridge internally at Apple a while ago, meaning I have some code debt for cleaning up open source for fma formation that uses contract and reassoc differently than we do today, both together and separately, case by case.

Ah, great. I don't have a preference about the order of patches (ie, is it better to make this change first or do the backend cleanup first), so whatever works better.

But it would be better to have all of the baseline tests in place, so we know where we stand currently. So I'd prefer to have the PPC and x86 tests pre-committed to master (change test comments as needed to match current/future reality).

We could also decouple the clang part of this patch from the backend change IIUC. But do we need another change (or at least tests) to show how the driver difference translates in the clang front-end to LLVM FMF and/or function attributes?

clang/test/Driver/fast-math.c
183–184	Do we need another pair of tests for -ffp-contract=on?

I should do the other DAG combiner fma changes after this is wrapped up.

This all sounds good to me.

So to make sure we're all on the same page, my understanding is that the plan forward is:

Make the Clang change first (including adding another pair of tests for -ffp-contract=on).
Update the LLVM tests illustrating the current baseline state.
Make the LLVM change shown here, and update the tests with the fixes.
Bring in the DAG combiner work that Michael has done internally at Apple.

Points 1, 2, and 3 are essentially the points in the patch posted here, so I'll do that. And of course Michael will then take on point 4.

Is that the plan? If yes, I'll transition this item to just be the Clang pieces, and I'll spin off a new one to do the LLVM portion of what is posted here.

Sanjay, regarding:

But it would be better to have all of the baseline tests in place, so we know where we stand currently.

I'm interpreting that as applying to the LLVM tests. That is, the Clang change, along with the updated tests, can be in one commit, rather than first updating the Clang driver tests with the current state. If you'd prefer two commits on the Clang side as well, let me know.

In D72675#1824659, @wristow wrote:

This all sounds good to me.

So to make sure we're all on the same page, my understanding is that the plan forward is:

Make the Clang change first (including adding another pair of tests for -ffp-contract=on).

Update the LLVM tests illustrating the current baseline state.

Make the LLVM change shown here, and update the tests with the fixes.

Bring in the DAG combiner work that Michael has done internally at Apple.

Points 1, 2, and 3 are essentially the points in the patch posted here, so I'll do that. And of course Michael will then take on point 4.

Is that the plan? If yes, I'll transition this item to just be the Clang pieces, and I'll spin off a new one to do the LLVM portion of what is posted here.

SGTM

Sanjay, regarding:

But it would be better to have all of the baseline tests in place, so we know where we stand currently.

I'm interpreting that as applying to the LLVM tests. That is, the Clang change, along with the updated tests, can be in one commit, rather than first updating the Clang driver tests with the current state. If you'd prefer two commits on the Clang side as well, let me know.

One commit for the clang changes should be ok; it's a very small diff. But I'm still not sure if the driver change induces frontend diffs that we should make visible via tests.

One commit for the clang changes should be ok; it's a very small diff. But I'm still not sure if the driver change induces frontend diffs that we should make visible via tests.

The only thing I can think of is that it changes whether/when __FAST_MATH__ is defined. But that'll be indirectly tested, via the updated tests in "Driver/fast-math.c", which will verify that "-ffast-math" is passed appropriately (and the __FAST_MATH__ dependency on "-ffast-math" is already tested in "Preprocessor/predefined-macros.c").

In addition to adding the second pair of driver tests you suggested for -ffp-contract=on, I'll also add another pair for -ffp-contract=fast to verify that "-ffast-math" is passed in that case (no change in behavior for that pair, but just confirming it continues to work correctly).

Changing this to address only the Clang driver aspect, as discussed in the comments. Will spin off a separate patch for the LLVM side.

This follows the reasoning from the earlier discussion, but after re-reading the gcc comment in particular, I'm wondering whether this is what we really want to do...
If FAST_MATH is purely here for compatibility with gcc, then should we mimic gcc behavior in setting that macro even if we think it's buggy?
Ie, when we translate these settings to LLVM's FMF, we can still override the -ffast-math flag by checking the -ffp-contract flag (if I'm seeing it correctly, the existing code will pass that alongside -ffast-math when contract is set to on/off).

clang/lib/Driver/ToolChains/Clang.cpp
2720	This comment doesn't match the code any more. Should be "If -ffp-contract=off/on, then..."
clang/test/Driver/fast-math.c
184	typo: conteact

Adding potential FP reviewers for question of gcc- (potentially-buggy-) compatibility.

Herald added a subscriber: wdng. · View Herald TranscriptJan 17 2020, 6:03 AM

spatel added reviewers: andrew.w.kaylor, uweigand, kpn, cameron.mcinally.Jan 17 2020, 6:21 AM

Updated patch to correct a comment and fix a typo.

Regarding the point from @spatel :

This follows the reasoning from the earlier discussion, but after re-reading the gcc comment in particular, I'm wondering whether this is what we really want to do...
If __FAST_MATH__ is purely here for compatibility with gcc, then should we mimic gcc behavior in setting that macro even if we think it's buggy?

Thinking about it, yes, I think we should mimic GCC. FTR, I've tried some older GCCs via Compiler Explorer, and found them to be consistent in that removing some aspects of -ffast-math disables the generation of __FAST_MATH__ and removing other aspects of it does not. So my previous recollection ("I could have sworn that in the past that for GCC I had seen this sort of disabling of part of the "fast-math set" result in the disabling of the def of __FAST_MATH__.") is wrong. Bottom line, maybe it's not a bug in GCC, and instead it's the desired GCC behavior (just that I cannot find specific documentation of precisely which settings disable the generation of it).

However, I'd call that a bigger question than this -ffp-contract aspect of __FAST_MATH__, and a separate bug, that can be fixed separately.

Ie, when we translate these settings to LLVM's FMF, we can still override the -ffast-math flag by checking the -ffp-contract flag (if I'm seeing it correctly, the existing code will pass that alongside -ffast-math when contract is set to on/off).

How to handle this seems like an implementation question. The code here is just deciding whether or not we intend to pass "-ffast-math" to cc1 (it isn't directly defining __FAST_MATH__). If we do pass it to cc1, then "clang/lib/Frontend/InitPreprocessor.cpp" will pre-define __FAST_MATH__:

if (LangOpts.FastMath)
  Builder.defineMacro("__FAST_MATH__");

It seems to me that a straightforward way to deal with this question is to make the above test more elaborate (that is, if LangOpts.FastMath is set, OR whatever the appropriate subset of fast-math switches are set). If we do that, we don't need to deal with the "umbrella" aspect of "-ffast-math", which gets messy.

Which is to say the approach here can stay the same as it currently is (-ffp-contract=off/on suppressing the passing of "-ffast-math" to cc1). Although the comment about __FAST_MATH__ here in "Clang.cpp" should be changed, if we take this approach.

In D72675#1827893, @wristow wrote:
How to handle this seems like an implementation question. The code here is just deciding whether or not we intend to pass "-ffast-math" to cc1 (it isn't directly defining __FAST_MATH__). If we do pass it to cc1, then "clang/lib/Frontend/InitPreprocessor.cpp" will pre-define __FAST_MATH__:
if (LangOpts.FastMath)
  Builder.defineMacro("__FAST_MATH__");
It seems to me that a straightforward way to deal with this question is to make the above test more elaborate (that is, if LangOpts.FastMath is set, OR whatever the appropriate subset of fast-math switches are set). If we do that, we don't need to deal with the "umbrella" aspect of "-ffast-math", which gets messy.

Which is to say the approach here can stay the same as it currently is (-ffp-contract=off/on suppressing the passing of "-ffast-math" to cc1). Although the comment about __FAST_MATH__ here in "Clang.cpp" should be changed, if we take this approach.

That sounds fine to me, so could update that comment/this patch.

Let's give this a couple of days after updating to see if anyone else has feedback though. I'm never exactly sure what our responsibility is with respect to gcc compatibility.

I've had a quick look at GCC, and it seems there's a couple of different issues.

First of all, -ffast-math is a "collective" flag that has no separate meaning except setting a bunch of other flags. Currently, these flags are: -fno-math-errno, -funsafe-math-optimizations (which is itself a collective flag enabling -fno-signed-zeros, -fno-trapping-math, -fassociative-math, and -freciprocal-math), -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range, and -fexcess-precision=fast.

When deciding whether to define __FAST_MATH__, GCC will not specifically look at whether or not the -ffast-math option itself was given, but whether the status of those other flags matches what would have been set by -ffast-math. However, it does not include all of the flags; currently only -fmath-errno, -funsafe-math-optimizations, -fsigned-zeros, -ftrapping-math, -ffinite-math-only, and -fexcess-precision are checked, while -fassociative-math, -freciprocal-math, -frounding-math, -fsignaling-nans, and -fcx-limited-range are ignored.

I'm not sure whether this is deliberate (but it seems weird) or just a bug. I can ask the GCC developers ...

A separate question is the interaction of -ffast-math with -ffp-contract=. Currently, there is no such interaction whatsoever in GCC: -ffast-math does not imply any particular -ffp-contract= setting, and vice versa the -ffp-contract= setting is not considered at all when defining __FAST_MATH__. This seems at least internally consistent.

I'm not sure whether this is deliberate (but it seems weird) or just a bug. I can ask the GCC developers ...

Please do. If there's a rationale, we should know.

A separate question is the interaction of -ffast-math with -ffp-contract=. Currently, there is no such interaction whatsoever in GCC: -ffast-math does not imply any particular -ffp-contract= setting, and vice versa the -ffp-contract= setting is not considered at all when defining __FAST_MATH__. This seems at least internally consistent.

That's interesting, and as you said, internally consistent behavior in GCC. I think it would be a fine idea for us to do the same thing.

Looking into this point, I see that (ignoring fastmath for the moment) our default -ffp-contract behavior is different than GCC's. GCC enables FMA by default when optimization is high enough ('-O2') without any special switch needed. For example, taking an architecture that supports FMA (Haswell), GCC has the following behavior:

float test(float a, float b, float c)
{
  // FMA is enabled by default for GCC (on Haswell), so done at -O2:
  //    gcc -S -O2 -march=haswell test.c  # FMA happens
  //    $ gcc -S -march=haswell test.c ; egrep 'mul|add' test.s
  //            vmulss  -12(%rbp), %xmm0, %xmm0
  //            vaddss  -4(%rbp), %xmm0, %xmm0
  //    $ gcc -S -O2 -march=haswell test.c ; egrep 'mul|add' test.s
  //            vfmadd231ss     %xmm2, %xmm1, %xmm0
  //    $
  return a + (b * c);
}

As we'd expect, GCC does disable FMA with -ffp-contract=off (this is irrespective of whether -ffast-math was specified). Loosely, GCC's behavior can be summarized very simply on this point as:
Suppress FMA when -ffp-contract=off.
(As an aside, GCC's behavior with -ffp-contract=on is non-intuitive to me. It relates to the FP_CONTRACT pragma, which as far as I can see is ignored by GCC.)

In contrast, we do not enable FMA by default (via general optimization, such as '-O2'). For example:

$ clang -S -O2 -march=haswell test.c ; egrep 'mul|add' test.s
        vmulss  %xmm2, %xmm1, %xmm1
        vaddss  %xmm0, %xmm1, %xmm0
        .addrsig
$

I think that whether we want to continue doing that (or instead, enable it at '-O2', like GCC does), is a separate issue. I can see arguments either way.

We do enable FMA with -ffp-contract=fast, as desired (and also with -ffp-contract=on). And we do "leave it disabled" with -ffp-contract=off (as expected).

Now, bringing fastmath back into the discussion, we do enable FMA with -fffast-math. If we decide to continue leaving it disabled by default, then enabling it with -ffast-math seems perfectly sensible. (If we decide to enable it by default when optimization is high enough, like GCC, then turning on -ffast-math should not disable it of course.)

The problem I want to address here is that if the compiler is a mode where FMA is enabled (whether that's at '-O2' "by default", or whether it's because the user turned on -ffast-math), then appending -ffp-contract=off should disable FMA. I think this patch (along with the small change to "DAGCombiner.cpp", in an earlier version of this patch) is a reasonable approach to solving that. I'd say this patch/review/discussion has raised two additional questions:

Under what conditions should __FAST_MATH__ be defined?
Should we enable FMA "by default" at (for example) '-O2'?

I think these additional questions are best addressed separately. My 2 cents are that for (1), mimicking GCC's behavior seems reasonable (although that's assuming we don't find out that GCC's __FAST_MATH__ behavior is a bug). And for (2), I don't have a strong preference.

In D72675#1839492, @wristow wrote:

Should we enable FMA "by default" at (for example) '-O2'?

We recently introduced a new option "-ffp-model=[precise|fast|strict], which is supposed to serve as an umbrella option for the most common combination of options. I think our default behavior should be equivalent to -ffp-model=precise, which enables contraction. It currently seems to enable -ffp-contract=fast in the precise model (https://godbolt.org/z/c6w8mJ). I'm not sure that's what it should be doing, but whatever the precise model does should be our default.

In D72675#1839662, @andrew.w.kaylor wrote:

In D72675#1839492, @wristow wrote:

Should we enable FMA "by default" at (for example) '-O2'?

We recently introduced a new option "-ffp-model=[precise|fast|strict], which is supposed to serve as an umbrella option for the most common combination of options. I think our default behavior should be equivalent to -ffp-model=precise, which enables contraction. It currently seems to enable -ffp-contract=fast in the precise model (https://godbolt.org/z/c6w8mJ). I'm not sure that's what it should be doing, but whatever the precise model does should be our default.

That makes sense to me. So -ffp-model/-ffp-contract interaction should be examined, and possibly adjusted. If an adjustment is needed, I think it makes sense to handle that interaction separately.

I'm going to update this patch to adjust the comment that @spatel and I discussed earlier, so it no longer implies that __FAST_MATH__ is defined only when "all of" -ffast-math is enabled.

Update a comment to remove the discussion about enabling the __FAST_MATH__ preprocessor macro. The handling of the setting of __FAST_MATH__ is done in "clang/lib/Frontend/InitPreprocessor.cpp", and once we decide on the set of conditions that enable that definition, we can add an appropriate comment there.

andrew.w.kaylor added inline comments.Jan 24 2020, 4:35 PM

clang/lib/Driver/ToolChains/Clang.cpp
2721	I think this would read better as "... && !FPContract.equals("off") && !FPContract.equals("on")" The '\|\|' in the middle of all these '&&'s combined with the extra parens from the function call trips me up.
2764–2765	As above, I'd prefer "!off && !on".
2772	This is a bit of an oddity in our handling. The FPContract local variable starts out initialized to an empty string. So, what we're saying here is that if you've used all the individual controls to set the rest of the fast math flags, we'll turn on fp-contract=fast also. That seems wrong. If you use "-ffast-math", FPContract will have been set to "fast" so that's not applicable here. I believe this means -fno-honor-infinities -fno-honor-nans -fno-math-errno -fassociative-math -freciprocal-math -fno-signed-zeros -fno-trapping-math -fno-rounding-math will imply "-ffp-contract=fast". Even worse: -ffp-contract=off -fno-fast-math -fno-honor-infinities -fno-honor-nans -fno-math-errno -fassociative-math -freciprocal-math -fno-signed-zeros -fno-trapping-math -fno-rounding-math will imply "-ffp-contract=fast" because "-fno-fast-math" resets FPContract to empty. I think we should initialize FPContract to whatever we want to be the default (on?) and remove the special case for empty here. Also, -fno-fast-math should either not change FPContract at all or set it to "off". Probably the latter since we're saying -ffast-math includes -ffp-contract=on.
clang/test/Driver/fast-math.c
196	What about "-ffp-contract=off -ffast-math"? The way the code is written that will override the -ffp-contract option. That's probably what we want, though it might be nice to have a warning.

wristow marked 4 inline comments as done.Jan 24 2020, 7:18 PM

wristow added inline comments.

clang/lib/Driver/ToolChains/Clang.cpp
2721	Sounds good. Will do.
2764–2765	As above, will do.
2772	This is a bit of an oddity in our handling. Yes it is! This is certainly getting more complicated than I had originally expected. I feel there isn't a clear spec on what we want in terms of whether FMA should be enabled "automatically" at (for example) '-O2', and/or whether it should be enabled by `-ffast-math`. I'm very willing to make whatever change is needed here, to meet whatever spec we all ultimately agree on. So I think this patch should be put on hold until we decide on these wider aspects. One minor note about: ... Probably the latter since we're saying -ffast-math includes -ffp-contract=on. I think that is intended to say "-ffast-math includes -ffp-contract=fast". The semantics of `-ffp-contract=on` are another part that seems unclear (or at least, more complicated, since it then gets into the FP_CONTRACT pragma).
clang/test/Driver/fast-math.c
196	Yes, currently `-fffast-math` will override that earlier `-ffp-contract=off` settings. It's unclear to me whether we're ultimately intending for that to be the behavior (because GCC doesn't do that, as @uweigand noted). I guess this is another reason to hold off for a bit, until we figure out the wider spec.

About:

This is a bit of an oddity in our handling.

Yes it is!

This is certainly getting more complicated than I had originally expected. I feel there isn't a clear spec on what we want in terms of whether FMA should be enabled "automatically" at (for example) '-O2', and/or whether it should be enabled by -ffast-math. I'm very willing to make whatever change is needed here, to meet whatever spec we all ultimately agree on.

So I think this patch should be put on hold until we decide on these wider aspects.

Thinking about this a bit more, one thing that I believe there is a clear spec on, is that when -ffp-contract=off is "the last switch related to FMA", then FMA ought to be disabled. So for example, -ffast-math -ffp-contract=off should result in FMA being disabled, irrrspective of how we decide on the "wider aspects" of the spec mentioned above. (FTR, this inability to enable FastMath in general, but explicitly disable FMA, was a problem one of our customers reported, which is what initially motivated this patch.)

Note that the points about the "oddity" of "-ffp-contract=fast" being implied by the two long sets of switches in this comment are unrelated to the change proposed here. That is, that oddity is the current behavior of Clang.

So one approach we could take is to first fix the problem of when -ffp-contract=off is the "the last switch related to FMA", it isn't disabling FMA in some cases. And then as a followup, address the oddities that have been noted here. If we decide to take this two-step approach, I'm happy to take on the followup work (once we decide what the spec should be).

Used the clearer '!off && !on' (rather than '!(off || on)') in the checks.

Added more tests that note the current situation that -ffast-math enables FMA. overriding an earlier switch that had disabled it (included a "TODO" comment in these tests, since it isn't clear if that behavior is what we will ultimately stick with).

wristow marked 4 inline comments as done.Jan 27 2020, 2:54 PM

wristow added inline comments.

clang/test/Driver/fast-math.c
196	Added more tests that illustrate this behavior of "-ffp-contract=off/on -ffast-math" overriding and enabling FMA. (I haven't added a warning. Holding off until we decide on other questions discussed in this review.)

In D72675#1829866, @hfinkel wrote:

I'm not sure whether this is deliberate (but it seems weird) or just a bug. I can ask the GCC developers ...

Please do. If there's a rationale, we should know.

Sorry for the delay ... I've raised that question on the GCC list, and the opinion seems to be that the current behavior is indeed a bug -- however there's still discussion ongoing on what the proper fix ought to be. I'll report back once we've agreed on a solution on the GCC side ...

Revisiting this patch. I think that before fixing the -ffp-contract=off problem I originally raised here, there are two questions that have come up in this discussion that we should first resolve. Specifically:

(1) Do we want to enable FMA transformations by default (at appropriate optimization levels), like GCC does? That is, should FMA be done for targets that support it with a command as simple as the following?

clang -O2 -c test.c

FTR, currently we don't do FMA with simply -O2. We do enable FMA if -ffast-math is added, or (of course) if FMA is explicitly enabled via -ffp-contract=fast. Also note that with GCC, FMA is completely orthogonal to -ffast-math. For example:

gcc -O2 -ffp-contract=off -ffast-math test.c  # '-ffast-math' does not turn FMA "back on"

(2) Under what conditions do we want to predefine the __FAST_MATH__ macro? That is, should we continue with our current policy (essentially, define it when "all the fast-math-flags are enabled"), or do we want to do precisely what GCC does (define it when some particular subset of the fast-math-flags are enabled)?

My vote for (1) is yes. To add to this, I would make the -ffp-contract setting independent of -ffast-math, as GCC does. But to be explicit, enabling FMA at -O2 is a change in behavior that may be unexpected for some users -- so I can imagine some users objecting to this change.

My vote for (2) is to keep our current behavior. This approach seems more sensible to me than the GCC behavior. And we're not boxing ourselves into anything by keeping the current behavior (that is, we can just as easily change this in the future if we find the GCC behavior is better for some reason). Among other things, this means that whether we define __FAST_MATH__ will continue to not be impacted by the -ffp-contract setting -- that makes sense to me, especially if we make FMA orthogonal to -ffast-math, as discussed in (1).

One related point. Our documentation (UsersManual.rst) for -ffp-model=precise says:

Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (`-ffp-contract=fast`). This is the default behavior.

So this explicitly says that FMA is enabled by default (consistent with GCC in this regard). However, that doesn't match our implementation, as we have FMA disabled by default, as noted in this discussion. Concretely, the following two commands are not equivalent:

clang -c -O2 test.c                         # FMA is disabled
clang -c -O2 -ffp-model=precise test.c      # FMA is enabled

If we decide to enable FMA by default (that is, if we agree on "yes" for (1)), then the current -ffp-model=precise description will become correct. But if we decide to continue to disable FMA by default, then we'll need to change the above description (possibly add another value to -ffp-model=<value> that will really be the default).

In D72675#1917844, @wristow wrote:
Revisiting this patch. I think that before fixing the -ffp-contract=off problem I originally raised here, there are two questions that have come up in this discussion that we should first resolve. Specifically:

(1) Do we want to enable FMA transformations by default (at appropriate optimization levels), like GCC does? That is, should FMA be done for targets that support it with a command as simple as the following?
clang -O2 -c test.c

This has been discussed/tried a few times including very recently. I'm not sure where we stand currently, but here's some background:
https://reviews.llvm.org/rL282259
D74436 - cc @mibintc

And (sorry for the various flavors of links to the dev-lists, but that's how it showed up in the search results):
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111129.html
http://lists.llvm.org/pipermail/cfe-dev/2020-February/064645.html
https://groups.google.com/forum/#!msg/llvm-dev/nBiCD5KvYW0/yfjrVzR7DAAJ
https://groups.google.com/forum/#!topic/llvm-dev/Aev2jQYepKQ
http://clang-developers.42468.n3.nabble.com/RFC-FP-contract-on-td4056277.html

In D72675#1919685, @spatel wrote:
In D72675#1917844, @wristow wrote:
Revisiting this patch. I think that before fixing the -ffp-contract=off problem I originally raised here, there are two questions that have come up in this discussion that we should first resolve. Specifically:

(1) Do we want to enable FMA transformations by default (at appropriate optimization levels), like GCC does? That is, should FMA be done for targets that support it with a command as simple as the following?
clang -O2 -c test.c
This has been discussed/tried a few times including very recently. I'm not sure where we stand currently, but here's some background:
https://reviews.llvm.org/rL282259
D74436 - cc @mibintc

And (sorry for the various flavors of links to the dev-lists, but that's how it showed up in the search results):
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111129.html
http://lists.llvm.org/pipermail/cfe-dev/2020-February/064645.html
https://groups.google.com/forum/#!msg/llvm-dev/nBiCD5KvYW0/yfjrVzR7DAAJ
https://groups.google.com/forum/#!topic/llvm-dev/Aev2jQYepKQ
http://clang-developers.42468.n3.nabble.com/RFC-FP-contract-on-td4056277.html

A few weeks ago there was a discussion on cfe-dev and llvm-dev initiated by @andrew.w.kaylor and while the response wasn't unanimous, there was an overwhelming sentiment (in my opinion) that it is correct and desireable to enable ffp-contract=on by default even at -O0. I had submitted a patch to do that. However the patch caused performance regressions in about 20 LNT tests and the patch was reverted, the regression was seen on multiple architecture. I can send provide a few more details about the performance problems, I don't have a solution to the LNT regression.

mibintc added a subscriber: rjmccall.Mar 12 2020, 10:36 AM

This patch, which hasn't been committed, contains modifications to the UserManual with many details concerning 'floating point semantic modes" and the relation to floating point command line options. This is from a discussion that @andrew.w.kaylor initiated on the discussion lists. https://reviews.llvm.org/D74436

In D72675#1919793, @mibintc wrote:
In D72675#1919685, @spatel wrote:
In D72675#1917844, @wristow wrote:
Revisiting this patch. I think that before fixing the -ffp-contract=off problem I originally raised here, there are two questions that have come up in this discussion that we should first resolve. Specifically:

(1) Do we want to enable FMA transformations by default (at appropriate optimization levels), like GCC does? That is, should FMA be done for targets that support it with a command as simple as the following?
clang -O2 -c test.c
This has been discussed/tried a few times including very recently. I'm not sure where we stand currently, but here's some background:
https://reviews.llvm.org/rL282259
D74436 - cc @mibintc

And (sorry for the various flavors of links to the dev-lists, but that's how it showed up in the search results):
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111129.html
http://lists.llvm.org/pipermail/cfe-dev/2020-February/064645.html
https://groups.google.com/forum/#!msg/llvm-dev/nBiCD5KvYW0/yfjrVzR7DAAJ
https://groups.google.com/forum/#!topic/llvm-dev/Aev2jQYepKQ
http://clang-developers.42468.n3.nabble.com/RFC-FP-contract-on-td4056277.html
A few weeks ago there was a discussion on cfe-dev and llvm-dev initiated by @andrew.w.kaylor and while the response wasn't unanimous, there was an overwhelming sentiment (in my opinion) that it is correct and desireable to enable ffp-contract=on by default even at -O0. I had submitted a patch to do that. However the patch caused performance regressions in about 20 LNT tests and the patch was reverted, the regression was seen on multiple architecture. I can send provide a few more details about the performance problems, I don't have a solution to the LNT regression.

Thanks for the update! I saw your questions about digging into the LNT regressions, but I don't have the answers (ping that message in case someone out there missed it?).
Let's break this into pieces and see if we can make progress apart from that:

Where is the list of LNT tests that showed problems?
Were the regressions on x86 bots? If so, many developers can likely repro those locally.
For regressions on other targets, we can guess at the problems by comparing asm (or recruit someone with the target hardware?).

@David.Bolvansky told me “https://lnt.llvm.org/ -> Test suite nts -> watch for

LNT-Broadwell-AVX2-O3clang_PRODx86_64:1364”

The fails were seen on aarch too and a couple other arch. AFAIK the old results are no longer availble. i scraped the list of fails, pasting here:
home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/report.simple.csv
Importing 'report.json'
Import succeeded.

Tested: 2560 tests --

FAIL: MultiSource/Applications/oggenc/oggenc.execution_time (513 of 2560)
FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.execution_time (514 of 2560)
FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.execution_time (515 of 2560)
FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.execution_time (516 of 2560)
FAIL: SingleSource/Benchmarks/Linpack/linpack-pc.execution_time (517 of 2560)
FAIL: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.execution_time (518 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.execution_time (519 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.execution_time (520 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm.execution_time (521 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/3mm/3mm.execution_time (522 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.execution_time (523 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.execution_time (524 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemver/gemver.execution_time (525 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv.execution_time (526 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.execution_time (527 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv.execution_time (528 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm/trmm.execution_time (529 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.execution_time (530 of 2560)
FAIL: SingleSource/Benchmarks/Polybench/stencils/adi/adi.execution_time (531 of 2560)
FAIL: SingleSource/UnitTests/Vector/SSE/sse_expandfft.execution_time (532 of 2560)
FAIL: SingleSource/UnitTests/Vector/SSE/sse_stepfft.execution_time (533 of 2560)

I may be wrong, but i suspect those failures aren't actually due to the fact
that we pessimize optimizations with this change, but that the whole execution
just fails. Can you try running test-suite locally? Do tests themselves actually pass,
ignoring the question of their performance?

In D72675#1920309, @lebedev.ri wrote:

I may be wrong, but i suspect those failures aren't actually due to the fact
that we pessimize optimizations with this change, but that the whole execution
just fails. Can you try running test-suite locally? Do tests themselves actually pass,
ignoring the question of their performance?

I find the LNT output very hard to decipher, but I thought that all of the failures on the Broadwell (x86) LNT bot were just performance regressions. There were many perf improvements and also many regressions. I investigated the top regression and found that the loop unroller made a different decision when the llvm.fmuladd intrinsic was used than it did for separate mul and add operations. The central loop of the test was unrolled eight times rather than four. Broadwell gets less benefit from FMA than either Haswell or Skylake, so any other factors that might drop performance are less likely to be mitigated by having fused these operations. In a more general sense, I don't see any reason apart from secondary changes in compiler behavior like this that allowing FMA would cause performance to drop.

At least one other target had execution failures caused by Melanie's change, but I understood it to be simply exposing a latent problem in the target-specific code.

arsenm added a reviewer: yaxunl.Aug 18 2023, 6:35 AM

Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2023, 6:35 AM

Herald added subscribers: StephenFan, MaskRay. · View Herald Transcript

Revision Contents

Path

Size

clang/

lib/

Driver/

ToolChains/

Clang.cpp

13 lines

test/

Driver/

fast-math.c

7 lines

llvm/

lib/

CodeGen/

SelectionDAG/

DAGCombiner.cpp

2 lines

test/

CodeGen/

PowerPC/

fmf-propagation.ll

286 lines

X86/

fp-contract.ll

204 lines

Diff 237834

clang/lib/Driver/ToolChains/Clang.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 2,694 Lines • ▼ Show 20 Lines	if (StrictFPModel) {
: Args.MakeArgString(A->getSpelling() + A->getValue()));		: Args.MakeArgString(A->getSpelling() + A->getValue()));
}		}
}		}

// If we handled this option claim it		// If we handled this option claim it
A->claim();		A->claim();
}		}

		// If -ffp-contract=off has been specified on the command line, then we must
		// suppress the emission of -ffast-math and -menable-unsafe-fp-math to cc1.
		bool FPContractDisabled = false;
		if (!FPContract.empty()) {
		CmdArgs.push_back(Args.MakeArgString("-ffp-contract=" + FPContract));
		FPContractDisabled = FPContract.equals("off");
		}

if (!HonorINFs)		if (!HonorINFs)
CmdArgs.push_back("-menable-no-infs");		CmdArgs.push_back("-menable-no-infs");

if (!HonorNaNs)		if (!HonorNaNs)
CmdArgs.push_back("-menable-no-nans");		CmdArgs.push_back("-menable-no-nans");

if (MathErrno)		if (MathErrno)
CmdArgs.push_back("-fmath-errno");		CmdArgs.push_back("-fmath-errno");

if (!MathErrno && AssociativeMath && ReciprocalMath && !SignedZeros &&		if (!MathErrno && AssociativeMath && ReciprocalMath && !SignedZeros &&
		spatelUnsubmitted Not Done Reply Inline Actions This comment doesn't match the code any more. Should be "If -ffp-contract=off/on, then..." spatel: This comment doesn't match the code any more. Should be "If -ffp-contract=off/on, then..."
!TrappingMath)		!TrappingMath && !FPContractDisabled)
		hfinkelUnsubmitted Done Reply Inline Actions I think that you just need && !FPContract.equals("off") or && !(FPContract.equals("off") \|\| FPContract.equals("on")) of which I think the latter. fp-contract=no is also not a fast-math-compatible setting in that regard, right? hfinkel: I think that you just need && !FPContract.equals("off") or && !(FPContract.equals("off")…
		wristowAuthorUnsubmitted Done Reply Inline Actions Eliminating the `FPContractDisabled` variable, and instead checking that the value isn't "off" (or maybe isn't ("off" or "on")), sounds good. I'm not 100% sure of whether just "off" or both "off" and "on" ought to be checked. More on that in the discussion about `__FAST_MATH__` in one of your other comments. wristow: Eliminating the `FPContractDisabled` variable, and instead checking that the value isn't "off"…
		andrew.w.kaylorUnsubmitted Done Reply Inline Actions I think this would read better as "... && !FPContract.equals("off") && !FPContract.equals("on")" The '\|\|' in the middle of all these '&&'s combined with the extra parens from the function call trips me up. andrew.w.kaylor: I think this would read better as "... && !FPContract.equals("off") && !FPContract.equals…
		wristowAuthorUnsubmitted Done Reply Inline Actions Sounds good. Will do. wristow: Sounds good. Will do.
CmdArgs.push_back("-menable-unsafe-fp-math");		CmdArgs.push_back("-menable-unsafe-fp-math");

if (!SignedZeros)		if (!SignedZeros)
CmdArgs.push_back("-fno-signed-zeros");		CmdArgs.push_back("-fno-signed-zeros");

if (AssociativeMath && !SignedZeros && !TrappingMath)		if (AssociativeMath && !SignedZeros && !TrappingMath)
CmdArgs.push_back("-mreassociate");		CmdArgs.push_back("-mreassociate");

if (ReciprocalMath)		if (ReciprocalMath)
CmdArgs.push_back("-freciprocal-math");		CmdArgs.push_back("-freciprocal-math");

if (TrappingMath) {		if (TrappingMath) {
// FP Exception Behavior is also set to strict		// FP Exception Behavior is also set to strict
assert(FPExceptionBehavior.equals("strict"));		assert(FPExceptionBehavior.equals("strict"));
CmdArgs.push_back("-ftrapping-math");		CmdArgs.push_back("-ftrapping-math");
} else if (TrappingMathPresent)		} else if (TrappingMathPresent)
CmdArgs.push_back("-fno-trapping-math");		CmdArgs.push_back("-fno-trapping-math");

if (!DenormalFPMath.empty())		if (!DenormalFPMath.empty())
CmdArgs.push_back(		CmdArgs.push_back(
Args.MakeArgString("-fdenormal-fp-math=" + DenormalFPMath));		Args.MakeArgString("-fdenormal-fp-math=" + DenormalFPMath));

if (!FPContract.empty())		if (!FPContract.empty())
CmdArgs.push_back(Args.MakeArgString("-ffp-contract=" + FPContract));		CmdArgs.push_back(Args.MakeArgString("-ffp-contract=" + FPContract));
		hfinkelUnsubmitted Done Reply Inline Actions So now we pass it twice, because we also pass it here? hfinkel: So now we pass it twice, because we also pass it here?
		wristowAuthorUnsubmitted Done Reply Inline Actions Whoops. This one was here originally, so I'll leave it here, and delete the one I needlessly added, above. wristow: Whoops. This one was here originally, so I'll leave it here, and delete the one I needlessly…

if (!RoundingFPMath)		if (!RoundingFPMath)
CmdArgs.push_back(Args.MakeArgString("-fno-rounding-math"));		CmdArgs.push_back(Args.MakeArgString("-fno-rounding-math"));

if (RoundingFPMath && RoundingMathPresent)		if (RoundingFPMath && RoundingMathPresent)
CmdArgs.push_back(Args.MakeArgString("-frounding-math"));		CmdArgs.push_back(Args.MakeArgString("-frounding-math"));

if (!FPExceptionBehavior.empty())		if (!FPExceptionBehavior.empty())
CmdArgs.push_back(Args.MakeArgString("-ffp-exception-behavior=" +		CmdArgs.push_back(Args.MakeArgString("-ffp-exception-behavior=" +
FPExceptionBehavior));		FPExceptionBehavior));

ParseMRecip(D, Args, CmdArgs);		ParseMRecip(D, Args, CmdArgs);

// -ffast-math enables the __FAST_MATH__ preprocessor macro, but check for the		// -ffast-math enables the __FAST_MATH__ preprocessor macro, but check for the
// individual features enabled by -ffast-math instead of the option itself as		// individual features enabled by -ffast-math instead of the option itself as
// that's consistent with gcc's behaviour.		// that's consistent with gcc's behaviour.
if (!HonorINFs && !HonorNaNs && !MathErrno && AssociativeMath &&		if (!HonorINFs && !HonorNaNs && !MathErrno && AssociativeMath &&
ReciprocalMath && !SignedZeros && !TrappingMath && !RoundingFPMath) {		ReciprocalMath && !SignedZeros && !TrappingMath && !RoundingFPMath) {
		hfinkelUnsubmitted Done Reply Inline Actions You want the check here, I think, and not below so that if parts of fast-math are disabled `__FAST_MATH__` is not set. That seems to be what the comment/code currently do, although what does GCC do in this regard? hfinkel: You want the check here, I think, and not below so that if parts of fast-math are disabled…
		wristowAuthorUnsubmitted Done Reply Inline Actions I had originally included the check with the set of conditions of line 2763, but then the `warn_drv_overriding_flag_option` warning (below) was missed in the case of `-ffp-model=fast -ffp-contract=off`. So I think the check does need to remain inside the above conditional. Regarding the `__FAST_MATH__` aspect and whether `ffp-contract=on` also ought to suppress the `-ffast-math` emission (and what GCC does), this is somewhat fuzzy for me. After getting your comments, I'm leaning toward including both the "off" and "on" in the check (rather than only disabling it in the case of "off", as the current patch does). But I want to at least note my observations here, to make the current situation with both Clang and GCC clear, in case others want to chime in. Somewhat surprisingly to me, GCC's behavior wrt `__FAST_MATH__` seems buggy in general, so it's not a good base to compare against. For example, the following set of switches result in the `__FAST_MATH__` macro being defined (although I expected they would have disabled it): -ffast-math -fno-associative-math -fno-reciprocal-math -frounding-math (I've just tested GCC 7.4.0.) I could have sworn that in the past that for GCC I had seen this sort of disabling of part of the "fast-math set" result in the disabling of the def of `__FAST_MATH__`. As you'd expect with Clang (from the comment in our code code), the above switches do suppress the definition of `__FAST_MATH__`. That said, GCC does suppress the `__FAST_MATH__` def in some situations. E.g., both GCC and Clang suppress it with the following switches: -ffast-math -fmath-errno -ffast-math -ftrapping-math -ffast-math -fsigned-zeros In any case, for the `-ffp-contract` switch, none of the settings suppress the def of `__FAST_MATH__` with GCC. That is, all of the following result in `__FAST_MATH__` being defined: -ffast-math -ffp-contract=fast -ffast-math -ffp-contract=on -ffast-math -ffp-contract=off I think that's buggy (but hard to know if it's `-ffp-contract` specific, or part of the general buggyness of GCC and `__FAST_MATH__`). Clang (prior to the change described here) also defines `__FAST_MATH__` in all of these `-ffp-contract` situations. But I think that's just part of the bug I'm trying to address here. Bottom line: The intended semantics are somewhat fuzzy to me. I'd say: -ffast-math -ffp-contract=fast // should leave __FAST_MATH__ defined -ffast-math -ffp-contract=off // should not define __FAST_MATH__ -ffast-math -ffp-contract=on // unsure ???? For that last one ("unsure ???"), in the current patch I was leaving `__FAST_MATH__` defined. But I'm now leaning toward also suppressing the `__FAST_MATH__` def in that case. Unless anyone disagrees, I'll include that change when I update the patch. wristow: I had originally included the check with the set of conditions of line 2763, but then the…
		if (!FPContractDisabled)
CmdArgs.push_back("-ffast-math");		CmdArgs.push_back("-ffast-math");
		andrew.w.kaylorUnsubmitted Done Reply Inline Actions As above, I'd prefer "!off && !on". andrew.w.kaylor: As above, I'd prefer "!off && !on".
		wristowAuthorUnsubmitted Done Reply Inline Actions As above, will do. wristow: As above, will do.
if (FPModel.equals("fast")) {		if (FPModel.equals("fast")) {
if (FPContract.equals("fast"))		if (FPContract.equals("fast"))
// All set, do nothing.		// All set, do nothing.
;		;
else if (FPContract.empty())		else if (FPContract.empty())
// Enable -ffp-contract=fast		// Enable -ffp-contract=fast
CmdArgs.push_back(Args.MakeArgString("-ffp-contract=fast"));		CmdArgs.push_back(Args.MakeArgString("-ffp-contract=fast"));
		andrew.w.kaylorUnsubmitted Not Done Reply Inline Actions This is a bit of an oddity in our handling. The FPContract local variable starts out initialized to an empty string. So, what we're saying here is that if you've used all the individual controls to set the rest of the fast math flags, we'll turn on fp-contract=fast also. That seems wrong. If you use "-ffast-math", FPContract will have been set to "fast" so that's not applicable here. I believe this means -fno-honor-infinities -fno-honor-nans -fno-math-errno -fassociative-math -freciprocal-math -fno-signed-zeros -fno-trapping-math -fno-rounding-math will imply "-ffp-contract=fast". Even worse: -ffp-contract=off -fno-fast-math -fno-honor-infinities -fno-honor-nans -fno-math-errno -fassociative-math -freciprocal-math -fno-signed-zeros -fno-trapping-math -fno-rounding-math will imply "-ffp-contract=fast" because "-fno-fast-math" resets FPContract to empty. I think we should initialize FPContract to whatever we want to be the default (on?) and remove the special case for empty here. Also, -fno-fast-math should either not change FPContract at all or set it to "off". Probably the latter since we're saying -ffast-math includes -ffp-contract=on. andrew.w.kaylor: This is a bit of an oddity in our handling. The FPContract local variable starts out…
		wristowAuthorUnsubmitted Done Reply Inline Actions This is a bit of an oddity in our handling. Yes it is! This is certainly getting more complicated than I had originally expected. I feel there isn't a clear spec on what we want in terms of whether FMA should be enabled "automatically" at (for example) '-O2', and/or whether it should be enabled by `-ffast-math`. I'm very willing to make whatever change is needed here, to meet whatever spec we all ultimately agree on. So I think this patch should be put on hold until we decide on these wider aspects. One minor note about: ... Probably the latter since we're saying -ffast-math includes -ffp-contract=on. I think that is intended to say "-ffast-math includes -ffp-contract=fast". The semantics of `-ffp-contract=on` are another part that seems unclear (or at least, more complicated, since it then gets into the FP_CONTRACT pragma). wristow: > This is a bit of an oddity in our handling. Yes it is! This is certainly getting more…
else		else
D.Diag(clang::diag::warn_drv_overriding_flag_option)		D.Diag(clang::diag::warn_drv_overriding_flag_option)
<< "-ffp-model=fast"		<< "-ffp-model=fast"
<< Args.MakeArgString("-ffp-contract=" + FPContract);		<< Args.MakeArgString("-ffp-contract=" + FPContract);
}		}
}		}

// Handle __FINITE_MATH_ONLY__ similarly.		// Handle __FINITE_MATH_ONLY__ similarly.
▲ Show 20 Lines • Show All 4,125 Lines • Show Last 20 Lines

clang/test/Driver/fast-math.c

	Show First 20 Lines • Show All 174 Lines • ▼ Show 20 Lines
	// RUN: %clang -### -fno-honor-infinities -fno-honor-nans -fno-math-errno \			// RUN: %clang -### -fno-honor-infinities -fno-honor-nans -fno-math-errno \
	// RUN: -fassociative-math -freciprocal-math -fno-signed-zeros \			// RUN: -fassociative-math -freciprocal-math -fno-signed-zeros \
	// RUN: -fno-trapping-math -ffp-contract=fast -fno-rounding-math -c %s 2>&1 \			// RUN: -fno-trapping-math -ffp-contract=fast -fno-rounding-math -c %s 2>&1 \
	// RUN: \| FileCheck --check-prefix=CHECK-FAST-MATH %s			// RUN: \| FileCheck --check-prefix=CHECK-FAST-MATH %s
	// CHECK-FAST-MATH: "-cc1"			// CHECK-FAST-MATH: "-cc1"
	// CHECK-FAST-MATH: "-ffast-math"			// CHECK-FAST-MATH: "-ffast-math"
	// CHECK-FAST-MATH: "-ffinite-math-only"			// CHECK-FAST-MATH: "-ffinite-math-only"
	//			//
				// -ffp-contract=off must disable the fast-math umbrella, and the unsafe-fp-math
				// umbrella.
				spatelUnsubmitted Not Done Reply Inline Actions Do we need another pair of tests for -ffp-contract=on? spatel: Do we need another pair of tests for -ffp-contract=on?
				spatelUnsubmitted Not Done Reply Inline Actions typo: conteact spatel: typo: conteact
				// RUN: %clang -### -ffast-math -ffp-contract=off -c %s 2>&1 \
				// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s
				// RUN: %clang -### -ffast-math -ffp-contract=off -c %s 2>&1 \
				// RUN: \| FileCheck --check-prefix=CHECK-NO-UNSAFE-MATH %s
				//
	// RUN: %clang -### -ffast-math -fno-fast-math -c %s 2>&1 \			// RUN: %clang -### -ffast-math -fno-fast-math -c %s 2>&1 \
	// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s			// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s
	// RUN: %clang -### -ffast-math -fno-finite-math-only -c %s 2>&1 \			// RUN: %clang -### -ffast-math -fno-finite-math-only -c %s 2>&1 \
	// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s			// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s
	// RUN: %clang -### -ffast-math -fno-unsafe-math-optimizations -c %s 2>&1 \			// RUN: %clang -### -ffast-math -fno-unsafe-math-optimizations -c %s 2>&1 \
	// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s			// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s
	// RUN: %clang -### -ffast-math -fmath-errno -c %s 2>&1 \			// RUN: %clang -### -ffast-math -fmath-errno -c %s 2>&1 \
				andrew.w.kaylorUnsubmitted Done Reply Inline Actions What about "-ffp-contract=off -ffast-math"? The way the code is written that will override the -ffp-contract option. That's probably what we want, though it might be nice to have a warning. andrew.w.kaylor: What about "-ffp-contract=off -ffast-math"? The way the code is written that will override the…
				wristowAuthorUnsubmitted Done Reply Inline Actions Yes, currently `-fffast-math` will override that earlier `-ffp-contract=off` settings. It's unclear to me whether we're ultimately intending for that to be the behavior (because GCC doesn't do that, as @uweigand noted). I guess this is another reason to hold off for a bit, until we figure out the wider spec. wristow: Yes, currently `-fffast-math` will override that earlier `-ffp-contract=off` settings. It's…
				wristowAuthorUnsubmitted Done Reply Inline Actions Added more tests that illustrate this behavior of "-ffp-contract=off/on -ffast-math" overriding and enabling FMA. (I haven't added a warning. Holding off until we decide on other questions discussed in this review.) wristow: Added more tests that illustrate this behavior of "-ffp-contract=off/on -ffast-math" overriding…
	// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s			// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH %s
	// RUN: %clang -### -ffast-math -fno-associative-math -c %s 2>&1 \			// RUN: %clang -### -ffast-math -fno-associative-math -c %s 2>&1 \
	// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH --check-prefix=CHECK-ASSOC-MATH %s			// RUN: \| FileCheck --check-prefix=CHECK-NO-FAST-MATH --check-prefix=CHECK-ASSOC-MATH %s
	// CHECK-NO-FAST-MATH: "-cc1"			// CHECK-NO-FAST-MATH: "-cc1"
	// CHECK-NO-FAST-MATH-NOT: "-ffast-math"			// CHECK-NO-FAST-MATH-NOT: "-ffast-math"
	// CHECK-ASSOC-MATH-NOT: "-mreassociate"			// CHECK-ASSOC-MATH-NOT: "-mreassociate"
	//			//
	// Check various means of disabling these flags, including disabling them after			// Check various means of disabling these flags, including disabling them after
	▲ Show 20 Lines • Show All 124 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 11,500 Lines • ▼ Show 20 Lines	if (DAG.getDataLayout().isBigEndian())
std::reverse(Ops.end()-NumOutputsPerInput, Ops.end());		std::reverse(Ops.end()-NumOutputsPerInput, Ops.end());
}		}

return DAG.getBuildVector(VT, DL, Ops);		return DAG.getBuildVector(VT, DL, Ops);
}		}

static bool isContractable(SDNode *N) {		static bool isContractable(SDNode *N) {
SDNodeFlags F = N->getFlags();		SDNodeFlags F = N->getFlags();
return F.hasAllowContract() \|\| F.hasAllowReassociation();		return F.hasAllowContract();
}		}

/// Try to perform FMA combining on a given FADD node.		/// Try to perform FMA combining on a given FADD node.
SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {		SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc SL(N);		SDLoc SL(N);
▲ Show 20 Lines • Show All 9,732 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/fmf-propagation.ll

	Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines
	; GLOBAL-NEXT: xsmaddasp 3, 1, 2			; GLOBAL-NEXT: xsmaddasp 3, 1, 2
	; GLOBAL-NEXT: fmr 1, 3			; GLOBAL-NEXT: fmr 1, 3
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul contract float %x, %y			%mul = fmul contract float %x, %y
	%add = fadd contract float %mul, %z			%add = fadd contract float %mul, %z
	ret float %add			ret float %add
	}			}

	; Reassociation implies that FMA contraction is allowed.			; On the FMF test, reassociation alone does _not_ imply that FMA contraction is
				; allowed (in particular, we need to be able to disable FMA even when
				; reassociation is enabled).

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_reassoc1:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_reassoc1:'
	; FMFDEBUG: fma reassoc {{t[0-9]+}}, {{t[0-9]+}}, {{t[0-9]+}}			; FMFDEBUG: fadd reassoc {{t[0-9]+}}, {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_reassoc1:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_reassoc1:'

	define float @fmul_fadd_reassoc1(float %x, float %y, float %z) {			define float @fmul_fadd_reassoc1(float %x, float %y, float %z) {
	; FMF-LABEL: fmul_fadd_reassoc1:			; FMF-LABEL: fmul_fadd_reassoc1:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: xsmaddasp 3, 1, 2			; FMF-NEXT: xsmulsp 0, 1, 2
	; FMF-NEXT: fmr 1, 3			; FMF-NEXT: xsaddsp 1, 0, 3
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fmul_fadd_reassoc1:			; GLOBAL-LABEL: fmul_fadd_reassoc1:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: xsmaddasp 3, 1, 2			; GLOBAL-NEXT: xsmaddasp 3, 1, 2
	; GLOBAL-NEXT: fmr 1, 3			; GLOBAL-NEXT: fmr 1, 3
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul float %x, %y			%mul = fmul float %x, %y
	%add = fadd reassoc float %mul, %z			%add = fadd reassoc float %mul, %z
	ret float %add			ret float %add
	}			}

	; This shouldn't change anything - the intermediate fmul result is now also flagged.			; This shouldn't change anything - the intermediate fmul result is now also flagged.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_reassoc2:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_reassoc2:'
	; FMFDEBUG: fma reassoc {{t[0-9]+}}, {{t[0-9]+}}			; FMFDEBUG: fadd reassoc {{t[0-9]+}}, {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_reassoc2:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_reassoc2:'

	define float @fmul_fadd_reassoc2(float %x, float %y, float %z) {			define float @fmul_fadd_reassoc2(float %x, float %y, float %z) {
	; FMF-LABEL: fmul_fadd_reassoc2:			; FMF-LABEL: fmul_fadd_reassoc2:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: xsmaddasp 3, 1, 2			; FMF-NEXT: xsmulsp 0, 1, 2
	; FMF-NEXT: fmr 1, 3			; FMF-NEXT: xsaddsp 1, 0, 3
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fmul_fadd_reassoc2:			; GLOBAL-LABEL: fmul_fadd_reassoc2:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: xsmaddasp 3, 1, 2			; GLOBAL-NEXT: xsmaddasp 3, 1, 2
	; GLOBAL-NEXT: fmr 1, 3			; GLOBAL-NEXT: fmr 1, 3
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul reassoc float %x, %y			%mul = fmul reassoc float %x, %y
	%add = fadd reassoc float %mul, %z			%add = fadd reassoc float %mul, %z
	ret float %add			ret float %add
	}			}

				; Reassociation applied with contract enables FMA contraction (of course).

				; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_contract_reassoc1:'
				; FMFDEBUG: fma contract reassoc {{t[0-9]+}}, {{t[0-9]+}}, {{t[0-9]+}}
				; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_contract_reassoc1:'

				define float @fmul_fadd_contract_reassoc1(float %x, float %y, float %z) {
				; FMF-LABEL: fmul_fadd_contract_reassoc1:
				; FMF: # %bb.0:
				; FMF-NEXT: xsmaddasp 3, 1, 2
				; FMF-NEXT: fmr 1, 3
				; FMF-NEXT: blr
				;
				; GLOBAL-LABEL: fmul_fadd_contract_reassoc1:
				; GLOBAL: # %bb.0:
				; GLOBAL-NEXT: xsmaddasp 3, 1, 2
				; GLOBAL-NEXT: fmr 1, 3
				; GLOBAL-NEXT: blr
				%mul = fmul contract float %x, %y
				%add = fadd contract reassoc float %mul, %z
				ret float %add
				}

				; This shouldn't change anything - the intermediate fmul result is now also flagged.

				; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_contract_reassoc2:'
				; FMFDEBUG: fma contract reassoc {{t[0-9]+}}, {{t[0-9]+}}
				; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_contract_reassoc2:'

				define float @fmul_fadd_contract_reassoc2(float %x, float %y, float %z) {
				; FMF-LABEL: fmul_fadd_contract_reassoc2:
				; FMF: # %bb.0:
				; FMF-NEXT: xsmaddasp 3, 1, 2
				; FMF-NEXT: fmr 1, 3
				; FMF-NEXT: blr
				;
				; GLOBAL-LABEL: fmul_fadd_contract_reassoc2:
				; GLOBAL: # %bb.0:
				; GLOBAL-NEXT: xsmaddasp 3, 1, 2
				; GLOBAL-NEXT: fmr 1, 3
				; GLOBAL-NEXT: blr
				%mul = fmul contract reassoc float %x, %y
				%add = fadd contract reassoc float %mul, %z
				ret float %add
				}

	; The fadd is now fully 'fast'. This implies that contraction is allowed.			; The fadd is now fully 'fast'. This implies that contraction is allowed.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_fast1:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_fast1:'
	; FMFDEBUG: fma nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}, {{t[0-9]+}}, {{t[0-9]+}}			; FMFDEBUG: fma nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}, {{t[0-9]+}}, {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_fast1:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fadd_fast1:'

	define float @fmul_fadd_fast1(float %x, float %y, float %z) {			define float @fmul_fadd_fast1(float %x, float %y, float %z) {
	; FMF-LABEL: fmul_fadd_fast1:			; FMF-LABEL: fmul_fadd_fast1:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: xsmaddasp 3, 1, 2			; FMF-NEXT: xsmaddasp 3, 1, 2
	; FMF-NEXT: fmr 1, 3			; FMF-NEXT: fmr 1, 3
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fmul_fadd_fast1:			; GLOBAL-LABEL: fmul_fadd_fast1:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: xsmaddasp 3, 1, 2			; GLOBAL-NEXT: xsmaddasp 3, 1, 2
	; GLOBAL-NEXT: fmr 1, 3			; GLOBAL-NEXT: fmr 1, 3
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul fast float %x, %y			%mul = fmul float %x, %y
	%add = fadd fast float %mul, %z			%add = fadd fast float %mul, %z
	ret float %add			ret float %add
	}			}

	; This shouldn't change anything - the intermediate fmul result is now also flagged.			; This shouldn't change anything - the intermediate fmul result is now also flagged.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_fast2:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fadd_fast2:'
	; FMFDEBUG: fma nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}, {{t[0-9]+}}, {{t[0-9]+}}			; FMFDEBUG: fma nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}, {{t[0-9]+}}, {{t[0-9]+}}
	Show All 11 Lines
	; GLOBAL-NEXT: xsmaddasp 3, 1, 2			; GLOBAL-NEXT: xsmaddasp 3, 1, 2
	; GLOBAL-NEXT: fmr 1, 3			; GLOBAL-NEXT: fmr 1, 3
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul fast float %x, %y			%mul = fmul fast float %x, %y
	%add = fadd fast float %mul, %z			%add = fadd fast float %mul, %z
	ret float %add			ret float %add
	}			}

	; fma(X, 7.0, X * 42.0) --> X * 49.0			; fma(X, 7.0, X * 42.0) --> X * 49.0
	; This is the minimum FMF needed for this transform - the FMA allows reassociation.			; This is the minimum FMF needed for this transform - the 'contract' allows the needed reassociation.

				; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract1:'
				; FMFDEBUG: fmul contract {{t[0-9]+}},
				; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract1:'

				; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract1:'
				; GLOBALDEBUG: fmul contract {{t[0-9]+}}
				; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract1:'

				define float @fmul_fma_contract1(float %x) {
				; FMF-LABEL: fmul_fma_contract1:
				; FMF: # %bb.0:
				; FMF-NEXT: addis 3, 2, .LCPI8_0@toc@ha
				; FMF-NEXT: lfs 0, .LCPI8_0@toc@l(3)
				; FMF-NEXT: xsmulsp 1, 1, 0
				; FMF-NEXT: blr
				;
				; GLOBAL-LABEL: fmul_fma_contract1:
				; GLOBAL: # %bb.0:
				; GLOBAL-NEXT: addis 3, 2, .LCPI8_0@toc@ha
				; GLOBAL-NEXT: lfs 0, .LCPI8_0@toc@l(3)
				; GLOBAL-NEXT: xsmulsp 1, 1, 0
				; GLOBAL-NEXT: blr
				%mul = fmul float %x, 42.0
				%fma = call contract float @llvm.fma.f32(float %x, float 7.0, float %mul)
				ret float %fma
				}

				; This shouldn't change anything - the intermediate fmul result is now also flagged.

				; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract2:'
				; FMFDEBUG: fmul contract {{t[0-9]+}},
				; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract2:'

				; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract2:'
				; GLOBALDEBUG: fmul contract {{t[0-9]+}}
				; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract2:'

				define float @fmul_fma_contract2(float %x) {
				; FMF-LABEL: fmul_fma_contract2:
				; FMF: # %bb.0:
				; FMF-NEXT: addis 3, 2, .LCPI9_0@toc@ha
				; FMF-NEXT: lfs 0, .LCPI9_0@toc@l(3)
				; FMF-NEXT: xsmulsp 1, 1, 0
				; FMF-NEXT: blr
				;
				; GLOBAL-LABEL: fmul_fma_contract2:
				; GLOBAL: # %bb.0:
				; GLOBAL-NEXT: addis 3, 2, .LCPI9_0@toc@ha
				; GLOBAL-NEXT: lfs 0, .LCPI9_0@toc@l(3)
				; GLOBAL-NEXT: xsmulsp 1, 1, 0
				; GLOBAL-NEXT: blr
				%mul = fmul contract float %x, 42.0
				%fma = call contract float @llvm.fma.f32(float %x, float 7.0, float %mul)
				ret float %fma
				}

				; On the FMF test, reassociation alone does _not_ imply that FMA contraction is allowed.

				spatelUnsubmitted Not Done Reply Inline Actions I was ok with the reasoning up to here, but this example lost me. Why does 'contract' alone allow splitting an fma? Is 'contract' relevant on anything besides fadd (with an fmul operand)? spatel: I was ok with the reasoning up to here, but this example lost me. Why does 'contract' alone…
				wristowAuthorUnsubmitted Done Reply Inline Actions I have to admit I'm handwaving here. I don't really understand why 'contract' alone allows this simpliciation: `fma(X, 7.0, X * 42.0) --> X * 49.0` to happen. But either that's correct, or it's a separate bug (and I waved my hands about explaining more detail). The short story is that even without my proposed change, having only 'contract' on the FMA intrinsic results in this being simplified to `X * 49` (and also having only 'reassoc' did). The original comment: `; This is the minimum FMF needed for this transform - the FMA allows reassociation.` is incomplete/misleading. A more complete comment (relevant before my change) would have been: `; This is the minimum FMF needed for this transform - either 'reassoc' or 'contract' applied to the FMA intrinsic allows reassociation.` And with my change, the result is that 'reassoc' is no longer relevant. But since I don't really understand why it should be simplified with only `contract`, I should put a TODO comment in. That is, to me it seems like both 'contract' and 'reassoc' should be needed for this simplification to happen (whereas previously it worked with either of them individually). So maybe change this comment to: ; This is the minimum FMF needed for this transform - applying 'contract' to the FMA intrinsic allows reassociation. ; TODO: It seems 'contract' and 'reassoc' combined should be needed for this transform. Why does it work with only 'contract'? wristow: I have to admit I'm handwaving here. I don't really understand why 'contract' alone allows…
	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc1:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc1:'
	; FMFDEBUG: fmul reassoc {{t[0-9]+}},			; FMFDEBUG: fmul {{t[0-9]+}},
				; FMFDEBUG: fma reassoc {{t[0-9]+}},
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc1:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc1:'

	; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc1:'			; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc1:'
	; GLOBALDEBUG: fmul reassoc {{t[0-9]+}}			; GLOBALDEBUG: fmul reassoc {{t[0-9]+}}
	; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc1:'			; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc1:'

	define float @fmul_fma_reassoc1(float %x) {			define float @fmul_fma_reassoc1(float %x) {
	; FMF-LABEL: fmul_fma_reassoc1:			; FMF-LABEL: fmul_fma_reassoc1:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: addis 3, 2, .LCPI6_0@toc@ha			; FMF-NEXT: addis 3, 2, .LCPI10_0@toc@ha
	; FMF-NEXT: lfs 0, .LCPI6_0@toc@l(3)			; FMF-NEXT: lfs 0, .LCPI10_0@toc@l(3)
	; FMF-NEXT: xsmulsp 1, 1, 0			; FMF-NEXT: addis 3, 2, .LCPI10_1@toc@ha
				; FMF-NEXT: lfs 2, .LCPI10_1@toc@l(3)
				; FMF-NEXT: xsmulsp 0, 1, 0
				; FMF-NEXT: xsmaddasp 0, 1, 2
				; FMF-NEXT: fmr 1, 0
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fmul_fma_reassoc1:			; GLOBAL-LABEL: fmul_fma_reassoc1:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: addis 3, 2, .LCPI6_0@toc@ha			; GLOBAL-NEXT: addis 3, 2, .LCPI10_0@toc@ha
	; GLOBAL-NEXT: lfs 0, .LCPI6_0@toc@l(3)			; GLOBAL-NEXT: lfs 0, .LCPI10_0@toc@l(3)
	; GLOBAL-NEXT: xsmulsp 1, 1, 0			; GLOBAL-NEXT: xsmulsp 1, 1, 0
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul float %x, 42.0			%mul = fmul float %x, 42.0
	%fma = call reassoc float @llvm.fma.f32(float %x, float 7.0, float %mul)			%fma = call reassoc float @llvm.fma.f32(float %x, float 7.0, float %mul)
	ret float %fma			ret float %fma
	}			}

	; This shouldn't change anything - the intermediate fmul result is now also flagged.			; This shouldn't change anything - the intermediate fmul result is now also flagged.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc2:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc2:'
	; FMFDEBUG: fmul reassoc {{t[0-9]+}}			; FMFDEBUG: fmul reassoc {{t[0-9]+}}
				; FMFDEBUG: fma reassoc {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc2:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc2:'

	; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc2:'			; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_reassoc2:'
	; GLOBALDEBUG: fmul reassoc {{t[0-9]+}}			; GLOBALDEBUG: fmul reassoc {{t[0-9]+}}
	; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc2:'			; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_reassoc2:'

	define float @fmul_fma_reassoc2(float %x) {			define float @fmul_fma_reassoc2(float %x) {
	; FMF-LABEL: fmul_fma_reassoc2:			; FMF-LABEL: fmul_fma_reassoc2:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: addis 3, 2, .LCPI7_0@toc@ha			; FMF-NEXT: addis 3, 2, .LCPI11_0@toc@ha
	; FMF-NEXT: lfs 0, .LCPI7_0@toc@l(3)			; FMF-NEXT: lfs 0, .LCPI11_0@toc@l(3)
	; FMF-NEXT: xsmulsp 1, 1, 0			; FMF-NEXT: addis 3, 2, .LCPI11_1@toc@ha
				; FMF-NEXT: lfs 2, .LCPI11_1@toc@l(3)
				; FMF-NEXT: xsmulsp 0, 1, 0
				; FMF-NEXT: xsmaddasp 0, 1, 2
				; FMF-NEXT: fmr 1, 0
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fmul_fma_reassoc2:			; GLOBAL-LABEL: fmul_fma_reassoc2:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: addis 3, 2, .LCPI7_0@toc@ha			; GLOBAL-NEXT: addis 3, 2, .LCPI11_0@toc@ha
	; GLOBAL-NEXT: lfs 0, .LCPI7_0@toc@l(3)			; GLOBAL-NEXT: lfs 0, .LCPI11_0@toc@l(3)
	; GLOBAL-NEXT: xsmulsp 1, 1, 0			; GLOBAL-NEXT: xsmulsp 1, 1, 0
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul reassoc float %x, 42.0			%mul = fmul reassoc float %x, 42.0
	%fma = call reassoc float @llvm.fma.f32(float %x, float 7.0, float %mul)			%fma = call reassoc float @llvm.fma.f32(float %x, float 7.0, float %mul)
	ret float %fma			ret float %fma
	}			}

				; Reassociation applied with contract enables FMA contraction (of course).

				; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract_reassoc1:'
				; FMFDEBUG: fmul contract reassoc {{t[0-9]+}},
				; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract_reassoc1:'

				; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract_reassoc1:'
				; GLOBALDEBUG: fmul contract reassoc {{t[0-9]+}}
				; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract_reassoc1:'

				define float @fmul_fma_contract_reassoc1(float %x) {
				; FMF-LABEL: fmul_fma_contract_reassoc1:
				; FMF: # %bb.0:
				; FMF-NEXT: addis 3, 2, .LCPI12_0@toc@ha
				; FMF-NEXT: lfs 0, .LCPI12_0@toc@l(3)
				; FMF-NEXT: xsmulsp 1, 1, 0
				; FMF-NEXT: blr
				;
				; GLOBAL-LABEL: fmul_fma_contract_reassoc1:
				; GLOBAL: # %bb.0:
				; GLOBAL-NEXT: addis 3, 2, .LCPI12_0@toc@ha
				; GLOBAL-NEXT: lfs 0, .LCPI12_0@toc@l(3)
				; GLOBAL-NEXT: xsmulsp 1, 1, 0
				; GLOBAL-NEXT: blr
				%mul = fmul float %x, 42.0
				%fma = call contract reassoc float @llvm.fma.f32(float %x, float 7.0, float %mul)
				ret float %fma
				}

				; This shouldn't change anything - the intermediate fmul result is now also flagged.

				; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract_reassoc2:'
				; FMFDEBUG: fmul contract reassoc {{t[0-9]+}}
				; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract_reassoc2:'

				; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_contract_reassoc2:'
				; GLOBALDEBUG: fmul contract reassoc {{t[0-9]+}}
				; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_contract_reassoc2:'

				define float @fmul_fma_contract_reassoc2(float %x) {
				; FMF-LABEL: fmul_fma_contract_reassoc2:
				; FMF: # %bb.0:
				; FMF-NEXT: addis 3, 2, .LCPI13_0@toc@ha
				; FMF-NEXT: lfs 0, .LCPI13_0@toc@l(3)
				; FMF-NEXT: xsmulsp 1, 1, 0
				; FMF-NEXT: blr
				;
				; GLOBAL-LABEL: fmul_fma_contract_reassoc2:
				; GLOBAL: # %bb.0:
				; GLOBAL-NEXT: addis 3, 2, .LCPI13_0@toc@ha
				; GLOBAL-NEXT: lfs 0, .LCPI13_0@toc@l(3)
				; GLOBAL-NEXT: xsmulsp 1, 1, 0
				; GLOBAL-NEXT: blr
				%mul = fmul contract reassoc float %x, 42.0
				%fma = call contract reassoc float @llvm.fma.f32(float %x, float 7.0, float %mul)
				ret float %fma
				}

	; The FMA is now fully 'fast'. This implies that reassociation is allowed.			; The FMA is now fully 'fast'. This implies that reassociation is allowed.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast1:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast1:'
	; FMFDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}			; FMFDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast1:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast1:'

	; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast1:'			; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast1:'
	; GLOBALDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}			; GLOBALDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
	; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast1:'			; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast1:'

	define float @fmul_fma_fast1(float %x) {			define float @fmul_fma_fast1(float %x) {
	; FMF-LABEL: fmul_fma_fast1:			; FMF-LABEL: fmul_fma_fast1:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: addis 3, 2, .LCPI8_0@toc@ha			; FMF-NEXT: addis 3, 2, .LCPI14_0@toc@ha
	; FMF-NEXT: lfs 0, .LCPI8_0@toc@l(3)			; FMF-NEXT: lfs 0, .LCPI14_0@toc@l(3)
	; FMF-NEXT: xsmulsp 1, 1, 0			; FMF-NEXT: xsmulsp 1, 1, 0
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fmul_fma_fast1:			; GLOBAL-LABEL: fmul_fma_fast1:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: addis 3, 2, .LCPI8_0@toc@ha			; GLOBAL-NEXT: addis 3, 2, .LCPI14_0@toc@ha
	; GLOBAL-NEXT: lfs 0, .LCPI8_0@toc@l(3)			; GLOBAL-NEXT: lfs 0, .LCPI14_0@toc@l(3)
	; GLOBAL-NEXT: xsmulsp 1, 1, 0			; GLOBAL-NEXT: xsmulsp 1, 1, 0
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul float %x, 42.0			%mul = fmul float %x, 42.0
	%fma = call fast float @llvm.fma.f32(float %x, float 7.0, float %mul)			%fma = call fast float @llvm.fma.f32(float %x, float 7.0, float %mul)
	ret float %fma			ret float %fma
	}			}

	; This shouldn't change anything - the intermediate fmul result is now also flagged.			; This shouldn't change anything - the intermediate fmul result is now also flagged.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast2:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast2:'
	; FMFDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}			; FMFDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast2:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast2:'

	; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast2:'			; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fmul_fma_fast2:'
	; GLOBALDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}			; GLOBALDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
	; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast2:'			; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fmul_fma_fast2:'

	define float @fmul_fma_fast2(float %x) {			define float @fmul_fma_fast2(float %x) {
	; FMF-LABEL: fmul_fma_fast2:			; FMF-LABEL: fmul_fma_fast2:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: addis 3, 2, .LCPI9_0@toc@ha			; FMF-NEXT: addis 3, 2, .LCPI15_0@toc@ha
	; FMF-NEXT: lfs 0, .LCPI9_0@toc@l(3)			; FMF-NEXT: lfs 0, .LCPI15_0@toc@l(3)
	; FMF-NEXT: xsmulsp 1, 1, 0			; FMF-NEXT: xsmulsp 1, 1, 0
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fmul_fma_fast2:			; GLOBAL-LABEL: fmul_fma_fast2:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: addis 3, 2, .LCPI9_0@toc@ha			; GLOBAL-NEXT: addis 3, 2, .LCPI15_0@toc@ha
	; GLOBAL-NEXT: lfs 0, .LCPI9_0@toc@l(3)			; GLOBAL-NEXT: lfs 0, .LCPI15_0@toc@l(3)
	; GLOBAL-NEXT: xsmulsp 1, 1, 0			; GLOBAL-NEXT: xsmulsp 1, 1, 0
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%mul = fmul fast float %x, 42.0			%mul = fmul fast float %x, 42.0
	%fma = call fast float @llvm.fma.f32(float %x, float 7.0, float %mul)			%fma = call fast float @llvm.fma.f32(float %x, float 7.0, float %mul)
	ret float %fma			ret float %fma
	}			}

	; Reduced precision for sqrt is allowed - should use estimate and NR iterations.			; Reduced precision for sqrt is allowed - should use estimate and NR iterations.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_afn:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_afn:'
	; FMFDEBUG: fmul afn {{t[0-9]+}}			; FMFDEBUG: fmul afn {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_afn:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_afn:'

	; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_afn:'			; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_afn:'
	; GLOBALDEBUG: fmul afn {{t[0-9]+}}			; GLOBALDEBUG: fmul afn {{t[0-9]+}}
	; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_afn:'			; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_afn:'

	define float @sqrt_afn(float %x) {			define float @sqrt_afn(float %x) {
	; FMF-LABEL: sqrt_afn:			; FMF-LABEL: sqrt_afn:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: xxlxor 0, 0, 0			; FMF-NEXT: xxlxor 0, 0, 0
	; FMF-NEXT: fcmpu 0, 1, 0			; FMF-NEXT: fcmpu 0, 1, 0
	; FMF-NEXT: beq 0, .LBB10_2			; FMF-NEXT: beq 0, .LBB16_2
	; FMF-NEXT: # %bb.1:			; FMF-NEXT: # %bb.1:
	; FMF-NEXT: xsrsqrtesp 0, 1			; FMF-NEXT: xsrsqrtesp 0, 1
	; FMF-NEXT: addis 3, 2, .LCPI10_0@toc@ha			; FMF-NEXT: addis 3, 2, .LCPI16_0@toc@ha
	; FMF-NEXT: addis 4, 2, .LCPI10_1@toc@ha			; FMF-NEXT: addis 4, 2, .LCPI16_1@toc@ha
	; FMF-NEXT: lfs 2, .LCPI10_0@toc@l(3)			; FMF-NEXT: lfs 2, .LCPI16_0@toc@l(3)
	; FMF-NEXT: lfs 3, .LCPI10_1@toc@l(4)			; FMF-NEXT: lfs 3, .LCPI16_1@toc@l(4)
	; FMF-NEXT: xsmulsp 1, 1, 0			; FMF-NEXT: xsmulsp 1, 1, 0
	; FMF-NEXT: xsmulsp 0, 1, 0			; FMF-NEXT: xsmulsp 0, 1, 0
	; FMF-NEXT: xsmulsp 1, 1, 2			; FMF-NEXT: xsmulsp 1, 1, 2
	; FMF-NEXT: xsaddsp 0, 0, 3			; FMF-NEXT: xsaddsp 0, 0, 3
	; FMF-NEXT: xsmulsp 0, 1, 0			; FMF-NEXT: xsmulsp 0, 1, 0
	; FMF-NEXT: .LBB10_2:			; FMF-NEXT: .LBB16_2:
	; FMF-NEXT: fmr 1, 0			; FMF-NEXT: fmr 1, 0
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: sqrt_afn:			; GLOBAL-LABEL: sqrt_afn:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: xxlxor 0, 0, 0			; GLOBAL-NEXT: xxlxor 0, 0, 0
	; GLOBAL-NEXT: fcmpu 0, 1, 0			; GLOBAL-NEXT: fcmpu 0, 1, 0
	; GLOBAL-NEXT: beq 0, .LBB10_2			; GLOBAL-NEXT: beq 0, .LBB16_2
	; GLOBAL-NEXT: # %bb.1:			; GLOBAL-NEXT: # %bb.1:
	; GLOBAL-NEXT: xsrsqrtesp 0, 1			; GLOBAL-NEXT: xsrsqrtesp 0, 1
	; GLOBAL-NEXT: addis 3, 2, .LCPI10_0@toc@ha			; GLOBAL-NEXT: addis 3, 2, .LCPI16_0@toc@ha
	; GLOBAL-NEXT: addis 4, 2, .LCPI10_1@toc@ha			; GLOBAL-NEXT: addis 4, 2, .LCPI16_1@toc@ha
	; GLOBAL-NEXT: lfs 2, .LCPI10_0@toc@l(3)			; GLOBAL-NEXT: lfs 2, .LCPI16_0@toc@l(3)
	; GLOBAL-NEXT: lfs 3, .LCPI10_1@toc@l(4)			; GLOBAL-NEXT: lfs 3, .LCPI16_1@toc@l(4)
	; GLOBAL-NEXT: xsmulsp 1, 1, 0			; GLOBAL-NEXT: xsmulsp 1, 1, 0
	; GLOBAL-NEXT: xsmaddasp 2, 1, 0			; GLOBAL-NEXT: xsmaddasp 2, 1, 0
	; GLOBAL-NEXT: xsmulsp 0, 1, 3			; GLOBAL-NEXT: xsmulsp 0, 1, 3
	; GLOBAL-NEXT: xsmulsp 0, 0, 2			; GLOBAL-NEXT: xsmulsp 0, 0, 2
	; GLOBAL-NEXT: .LBB10_2:			; GLOBAL-NEXT: .LBB16_2:
	; GLOBAL-NEXT: fmr 1, 0			; GLOBAL-NEXT: fmr 1, 0
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%rt = call afn float @llvm.sqrt.f32(float %x)			%rt = call afn float @llvm.sqrt.f32(float %x)
	ret float %rt			ret float %rt
	}			}

	; The call is now fully 'fast'. This implies that approximation is allowed.			; The call is now fully 'fast'. This implies that approximation is allowed.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_fast:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_fast:'
	; FMFDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}			; FMFDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_fast:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_fast:'

	; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_fast:'			; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'sqrt_fast:'
	; GLOBALDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}			; GLOBALDEBUG: fmul nnan ninf nsz arcp contract afn reassoc {{t[0-9]+}}
	; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_fast:'			; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'sqrt_fast:'

	define float @sqrt_fast(float %x) {			define float @sqrt_fast(float %x) {
	; FMF-LABEL: sqrt_fast:			; FMF-LABEL: sqrt_fast:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: xxlxor 0, 0, 0			; FMF-NEXT: xxlxor 0, 0, 0
	; FMF-NEXT: fcmpu 0, 1, 0			; FMF-NEXT: fcmpu 0, 1, 0
	; FMF-NEXT: beq 0, .LBB11_2			; FMF-NEXT: beq 0, .LBB17_2
	; FMF-NEXT: # %bb.1:			; FMF-NEXT: # %bb.1:
	; FMF-NEXT: xsrsqrtesp 0, 1			; FMF-NEXT: xsrsqrtesp 0, 1
	; FMF-NEXT: addis 3, 2, .LCPI11_0@toc@ha			; FMF-NEXT: addis 3, 2, .LCPI17_0@toc@ha
	; FMF-NEXT: addis 4, 2, .LCPI11_1@toc@ha			; FMF-NEXT: addis 4, 2, .LCPI17_1@toc@ha
	; FMF-NEXT: lfs 2, .LCPI11_0@toc@l(3)			; FMF-NEXT: lfs 2, .LCPI17_0@toc@l(3)
	; FMF-NEXT: lfs 3, .LCPI11_1@toc@l(4)			; FMF-NEXT: lfs 3, .LCPI17_1@toc@l(4)
	; FMF-NEXT: xsmulsp 1, 1, 0			; FMF-NEXT: xsmulsp 1, 1, 0
	; FMF-NEXT: xsmaddasp 2, 1, 0			; FMF-NEXT: xsmaddasp 2, 1, 0
	; FMF-NEXT: xsmulsp 0, 1, 3			; FMF-NEXT: xsmulsp 0, 1, 3
	; FMF-NEXT: xsmulsp 0, 0, 2			; FMF-NEXT: xsmulsp 0, 0, 2
	; FMF-NEXT: .LBB11_2:			; FMF-NEXT: .LBB17_2:
	; FMF-NEXT: fmr 1, 0			; FMF-NEXT: fmr 1, 0
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: sqrt_fast:			; GLOBAL-LABEL: sqrt_fast:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: xxlxor 0, 0, 0			; GLOBAL-NEXT: xxlxor 0, 0, 0
	; GLOBAL-NEXT: fcmpu 0, 1, 0			; GLOBAL-NEXT: fcmpu 0, 1, 0
	; GLOBAL-NEXT: beq 0, .LBB11_2			; GLOBAL-NEXT: beq 0, .LBB17_2
	; GLOBAL-NEXT: # %bb.1:			; GLOBAL-NEXT: # %bb.1:
	; GLOBAL-NEXT: xsrsqrtesp 0, 1			; GLOBAL-NEXT: xsrsqrtesp 0, 1
	; GLOBAL-NEXT: addis 3, 2, .LCPI11_0@toc@ha			; GLOBAL-NEXT: addis 3, 2, .LCPI17_0@toc@ha
	; GLOBAL-NEXT: addis 4, 2, .LCPI11_1@toc@ha			; GLOBAL-NEXT: addis 4, 2, .LCPI17_1@toc@ha
	; GLOBAL-NEXT: lfs 2, .LCPI11_0@toc@l(3)			; GLOBAL-NEXT: lfs 2, .LCPI17_0@toc@l(3)
	; GLOBAL-NEXT: lfs 3, .LCPI11_1@toc@l(4)			; GLOBAL-NEXT: lfs 3, .LCPI17_1@toc@l(4)
	; GLOBAL-NEXT: xsmulsp 1, 1, 0			; GLOBAL-NEXT: xsmulsp 1, 1, 0
	; GLOBAL-NEXT: xsmaddasp 2, 1, 0			; GLOBAL-NEXT: xsmaddasp 2, 1, 0
	; GLOBAL-NEXT: xsmulsp 0, 1, 3			; GLOBAL-NEXT: xsmulsp 0, 1, 3
	; GLOBAL-NEXT: xsmulsp 0, 0, 2			; GLOBAL-NEXT: xsmulsp 0, 0, 2
	; GLOBAL-NEXT: .LBB11_2:			; GLOBAL-NEXT: .LBB17_2:
	; GLOBAL-NEXT: fmr 1, 0			; GLOBAL-NEXT: fmr 1, 0
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%rt = call fast float @llvm.sqrt.f32(float %x)			%rt = call fast float @llvm.sqrt.f32(float %x)
	ret float %rt			ret float %rt
	}			}

	; fcmp can have fast-math-flags.			; fcmp can have fast-math-flags.

	; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fcmp_nnan:'			; FMFDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fcmp_nnan:'
	; FMFDEBUG: select_cc nnan {{t[0-9]+}}			; FMFDEBUG: select_cc nnan {{t[0-9]+}}
	; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fcmp_nnan:'			; FMFDEBUG: Type-legalized selection DAG: %bb.0 'fcmp_nnan:'

	; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fcmp_nnan:'			; GLOBALDEBUG-LABEL: Optimized lowered selection DAG: %bb.0 'fcmp_nnan:'
	; GLOBALDEBUG: select_cc nnan {{t[0-9]+}}			; GLOBALDEBUG: select_cc nnan {{t[0-9]+}}
	; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fcmp_nnan:'			; GLOBALDEBUG: Type-legalized selection DAG: %bb.0 'fcmp_nnan:'

	define double @fcmp_nnan(double %a, double %y, double %z) {			define double @fcmp_nnan(double %a, double %y, double %z) {
	; FMF-LABEL: fcmp_nnan:			; FMF-LABEL: fcmp_nnan:
	; FMF: # %bb.0:			; FMF: # %bb.0:
	; FMF-NEXT: xxlxor 0, 0, 0			; FMF-NEXT: xxlxor 0, 0, 0
	; FMF-NEXT: xscmpudp 0, 1, 0			; FMF-NEXT: xscmpudp 0, 1, 0
	; FMF-NEXT: blt 0, .LBB12_2			; FMF-NEXT: blt 0, .LBB18_2
	; FMF-NEXT: # %bb.1:			; FMF-NEXT: # %bb.1:
	; FMF-NEXT: fmr 3, 2			; FMF-NEXT: fmr 3, 2
	; FMF-NEXT: .LBB12_2:			; FMF-NEXT: .LBB18_2:
	; FMF-NEXT: fmr 1, 3			; FMF-NEXT: fmr 1, 3
	; FMF-NEXT: blr			; FMF-NEXT: blr
	;			;
	; GLOBAL-LABEL: fcmp_nnan:			; GLOBAL-LABEL: fcmp_nnan:
	; GLOBAL: # %bb.0:			; GLOBAL: # %bb.0:
	; GLOBAL-NEXT: xxlxor 0, 0, 0			; GLOBAL-NEXT: xxlxor 0, 0, 0
	; GLOBAL-NEXT: xscmpudp 0, 1, 0			; GLOBAL-NEXT: xscmpudp 0, 1, 0
	; GLOBAL-NEXT: blt 0, .LBB12_2			; GLOBAL-NEXT: blt 0, .LBB18_2
	; GLOBAL-NEXT: # %bb.1:			; GLOBAL-NEXT: # %bb.1:
	; GLOBAL-NEXT: fmr 3, 2			; GLOBAL-NEXT: fmr 3, 2
	; GLOBAL-NEXT: .LBB12_2:			; GLOBAL-NEXT: .LBB18_2:
	; GLOBAL-NEXT: fmr 1, 3			; GLOBAL-NEXT: fmr 1, 3
	; GLOBAL-NEXT: blr			; GLOBAL-NEXT: blr
	%cmp = fcmp nnan ult double %a, 0.0			%cmp = fcmp nnan ult double %a, 0.0
	%z.y = select i1 %cmp, double %z, double %y			%z.y = select i1 %cmp, double %z, double %y
	ret double %z.y			ret double %z.y
	}			}

	; FP library calls can have fast-math-flags.			; FP library calls can have fast-math-flags.
	▲ Show 20 Lines • Show All 67 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/fp-contract.ll

This file was added.

				; Tests for -ffp-contract/-ffast-math interaction.
				; Specifically, -ffp-contract=off must suppress the use of FMA.

				; RUN: llc < %s -mcpu=haswell \| FileCheck %s --check-prefix=FMA

				; Scalar versions:

				define float @MulAddPlain(float %a, float %b, float %c) {
				; FMA-LABEL: MulAddPlain:
				; FMA: vmulss
				; FMA-NEXT: vaddss
				; FMA-NEXT: ret
				%mul = fmul float %a, %b
				%add = fadd float %mul, %c
				ret float %add
				}

				define float @MulAddFast(float %a, float %b, float %c) {
				; FMA-LABEL: MulAddFast:
				; FMA: vfmadd213ss
				; FMA-NEXT: ret
				%mul = fmul fast float %a, %b
				%add = fadd fast float %mul, %c
				ret float %add
				}

				define float @MulAddContract(float %a, float %b, float %c) {
				; FMA-LABEL: MulAddContract:
				; FMA: vfmadd213ss
				; FMA-NEXT: ret
				%mul = fmul contract float %a, %b
				%add = fadd contract float %mul, %c
				ret float %add
				}

				; Enabling all the fast-math-flags except 'contract' does not enable fused operations.
				define float @MulAddFastNoContract(float %a, float %b, float %c) {
				; FMA-LABEL: MulAddFastNoContract:
				; FMA: vmulss
				; FMA-NEXT: vaddss
				; FMA-NEXT: ret
				%mul = fmul nnan ninf nsz arcp afn reassoc float %a, %b
				%add = fadd nnan ninf nsz arcp afn reassoc float %mul, %c
				ret float %add
				}

				define float @MulAddReassoc(float %a, float %b, float %c) {
				; FMA-LABEL: MulAddReassoc:
				; FMA: vmulss
				; FMA-NEXT: vaddss
				; FMA-NEXT: ret
				%mul = fmul reassoc float %a, %b
				%add = fadd reassoc float %mul, %c
				ret float %add
				}

				define float @MulSubPlain(float %a, float %b, float %c) {
				; FMA-LABEL: MulSubPlain:
				; FMA: vmulss
				; FMA-NEXT: vsubss
				; FMA-NEXT: ret
				%mul = fmul float %a, %b
				%sub = fsub float %mul, %c
				ret float %sub
				}

				define float @MulSubFast(float %a, float %b, float %c) {
				; FMA-LABEL: MulSubFast:
				; FMA: vfmsub213ss
				; FMA-NEXT: ret
				%mul = fmul fast float %a, %b
				%sub = fsub fast float %mul, %c
				ret float %sub
				}

				define float @MulSubContract(float %a, float %b, float %c) {
				; FMA-LABEL: MulSubContract:
				; FMA: vfmsub213ss
				; FMA-NEXT: ret
				%mul = fmul contract float %a, %b
				%sub = fsub contract float %mul, %c
				ret float %sub
				}

				; Enabling all the fast-math-flags except 'contract' does not enable fused operations.
				define float @MulSubFastNoContract(float %a, float %b, float %c) {
				; FMA-LABEL: MulSubFastNoContract:
				; FMA: vmulss
				; FMA-NEXT: vsubss
				; FMA-NEXT: ret
				%mul = fmul nnan ninf nsz arcp afn reassoc float %a, %b
				%sub = fsub nnan ninf nsz arcp afn reassoc float %mul, %c
				ret float %sub
				}

				define float @MulSubReassoc(float %a, float %b, float %c) {
				; FMA-LABEL: MulSubReassoc:
				; FMA: vmulss
				; FMA-NEXT: vsubss
				; FMA-NEXT: ret
				%mul = fmul reassoc float %a, %b
				%sub = fsub reassoc float %mul, %c
				ret float %sub
				}

				; Vector versions:

				define <4 x float> @VecMulAddPlain(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulAddPlain:
				; FMA: vmulps
				; FMA-NEXT: vaddps
				; FMA-NEXT: ret
				%mul = fmul <4 x float> %a, %b
				%add = fadd <4 x float> %mul, %c
				ret <4 x float> %add
				}

				define <4 x float> @VecMulAddFast(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulAddFast:
				; FMA: vfmadd213ps
				; FMA-NEXT: ret
				%mul = fmul fast <4 x float> %a, %b
				%add = fadd fast <4 x float> %mul, %c
				ret <4 x float> %add
				}

				define <4 x float> @VecMulAddContract(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulAddContract:
				; FMA: vfmadd213ps
				; FMA-NEXT: ret
				%mul = fmul contract <4 x float> %a, %b
				%add = fadd contract <4 x float> %mul, %c
				ret <4 x float> %add
				}

				; Enabling all the fast-math-flags except 'contract' does not enable fused operations.
				define <4 x float> @VecMulAddFastNoContract(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulAddFastNoContract:
				; FMA: vmulps
				; FMA-NEXT: vaddps
				; FMA-NEXT: ret
				%mul = fmul nnan ninf nsz arcp afn reassoc <4 x float> %a, %b
				%add = fadd nnan ninf nsz arcp afn reassoc <4 x float> %mul, %c
				ret <4 x float> %add
				}

				define <4 x float> @VecMulAddReassoc(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulAddReassoc:
				; FMA: vmulps
				; FMA-NEXT: vaddps
				; FMA-NEXT: ret
				%mul = fmul reassoc <4 x float> %a, %b
				%add = fadd reassoc <4 x float> %mul, %c
				ret <4 x float> %add
				}

				define <4 x float> @VecMulSubPlain(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulSubPlain:
				; FMA: vmulps
				; FMA-NEXT: vsubps
				; FMA-NEXT: ret
				%mul = fmul <4 x float> %a, %b
				%sub = fsub <4 x float> %mul, %c
				ret <4 x float> %sub
				}

				define <4 x float> @VecMulSubFast(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulSubFast:
				; FMA: vfmsub213ps
				; FMA-NEXT: ret
				%mul = fmul fast <4 x float> %a, %b
				%sub = fsub fast <4 x float> %mul, %c
				ret <4 x float> %sub
				}

				define <4 x float> @VecMulSubContract(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulSubContract:
				; FMA: vfmsub213ps
				; FMA-NEXT: ret
				%mul = fmul contract <4 x float> %a, %b
				%sub = fsub contract <4 x float> %mul, %c
				ret <4 x float> %sub
				}

				; Enabling all the fast-math-flags except 'contract' does not enable fused operations.
				define <4 x float> @VecMulSubFastNoContract(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulSubFastNoContract:
				; FMA: vmulps
				; FMA-NEXT: vsubps
				; FMA-NEXT: ret
				%mul = fmul nnan ninf nsz arcp afn reassoc <4 x float> %a, %b
				%sub = fsub nnan ninf nsz arcp afn reassoc <4 x float> %mul, %c
				ret <4 x float> %sub
				}

				define <4 x float> @VecMulSubReassoc(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
				; FMA-LABEL: VecMulSubReassoc:
				; FMA: vmulps
				; FMA-NEXT: vsubps
				; FMA-NEXT: ret
				%mul = fmul reassoc <4 x float> %a, %b
				%sub = fsub reassoc <4 x float> %mul, %c
				ret <4 x float> %sub
				}

This is an archive of the discontinued LLVM Phabricator instance.

[Clang][Driver] Fix -ffast-math/-ffp-contract interactionNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 237834

clang/lib/Driver/ToolChains/Clang.cpp

clang/test/Driver/fast-math.c

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/test/CodeGen/PowerPC/fmf-propagation.ll

llvm/test/CodeGen/X86/fp-contract.ll

[Clang][Driver] Fix -ffast-math/-ffp-contract interaction
Needs ReviewPublic