This is an archive of the discontinued LLVM Phabricator instance.

[clang/X86] Do not perform FMA bydefault
Needs RevisionPublic

Authored by long5hot on Aug 23 2023, 9:07 AM.

Details

Summary

Clang fold the fma constants for -ffp-contract=on ,but we have the a catch here,
what if some of intel targets doesn't support fma inst's.

in this case lets bail out with const folding.

Diff Detail

Event Timeline

long5hot created this revision.Aug 23 2023, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 9:07 AM
Herald added a subscriber: pengfei. · View Herald Transcript
long5hot requested review of this revision.Aug 23 2023, 9:07 AM
long5hot updated this revision to Diff 553034.Aug 24 2023, 1:34 AM
long5hot edited the summary of this revision. (Show Details)
long5hot added a reviewer: craig.topper.

bisecting testcase fexcess-precision.c for default subtarget and AVX512-FP16, because update script update_cc_test_checks.py was removing some checks because of same -check-prefixes..

long5hot updated this revision to Diff 553036.Aug 24 2023, 1:36 AM
arsenm requested changes to this revision.Aug 26 2023, 4:30 AM
arsenm added a subscriber: arsenm.

This is a frontend solution to a backend problem. Fmuladd does not guarantee fusion or not and that’s all you have changed the emission of. Clang should not be considering the target for what it should emit. -fp-contract=on emits fmuladd and -fp-contract=fast emits fmul contract fadd contract.

The backend is responsible for deciding whether it’s profitable to fuse or not, where it’s also refined based on the FP type

The description also mentions constant folding which isn’t changed. fmuladd will constant fold as FMA and this is not an issue. That is permitted by the semantics, target preferred codegen doesn’t change that

This revision now requires changes to proceed.Aug 26 2023, 4:30 AM
long5hot added a comment.EditedAug 27 2023, 3:31 AM

This is a frontend solution to a backend problem. Fmuladd does not guarantee fusion or not and that’s all you have changed the emission of. Clang should not be considering the target for what it should emit. -fp-contract=on emits fmuladd and -fp-contract=fast emits fmul contract fadd contract.

The backend is responsible for deciding whether it’s profitable to fuse or not, where it’s also refined based on the FP type

What i meant to say is, Clang frontend emits fma instrinsic based on the option ffp-contract=on. Now while optimization, It may calculate that fma operation during constant folding.
It Does not check if Target supports fma instructions or not, e.g. (nehalem).
If target doesn't support fma instruciton and emits mul and add then there's a mismatch and loss of precision.

PR : #63190

This has also mentioned by @andrew.w.kaylor in #54927

ping!

Not a bug. This is operationally defined. The semantics of the operation should not depend on target preference. This also doesn't consider FMA availability differing based on type

This is a frontend solution to a backend problem. Fmuladd does not guarantee fusion or not and that’s all you have changed the emission of. Clang should not be considering the target for what it should emit. -fp-contract=on emits fmuladd and -fp-contract=fast emits fmul contract fadd contract.

The backend is responsible for deciding whether it’s profitable to fuse or not, where it’s also refined based on the FP type

What i meant to say is, Clang frontend emits fma instrinsic based on the option ffp-contract=on. Now while optimization, It may calculate that fma operation during constant folding.
It Does not check if Target supports fma instructions or not, e.g. (nehalem).
If target doesn't support fma instruciton and emits mul and add then there's a mismatch and loss of precision.

PR : #63190

This has also mentioned by @andrew.w.kaylor in #54927

As I just mentioned in https://github.com/llvm/llvm-project/issues/55230, @arsenm has convinced me that I was taking too hard of a stance on this. The use of -fp-contract=on explicitly permits the value change introduced by fused operations, so the fact that constant folding results in a different result than would have been produced by run-time evaluation is consistent with the rules under which the intrinsic was introduced. I'm not entirely convinced that the current behavior is the best we can do, but it is correct given the rules we are following. Users who want tighter numeric consistency should either use -fp-contract=off or make an explicit call to the fma function (which will be evaluated as if fused regardless of target hardware support).

I'm hoping to put together a detailed document on numeric consistency goals and rules for the LLVM optimizer, but I'm not sure when I'll have time to do so. I started a discussion here: https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138 I raised the FMA constant folding issue there, and @arsenm explained his reasoning for why either method of evaluation is acceptable.