Clang fold the fma constants for -ffp-contract=on ,but we have the a catch here,
what if some of intel targets doesn't support fma inst's.
in this case lets bail out with const folding.
Paths
| Differential D158632
[clang/X86] Do not perform FMA bydefault Needs RevisionPublic Authored by long5hot on Aug 23 2023, 9:07 AM.
Details
Summary Clang fold the fma constants for -ffp-contract=on ,but we have the a catch here, in this case lets bail out with const folding.
Diff Detail Event TimelineComment Actions bisecting testcase fexcess-precision.c for default subtarget and AVX512-FP16, because update script update_cc_test_checks.py was removing some checks because of same -check-prefixes.. Comment Actions This is a frontend solution to a backend problem. Fmuladd does not guarantee fusion or not and that’s all you have changed the emission of. Clang should not be considering the target for what it should emit. -fp-contract=on emits fmuladd and -fp-contract=fast emits fmul contract fadd contract. The backend is responsible for deciding whether it’s profitable to fuse or not, where it’s also refined based on the FP type The description also mentions constant folding which isn’t changed. fmuladd will constant fold as FMA and this is not an issue. That is permitted by the semantics, target preferred codegen doesn’t change that This revision now requires changes to proceed.Aug 26 2023, 4:30 AM Comment Actions
What i meant to say is, Clang frontend emits fma instrinsic based on the option ffp-contract=on. Now while optimization, It may calculate that fma operation during constant folding. PR : #63190 This has also mentioned by @andrew.w.kaylor in #54927 Comment Actions
Not a bug. This is operationally defined. The semantics of the operation should not depend on target preference. This also doesn't consider FMA availability differing based on type Comment Actions
As I just mentioned in https://github.com/llvm/llvm-project/issues/55230, @arsenm has convinced me that I was taking too hard of a stance on this. The use of -fp-contract=on explicitly permits the value change introduced by fused operations, so the fact that constant folding results in a different result than would have been produced by run-time evaluation is consistent with the rules under which the intrinsic was introduced. I'm not entirely convinced that the current behavior is the best we can do, but it is correct given the rules we are following. Users who want tighter numeric consistency should either use -fp-contract=off or make an explicit call to the fma function (which will be evaluated as if fused regardless of target hardware support). I'm hoping to put together a detailed document on numeric consistency goals and rules for the LLVM optimizer, but I'm not sure when I'll have time to do so. I started a discussion here: https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138 I raised the FMA constant folding issue there, and @arsenm explained his reasoning for why either method of evaluation is acceptable.
Revision Contents
Diff 553036 clang/include/clang/Basic/TargetInfo.h
clang/lib/Basic/Targets/X86.h
clang/lib/CodeGen/CGExprScalar.cpp
clang/test/CodeGen/X86/fexcess-precision-avx.c
clang/test/CodeGen/X86/fexcess-precision.c
clang/test/CodeGen/X86/fma-intrinsics.c
|