This patch changes the ffp-model=precise to enables -ffp-contract=on (previously -ffp-model=precise enabled -ffp-contract=fast). This is a follow-up to Andy Kaylor's comments in the llvm-dev discussion "Floating Point semantic modes". From the same email thread, I put Andy's distillation of floating point options and floating point modes into UsersManual.rst
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/docs/UsersManual.rst | ||
---|---|---|
1388 | I'm confused. Where in this patch the This patch establishes the default option for -ffp-model to select "precise". happens? LHS of diff says it is already default |
I agree that the default mode should be standards-compliant, which I believe permits fp-contract=on (contraction within expressions) but not fp-contract=fast (arbitrary contraction).
clang/docs/UsersManual.rst | ||
---|---|---|
1388 | The comments said that it was the default, but the actual default was something that didn't quite match any of the fp-models -- precise but with fp contraction off. | |
clang/lib/Driver/ToolChains/Clang.cpp | ||
2498–2504 | I think this always gets changed to fast, on, or off below, but making it empty here looks wrong. | |
2615–2619 | Does this mean that "-ffp-model=precise -ffp-contract=off" will leave FP contraction on? That doesn't seem right. | |
2722–2727 | We should never get here with FPContract empty. |
some replies to Andy. I'll upload another patch here which passed check-all locally. then i'll re-commit it.
clang/docs/UsersManual.rst | ||
---|---|---|
1388 | yes you're right. sorry for the confusion. i changed the title and description | |
clang/lib/Driver/ToolChains/Clang.cpp | ||
2615–2619 | No. When -ffp-model=precise is on the command line, we end up here because the code above changes optID to ffp_contract, and sets FPContract to "on". The Val string will be "precise", and it shouldn't be checked. I'll change the comment. |
FYI this caused failed builds for aarch64 (when building with -fno-signed-zeros), hitting unreachable statements in TargetLowering::getNegatedExpression - see https://bugs.llvm.org/show_bug.cgi?id=44892. I guess that's a bug in the aarch64 backend and not with this change in itself, but it was only uncovered by this change...
Looks like this patch regressed the test-suite:
https://lnt.llvm.org/db_default/v4/nts/recent_activity
Now 45min+
This is also breaking test-suites in our internal buildbots ..
This change is actually changing the default behavior: if user does NOT supply -ffp-contract, it becomes -ffp-contract=on.
So this means any code without explicitly add -ffp-contract=off will have -ffp-contract=on.
eg: -O0 will also generate FMA.
Is this intended behavior change? @mibintc @andrew.w.kaylor
You're right, -O0 shouldn't generate FMA. I'm preparing to revert this now -- just verifying the build.
You're right, -O0 shouldn't generate FMA. I'm preparing to revert this now -- just verifying the build.
Perhaps this should be
off with no optimization
on with -O1/-O2/-O3/-Os/-Oz
fast with fast math
Just a suggestion, I'm not sure whether that would be the best breakdown. Perhaps we can also see what the defaults are for GCC and unify with those?
@jsji Will your backend tolerate -ffp-contract=on if optimizations are not disabled, e.g. -O2? We are proposing that the default floating point model be -ffp-model=precise, and with precise model, the ffp-contract=on. However you are right we don't want the frontend to create FMA when optimizations are disabled -O0
It looks like something strange happened with the revert:
clang-11: warning: overriding '-ffp-model=strict' option with '-ffp-model=strict' [-Woverriding-t-option]
I believe the problem is that the original change that was being reverted contained this:
clang/lib/Driver/ToolChains/Clang.cpp @@ -2768,7 +2766,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, !AssociativeMath && !ReciprocalMath && SignedZeros && TrappingMath && RoundingFPMath && DenormalFPMath != llvm::DenormalMode::getIEEE() && + FPContract.empty()) - (FPContract.equals("off") || FPContract.empty()))
But sometime in the land-revert-land-revert cycle the line above that changed, causing the merge to miss this change in the most recent revert. I see that @MaskRay has since re-landed this change set, but it's going to cause problems for PowerPC. If someone needs to revert this yet again, I think it can be safely done by recovering the change above.
Apologies for the mess!
We normally default to -ffp-contract=on only for -O3 and above (for legacy compilers XL/gcc on PowerPC).
But yes, I think we are OK to accept -ffp-contract=on (not -ffp-contract=fast) at -O2 if there are strong justifications to do so on clang.
I've been discussing this with @scanon on the cfe-dev mailing list, and he has convinced me that we should create FMA options at -O0 if the fp-contract setting is "on" -- including when it is on by default. The reasoning that persuaded me was, "preserving FMA formation at O0 _helps_ debuggability, because it means that numerical behavior is more likely to match what a user observed at Os, allowing them to debug the problem."
Yeah, I agree with Steve here. The great virtue of -ffp-contract=on over the more aggressive modes is that FMA formation is purely local to a single source expression, which means there's really no obstacle to treating it as implementation-guaranteed semantics. Such behavior should be done consistently across compiler settings rather than potentially making the numeric result of an expression subject to optimizer settings and choices; the latter is fine for "fast" modes but not really acceptable as the default compiler behavior. If a backend needs to legalize the intrinsic back to separate multiply + add instructions, that's okay, but it should do so consistently rather than only under a specific optimization level.
We normally default to -ffp-contract=on only for -O3 and above (for legacy compilers XL/gcc on PowerPC).
Correction: XL on PowerPC default to generate FMA at -O0 as well. The corresponding option is -qfloat=maf.
https://www.ibm.com/support/knowledgecenter/SSXVZZ_16.1.1/com.ibm.xlcpp1611.lelinux.doc/compiler_ref/opt_float.html
GCC doesn't support "on" so I'm not sure it's possible to distinguish their intentions. I think it would be better to define what we think is best for clang and encourage GCC to unify with that.
IIUC the patch is currently reverted (in 78654e8511cf16d49f6680d782f3771a767ba942), due to ~20 llvm-test-suite failures.
The failures on X86 are listed below. Unfortunately most tests just compare the hash of the output, so it is not obvious in what way the results are changed. Any ideas on how to move forward on this? I'll update the diff with a rebased version on current trunk, which I used to reproduce the failures.
Failed Tests (22): test-suite :: MultiSource/Applications/oggenc/oggenc.test test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test test-suite :: MultiSource/Benchmarks/Rodinia/srad/srad.test test-suite :: SingleSource/Benchmarks/Linpack/linpack-pc.test test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test test-suite :: SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.test test-suite :: SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/3mm/3mm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemver/gemver.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm/trmm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.test test-suite :: SingleSource/Benchmarks/Polybench/stencils/adi/adi.test test-suite :: SingleSource/UnitTests/Vector/SSE/Vector-sse.expandfft.test test-suite :: SingleSource/UnitTests/Vector/SSE/Vector-sse.stepfft.test
Presumably it means that whatever optimization options those test specify, are now either more strict or less strict.
I think this is really missing some codegen tests that show how codegen actually changes by this.
Any ideas on how to move forward on this? I'll update the diff with a rebased version on current trunk, which I used to reproduce the failures.
Failed Tests (22): test-suite :: MultiSource/Applications/oggenc/oggenc.test test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test test-suite :: MultiSource/Benchmarks/Rodinia/srad/srad.test test-suite :: SingleSource/Benchmarks/Linpack/linpack-pc.test test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test test-suite :: SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.test test-suite :: SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/3mm/3mm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemver/gemver.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm/trmm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.test test-suite :: SingleSource/Benchmarks/Polybench/stencils/adi/adi.test test-suite :: SingleSource/UnitTests/Vector/SSE/Vector-sse.expandfft.test test-suite :: SingleSource/UnitTests/Vector/SSE/Vector-sse.stepfft.test
And by codegen changes i mostly mean newly-set/now-unset fp fast-math instruction flags.