Currently, the target-independent backend codegen allows the generation of FMA instructions if *any* of the following holds: the 'contract' fast-math flag is set, TargetOptions::AllowFPOpFusion is set to FPOpFusion::Fast, or the TargetOptions::UnsafeFPMath flag is set. As a result, fp contraction can be controlled by a means other than the IR, which prevents a front end from generating IR that would enable fusion in some functions and disable it in others.
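To make the condition concrete, here is a minimal sketch of the decision as a standalone predicate. It is a simplified model, not LLVM's actual code: FusionMode, GlobalOptions, mayFuseCurrent and mayFuseIROnly are hypothetical names, and the IR-only variant is my reading of the direction of this patch, inferred from the notes about the options becoming non-functional.

```cpp
// Simplified model of the fusion decision. These are hypothetical stand-ins,
// not the real LLVM types or functions.
enum class FusionMode { Fast, Standard, Strict };

// Stand-in for the two global options named in the summary.
struct GlobalOptions {
  FusionMode AllowFPOpFusion = FusionMode::Standard;
  bool UnsafeFPMath = false;
};

// Current behavior as described: any one of the three conditions is enough,
// so a global option can enable fusion even when the IR does not ask for it.
inline bool mayFuseCurrent(bool HasContractFlag, const GlobalOptions &Opts) {
  return HasContractFlag || Opts.AllowFPOpFusion == FusionMode::Fast ||
         Opts.UnsafeFPMath;
}

// Assumed proposed behavior: only the 'contract' flag carried in the IR
// decides; the global options are ignored.
inline bool mayFuseIROnly(bool HasContractFlag, const GlobalOptions &) {
  return HasContractFlag;
}
```

With the first predicate, setting the global fusion mode to Fast permits fusion for an operation that has no 'contract' flag. With the second, the same operation is left unfused, so per-function control expressed in the IR is honored.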
Note: This change would render the clang -ffp-contract=fast-honor-pragmas option obsolete. It would also make the llc -fp-contract option non-functional. These options will be removed in a later patch.
Also note: There are 17 additional lit tests that fail with this change. So far I have updated tests only for the X86 and AMDGPU backends (one backend I'm familiar with and one I'm not). It's tedious work, so I didn't want to update all the tests without getting feedback on this direction. I will fix all the remaining tests before committing this patch. A front-end change may be needed for CUDA and HIP support before this patch is committed, but I'd like to keep that separate.
I'll send an RFC to llvm-dev to draw more attention to this proposed change.
We do need to have a way to preserve current behavior for CUDA compilation. There are many existing users that implicitly assume it.