This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.
Details
Diff Detail
Event Timeline
test/CodeGen/PowerPC/fmf-propagation.ll | ||
---|---|---|
290 | No worries here, this is cross patching cruft, next update will have this sync up. |
test/CodeGen/X86/fmf-flags_fma.ll | ||
---|---|---|
3 | avx2 shouldn't be necessary? | |
11–12 | This case is already folded independently of this patch, right? The case where this patch will make a difference is something like this? define float @fadd_contract_with_strict_fmul(float %a , float %b , float %c) { %m = fmul float %a, %b %a = fadd contract float %m, %c ret float %a } |
test/CodeGen/AArch64/fma-aggressive.ll and test/CodeGen/X86/fmf-flags_fma.ll behave the same before and after the addition of this code, I will drop them for the next upload, and only one test file has changes, test/CodeGen/AArch64/neon-fma-FMF.ll, which I will check in.
The other two tests just have changed behavior with this code change and will remain unmodified until this code is checked in.
Adding more potential FMA reviewers. There may be some tricky bits here in the interaction between the FMF, the global settings, and the TLI hooks. See for example D28675.
IMO, we need to add more tests -- particularly for the targets that these folds were originally applied to (mostly AMDGPU).
We do have tests for those in AMDGPU, with -enable-unsafe-fp-math. We don't have systematic tests with the new contract/reassoc bits. At least the Mesa frontend doesn't generate those anyway at the moment.
@tpr, @rampitec maybe you want to add some tests for those for HSA / LLPC? I don't know if you're using the new bits yet.
In any case, the change looks good to me.
We should mostly just convert all the tests using the global flag to using the per-instruction ones.
The HSA case just inherits them from clang if its using them, which I assume it is.
Yes, we have the tests an can modify/duplicate them to use individual flags.
However, speaking of interaction with clang supposed to produce it I really have hard time not to get llvm.fmuladd.f32 call right from the clang, even at -O0. Which makes me think from a practical point of view we will not see any real codegen difference. Well, at least if I use OpenCL.
No worries here, this is cross patching cruft, next update will have this sync up.