This is an archive of the discontinued LLVM Phabricator instance.

Utilize new SDNode flag functionality to expand current support for fma
ClosedPublic

Authored by mcberg2017 on Jun 7 2018, 4:45 PM.

Details

Summary

This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.

Diff Detail

Event Timeline

mcberg2017 created this revision.Jun 7 2018, 4:45 PM
mcberg2017 added inline comments.Jun 7 2018, 4:47 PM
test/CodeGen/PowerPC/fmf-propagation.ll
308

No worries here, this is cross patching cruft, next update will have this sync up.

mcberg2017 updated this revision to Diff 150433.Jun 7 2018, 4:56 PM

with the cleanup...

spatel added inline comments.Jun 8 2018, 9:15 AM
test/CodeGen/X86/fmf-flags_fma.ll
2 ↗(On Diff #150433)

avx2 shouldn't be necessary?

10–11 ↗(On Diff #150433)

This case is already folded independently of this patch, right?
Please check in all new tests with current codegen as a preliminary step.

The case where this patch will make a difference is something like this?

define float @fadd_contract_with_strict_fmul(float %a , float %b , float %c) {
  %m = fmul float %a, %b
  %a = fadd contract float %m, %c
  ret float %a
}

Removed an already covered test and avx2 requirement for testing.

mcberg2017 added a comment.EditedJun 11 2018, 12:04 PM

TBD, check in tests first...

mcberg2017 added a comment.EditedJun 11 2018, 5:46 PM

test/CodeGen/AArch64/fma-aggressive.ll and test/CodeGen/X86/fmf-flags_fma.ll behave the same before and after the addition of this code, I will drop them for the next upload, and only one test file has changes, test/CodeGen/AArch64/neon-fma-FMF.ll, which I will check in.
The other two tests just have changed behavior with this code change and will remain unmodified until this code is checked in.

Synced up after rL334461, with fewer test files.

Adding more potential FMA reviewers. There may be some tricky bits here in the interaction between the FMF, the global settings, and the TLI hooks. See for example D28675.

IMO, we need to add more tests -- particularly for the targets that these folds were originally applied to (mostly AMDGPU).

nhaehnle accepted this revision.Jun 15 2018, 3:23 AM
nhaehnle added a subscriber: tpr.

We do have tests for those in AMDGPU, with -enable-unsafe-fp-math. We don't have systematic tests with the new contract/reassoc bits. At least the Mesa frontend doesn't generate those anyway at the moment.

@tpr, @rampitec maybe you want to add some tests for those for HSA / LLPC? I don't know if you're using the new bits yet.

In any case, the change looks good to me.

This revision is now accepted and ready to land.Jun 15 2018, 3:23 AM

We do have tests for those in AMDGPU, with -enable-unsafe-fp-math. We don't have systematic tests with the new contract/reassoc bits. At least the Mesa frontend doesn't generate those anyway at the moment.

@tpr, @rampitec maybe you want to add some tests for those for HSA / LLPC? I don't know if you're using the new bits yet.

In any case, the change looks good to me.

We should mostly just convert all the tests using the global flag to using the per-instruction ones.

The HSA case just inherits them from clang if its using them, which I assume it is.

We do have tests for those in AMDGPU, with -enable-unsafe-fp-math. We don't have systematic tests with the new contract/reassoc bits. At least the Mesa frontend doesn't generate those anyway at the moment.

@tpr, @rampitec maybe you want to add some tests for those for HSA / LLPC? I don't know if you're using the new bits yet.

In any case, the change looks good to me.

We should mostly just convert all the tests using the global flag to using the per-instruction ones.

The HSA case just inherits them from clang if its using them, which I assume it is.

Yes, we have the tests an can modify/duplicate them to use individual flags.
However, speaking of interaction with clang supposed to produce it I really have hard time not to get llvm.fmuladd.f32 call right from the clang, even at -O0. Which makes me think from a practical point of view we will not see any real codegen difference. Well, at least if I use OpenCL.

This revision was automatically updated to reflect the committed changes.