This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Expand contractable fmuladd into fmul + fadd
Needs Review · Public

Authored by foad on Jun 5 2023, 7:35 AM.

Details

Summary

The intention of fmuladd is that it represents an fmul and an fadd that
may be fused with each other but may not be fused with any other
instructions. If the fmuladd is marked as contractable then this
restriction disappears, so we should canonicalize it to separate
(contractable) fmul and fadd instructions, which are generally better
supported by InstCombine and other optimization passes.
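
For illustration, a minimal sketch of the intended canonicalization in LLVM IR (the function names here are hypothetical and not taken from the patch's tests):

define float @src(float %a, float %b, float %c) {
  ; Contractable fmuladd: the mul/add pair may be fused with surrounding
  ; operations, so keeping it as an intrinsic no longer buys anything.
  %r = call contract float @llvm.fmuladd.f32(float %a, float %b, float %c)
  ret float %r
}

define float @tgt(float %a, float %b, float %c) {
  ; Canonical form after the combine: separate contractable fmul and fadd,
  ; which InstCombine and other passes handle better than the intrinsic.
  %mul = fmul contract float %a, %b
  %r = fadd contract float %mul, %c
  ret float %r
}

declare float @llvm.fmuladd.f32(float, float, float)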

Diff Detail

Event Timeline

foad created this revision. · Jun 5 2023, 7:35 AM
Herald added a project: Restricted Project. · Jun 5 2023, 7:35 AM
foad requested review of this revision. · Jun 5 2023, 7:35 AM

I don't think I'm sufficiently familiar with FP semantics to review this...

Checking fast is way too much. At most it might also need nsz, but I'm pretty sure contract is sufficient.

llvm/test/Transforms/InstCombine/fma.ll
230

Could use tests checking that other flags are preserved.
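
A test along those lines might look like this sketch (hypothetical test function; which extra flags to exercise is up to the patch, but e.g. nnan and ninf should be carried over to both new instructions):

define float @fmuladd_preserves_other_flags(float %a, float %b, float %c) {
  ; Input: fmuladd carrying nnan and ninf in addition to contract.
  %r = call nnan ninf contract float @llvm.fmuladd.f32(float %a, float %b, float %c)
  ret float %r
}
; Expected after InstCombine, with all flags preserved on the split:
;   %mul = fmul nnan ninf contract float %a, %b
;   %r = fadd nnan ninf contract float %mul, %c

declare float @llvm.fmuladd.f32(float, float, float)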

fhahn added a subscriber: fhahn. · Jun 28 2023, 5:26 AM

Is there a test that shows the actual benefit of this, e.g. due to further combines? One potential issue is that (parts of) the cost model won't consider fadd/fmul pairs as a single instruction, so there may be negative fallout in things like SLPVectorizer.

Is there a test that shows the actual benefit of this, e.g. due to further combines? One potential issue is that (parts of) the cost model won't consider fadd/fmul pairs as a single instruction, so there may be negative fallout in things like SLPVectorizer.

This is a question of whether we want the original combine here or whether to remove it. If we're going to do this at all, it should use the minimal required flags.

foad updated this revision to Diff 535412. · Jun 28 2023, 7:57 AM

Rebase.

foad added a comment. · Jun 28 2023, 7:58 AM

Is there a test that shows the actual benefit of this, e.g. due to further combines? One potential issue is that (parts of) the cost model won't consider fadd/fmul pairs as a single instruction, so there may be negative fallout in things like SLPVectorizer.

I added test/CodeGen/AMDGPU/fmuladd-to-fma.ll, which shows one case where we get better optimization. I didn't look into exactly why that happens in this case.