Page MenuHomePhabricator

[AArch64] Consider instruction-level contract FMFs in combiner patterns.

Authored by fhahn on Thu, Jul 30, 3:43 AM.



Currently, instruction level fast math flags are not considered when
generating patterns for the machine combiner.

This currently leads to some missed opportunities to generate FMAs in
combination with #pragma clang fp contract (fast).

For example, when building the example below with -O3 for AArch64, no
FMADD is generated. If built with -O2 and the DAGCombiner is used
instead of the MachineCombiner for FMAs, an FMADD is generated.

With this patch, the same code is generated in both cases.

float madd_contract(float a, float b, float c) {
#pragma clang fp contract (fast)
  return (a * b) + c;

Diff Detail

Event Timeline

fhahn created this revision.Thu, Jul 30, 3:43 AM
fhahn requested review of this revision.Thu, Jul 30, 3:43 AM
dmgreen added inline comments.Thu, Jul 30, 7:22 AM

-> We can only fuse ...


Do both instructions need the fast-math flag, or just the add? I guess it's safer to check both, but the code would end up being simpler if it was just the add.

Reading DAGCombiner::visitFADDForFMACombine it's a bit messy, but seems like it only really considers the add:
Reading the code though, I'm not sure that's really what it is intending to do.


-> both

fhahn updated this revision to Diff 282628.Mon, Aug 3, 8:03 AM

Only require contract on FADD/FSUB for fusion.

fhahn added inline comments.Mon, Aug 3, 8:06 AM

Indeed it seems like it should be sufficient on the FADD/FSUB. Given that is what DAGCOmbine does, it should be safe to do so in the machine combiner too (plus it ensures consistency between -O3 and other levels). It also makes the code much simpler.

I also found llvm/test/CodeGen/AArch64/neon-fma-FMF.ll, which tests with different contract flags and it specifically checks for fusion with contract on FADD and no fusion with contract on FMUL. I added -O3, which checks the machine-combiner path as well.

dmgreen accepted this revision.Tue, Aug 4, 1:12 AM

Thanks. LGTM, in that this work the same as DAGCombiner.

This revision is now accepted and ready to land.Tue, Aug 4, 1:12 AM