
[mlir][VectorOps] Lower vector.fma to llvm.fmuladd instead of llvm.fma
Closed, Public

Authored by bkramer on Mon, Jul 13, 3:26 AM.

Details

Summary

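These compute the same multiply-add, but llvm.fmuladd permits fusion rather than requiring it. The backend can therefore split the op into a separate fmul and fadd when the target has no FMA instruction. On such targets, llvm.fma is instead lowered to scalar calls to libm's fmaf, which is much slower.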
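The only observable difference between the two lowerings is rounding. A fused multiply-add rounds once, while an fmul followed by an fadd rounds the product first. The minimal Python sketch below models this, assuming IEEE-754 doubles stand in for the float lanes of vector.fma. The helpers `fused_multiply_add`, `split_multiply_add`, and `fmuladd(..., target_has_fma)` are hypothetical illustrations, not MLIR or LLVM APIs.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model llvm.fma / libm fma: compute a*b + c exactly, then round once.

    Fraction arithmetic is exact, and float(Fraction) is a single, correctly
    rounded conversion to an IEEE-754 double. Finite inputs only; infinities,
    NaNs and signed zeros are not modeled in this sketch.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def split_multiply_add(a: float, b: float, c: float) -> float:
    """Model an fmul followed by an fadd: the product is rounded before the add."""
    product = a * b  # first rounding
    return product + c  # second rounding


def fmuladd(a: float, b: float, c: float, target_has_fma: bool) -> float:
    """Hypothetical model of llvm.fmuladd lowering: fuse only if the target can."""
    if target_has_fma:
        return fused_multiply_add(a, b, c)
    return split_multiply_add(a, b, c)


# a*b is exactly 1 - 2**-54, which rounds to 1.0 on its own (a tie, broken to even).
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(fused_multiply_add(a, b, c))              # the exact result, -2**-54
print(fmuladd(a, b, c, target_has_fma=False))   # 0.0: the residue was rounded away
```

The fused path corresponds to llvm.fma, which guarantees a single rounding. That guarantee is why a target without FMA instructions has to fall back to a library call. The split path is what llvm.fmuladd is allowed to become on such a target, trading the last-bit difference for a plain multiply and add.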


Event Timeline

bkramer created this revision. Mon, Jul 13, 3:26 AM
Herald added a project: Restricted Project.
nicolasvasilache accepted this revision. Mon, Jul 13, 3:30 AM

Great, is this the root cause of the issue we were seeing on older HW?

This revision is now accepted and ready to land. Mon, Jul 13, 3:30 AM

> Great, is this the root cause of the issue we were seeing on older HW?

Yup, this change makes the generated matmul 8x faster when targeting SSE without FMA instructions.

This revision was automatically updated to reflect the committed changes.

LGTM, nice find for SSE!

Note that some of the vector dialect docs guarantee that llvm.fma is used; we probably want to update those as well.