This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add fma operation to std dialect
ClosedPublic

Authored by ezhulenev on Feb 16 2021, 10:47 AM.

Details

Summary

The vector.fma operation will be removed in follow-up CLs.
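
For reference, a minimal usage sketch of the op being added (illustrative syntax, assuming the assembly format mirrors vector.fma and that the op works elementwise on scalars and vectors alike):

```mlir
// Scalar fused multiply-add: %r = %a * %b + %c.
%r = fma %a, %b, %c : f32

// The same op on vectors, covering what vector.fma provides today.
%v = fma %v0, %v1, %v2 : vector<8xf32>
```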

Diff Detail

Event Timeline

ezhulenev created this revision.Feb 16 2021, 10:47 AM
ezhulenev requested review of this revision.Feb 16 2021, 10:47 AM
ezhulenev edited the summary of this revision. (Show Details)Feb 16 2021, 10:48 AM
ezhulenev added reviewers: herhut, mehdi_amini.

Where should this go when the standard dialect is split into pieces?

Where should this go when the standard dialect is split into pieces?

Keep it in the same dialect as add and mul (is it going to be std)? I was thinking about adding it to math, but @herhut said it's "not math enough" and I agree. The current situation, with fma only in the vector dialect, is also questionable because fma has nothing vector-specific.

I agree that this may be closer to an "arithmetic" dialect than the math dialect, even though it is a bit on the edge: for example, you're mapping this to an LLVM intrinsic, which can be a pessimization for the optimizer compared to a sequence of add(mul()) with the reassociation flag (this ties in with my inline comment).
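
To make the contrast concrete, a sketch of the two forms being compared (illustrative syntax; note that the std addf/mulf ops do not currently carry fast-math flags, which comes up again below):

```mlir
// Fused form: the FMA is guaranteed, but the optimizer cannot split
// or reassociate it.
%r0 = fma %a, %b, %c : f32

// Unfused form: the optimizer may contract this into an FMA only if
// fast-math-style flags permit it -- flags these std ops do not
// currently expose.
%m  = mulf %a, %b : f32
%r1 = addf %m, %c : f32
```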

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1463

We need to specify this a bit more: for example, do we guarantee the fused precision? How is it implemented on a target that does not have a native FMA? What effect do the fast-math flags have on it?

What is the intended use for this? Why would someone use this operation rather than emitting add(mul()) with some fast-math reassociation flags? Can we codify this better here?

ezhulenev added inline comments.Feb 16 2021, 4:24 PM
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1463

My concrete use case is polynomial approximation of Tanh: without fma, performance is ~2x worse than Eigen, and at the same time I'm not sure it is safe to turn on the reassoc flag for a whole compiled module (and right now it seems impossible to emit add and mul with separate flags).

Maybe add that the semantics are the same as the llvm.fma intrinsic, and that all guarantees are whatever LLVM provides (when lowered to LLVM)?
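
That is, the op would lower directly to the LLVM dialect's fma intrinsic, roughly like this (sketch; the exact LLVM-dialect spelling may differ):

```mlir
// Before lowering (std level):
%r = fma %a, %b, %c : f32

// After lowering to the LLVM dialect (generic form of the fma
// intrinsic), which has single-rounding semantics and is expanded
// by the target when no native FMA instruction exists:
%r = "llvm.intr.fma"(%a, %b, %c) : (f32, f32, f32) -> f32
```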

ezhulenev added inline comments.Feb 16 2021, 5:16 PM
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1463

I talked a bit with Rasmus about this: FMA vs. no-FMA makes a huge difference for accuracy, and the polynomial approximation coefficients are different in each case. Turning fast-math on is not an option, and we need precise control over how add(mul(...)) is executed (fma vs. non-fma).

I guess the polynomial approximation pass will just take a flag, fma on or off, and select the approximation based on that.
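
For example, the fma variant of a small polynomial evaluation would be a Horner chain of fused ops, roughly like this (hypothetical coefficients %c0-%c3, illustrative syntax); the non-fma variant would emit mulf/addf pairs instead, with a different coefficient set:

```mlir
// Evaluate p(x) = c3*x^3 + c2*x^2 + c1*x + c0 with one fused
// multiply-add per Horner step.
%t0 = fma %x, %c3, %c2 : f32
%t1 = fma %x, %t0, %c1 : f32
%p  = fma %x, %t1, %c0 : f32
```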

mehdi_amini added inline comments.Feb 16 2021, 7:27 PM
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1463

So it isn't a performance issue but a correctness one here?

I'm not sure it is safe to turn on the reassoc flag for a whole compiled module (and right now it seems impossible to emit add and mul with separate flags).

Right, I would never suggest turning on fast-math at the module level. I missed that we don't have fast-math flags on these ops right now...

ezhulenev added inline comments.Feb 17 2021, 1:09 AM
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1463

Performance and correctness. The Tanh approximation in D96739 assumes FMA operations. Regarding performance, Rasmus mentioned that without FMA the number of executed instructions needed to reach the same accuracy can be 3x-4x that of the FMA version.

ezhulenev updated this revision to Diff 324318.Feb 17 2021, 8:34 AM
ezhulenev edited the summary of this revision. (Show Details)

Document fma operation semantics

mehdi_amini accepted this revision.Feb 17 2021, 9:54 AM
This revision is now accepted and ready to land.Feb 17 2021, 9:54 AM
This revision was automatically updated to reflect the committed changes.