Will remove the `vector.fma` operation in follow-up CLs.

# Diff Detail

- Repository: rG LLVM Github Monorepo

### Event Timeline

Keep it in the same dialect as `add` and `mul` (is that going to be `std`?). I was thinking about adding it to `math`, but @herhut said it's "not math enough", and I agree. The current situation, with `fma` only in the vector dialect, is also questionable, because `fma` doesn't have anything vector-specific.

I agree that this may be closer to an "arithmetic" dialect than to the math dialect, even though it is a bit on the edge: for example, you're mapping this to an LLVM intrinsic, which can be a pessimization for the optimizer compared to a sequence of `add(mul())` with the reassociation flag (this goes with my inline comment somewhat).

`mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463`

> We need to specify this a bit more: for example, do we guarantee the fused precision? How is it implemented on a target which does not have a native FMA? What effect do the fast-math flags have on this? What is the intended use for this? Why would someone use this operation rather than emitting

`mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463`

> My concrete use case is polynomial approximations of tanh: without fma, perf is ~2x worse than Eigen, and at the same time I'm not sure that it is safe to turn on the reassoc flag for a whole compiled module (and right now it seems impossible to emit
>
> Maybe add that the semantics is the same as the llvm.fma intrinsic? And all guarantees are whatever LLVM provides (when lowered to LLVM)?

`mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463`

> Talked a bit with Rasmus on this topic: FMA vs. no-FMA makes a huge difference for accuracy, and the polynomial approximation coefficients are different; turning fast-math on is not an option, and we need precise control over how
>
> I guess the polynomial approximation pass will just take a flag, fma on or off, and will select the approximation based on that.

`mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463`

> So it isn't a performance issue but a correctness one here?
>
> Right, I wouldn't ever suggest turning on fast-math at the module level. I missed that we don't have fast-math flags on these ops right now...