Will remove the vector.fma operation in follow-up CLs.
Repository: rG LLVM Github Monorepo
Event Timeline
Keep it in the same dialect as add and mul (is it going to be std)? I was thinking about adding it to math, but @herhut said it's "not math enough", and I agree. The current situation, with fma only in vector, is also questionable, because fma doesn't have anything vector-specific.
I agree that this may be closer to an "arithmetic" dialect than the math dialect, even though it is a bit on the edge: for example, you're mapping this to an LLVM intrinsic, which can be a pessimization for the optimizer compared to a sequence of add(mul()) with the reassociation flag (this ties in with my inline comment).
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463

We need to specify this a bit more: for example, do we guarantee the fused precision? How is it implemented on a target which does not have native FMA? What is the effect of fast-math flags on this? What is the intended use for this? Why would someone use this operation rather than emitting add(mul()) with some fast-math reassociation flags? Can we codify this here better?
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463

My concrete use case is polynomial approximations of tanh: without fma, performance is ~2x worse than Eigen, and at the same time I'm not sure that it is safe to turn on the reassoc flag for a whole compiled module (and right now it seems impossible to emit add and mul with separate flags). Maybe add that the semantics are the same as the llvm.fma intrinsic? And all guarantees are whatever LLVM provides (when lowered to LLVM)?
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463

Talked a bit with Rasmus on this topic: FMA vs. no-FMA makes a huge difference for accuracy, and the polynomial approximation coefficients are different, so turning fast-math on is not an option; we need precise control over how add(mul(...)) is executed (fma vs. non-fma). I guess the polynomial approximation pass will just take a flag, fma on or off, and will select the approximation based on that.
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td:1463

So it isn't a performance issue but a correctness one here?
Right, I wouldn't ever suggest turning on fast-math at the module level. I missed that we don't have fast-math flags on these ops right now...