This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add math polynomial approximation pass
ClosedPublic

Authored by ezhulenev on Feb 15 2021, 5:11 PM.

Details

Summary

This gives up to a ~30x speedup compared to expanding Tanh into exp operations (about 4.6x for BM_mlir_Tanh_f32/10, rising to ~32x for BM_mlir_Tanh_f32/10k):

name                  old cpu/op  new cpu/op  delta
BM_mlir_Tanh_f32/10    253ns ± 3%    55ns ± 7%  -78.35%  (p=0.000 n=44+41)
BM_mlir_Tanh_f32/100  2.21µs ± 4%  0.14µs ± 8%  -93.85%  (p=0.000 n=48+49)
BM_mlir_Tanh_f32/1k   22.6µs ± 4%   0.7µs ± 5%  -96.68%  (p=0.000 n=32+42)
BM_mlir_Tanh_f32/10k   225µs ± 5%     7µs ± 6%  -96.88%  (p=0.000 n=49+55)

name                  old time/op             new time/op             delta
BM_mlir_Tanh_f32/10    259ns ± 1%               56ns ± 2%  -78.31%        (p=0.000 n=41+39)
BM_mlir_Tanh_f32/100  2.27µs ± 1%             0.14µs ± 5%  -93.89%        (p=0.000 n=46+49)
BM_mlir_Tanh_f32/1k   22.9µs ± 1%              0.8µs ± 4%  -96.67%        (p=0.000 n=30+42)
BM_mlir_Tanh_f32/10k   230µs ± 0%                7µs ± 3%  -96.88%        (p=0.000 n=37+55)

This approximation is based on the Eigen::generic_fast_tanh function.
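
For readers unfamiliar with the technique, here is a minimal scalar sketch of the two strategies being compared: tanh expanded into exp operations, and a clamped rational approximation evaluated with Horner's scheme. It does not reproduce Eigen's fitted coefficients. As a stand-in it uses the [7/6] Padé approximant obtained from the continued fraction for tanh, and the clamp bound of 5 is likewise an assumption chosen for this sketch. The names tanhViaExp, tanhRational and maxAbsError are hypothetical and do not appear in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Baseline: tanh expanded into exp operations,
// tanh(x) = (e^{2x} - 1) / (e^{2x} + 1).
// Note: this naive form yields NaN once exp(2x) overflows to infinity.
inline float tanhViaExp(float x) {
  const float e2x = std::exp(2.0f * x);
  return (e2x - 1.0f) / (e2x + 1.0f);
}

// Clamped rational approximation P(x) / Q(x).
// Assumption: coefficients are the [7/6] Pade approximant of tanh,
// NOT the coefficients used by Eigen::generic_fast_tanh.
inline float tanhRational(float x) {
  // Clamp the input; tanh is within 1e-4 of +/-1 beyond |x| = 5.
  const float xc = std::max(-5.0f, std::min(5.0f, x));
  const float x2 = xc * xc;
  // Odd numerator and even denominator, both evaluated in x^2 (Horner).
  const float p = xc * (135135.0f + x2 * (17325.0f + x2 * (378.0f + x2)));
  const float q = 135135.0f + x2 * (62370.0f + x2 * (3150.0f + x2 * 28.0f));
  // The approximant slightly overshoots 1 near the clamp bound, so the
  // result is also clamped to the range of tanh.
  return std::max(-1.0f, std::min(1.0f, p / q));
}

// Largest absolute deviation from std::tanh over evenly spaced samples.
inline float maxAbsError(float lo, float hi, int samples) {
  assert(samples >= 2);
  float worst = 0.0f;
  for (int i = 0; i < samples; ++i) {
    const float t = static_cast<float>(i) / static_cast<float>(samples - 1);
    const float x = lo + (hi - lo) * t;
    worst = std::max(worst, std::fabs(tanhRational(x) - std::tanh(x)));
  }
  return worst;
}
```

The rational form needs only multiplies, adds and one division, so it can be emitted as straight-line arithmetic with no call to exp. Calling maxAbsError(-10.0f, 10.0f, 2001) gives a quick check of how far this particular stand-in drifts from std::tanh.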

Event Timeline

ezhulenev created this revision.Feb 15 2021, 5:11 PM
ezhulenev requested review of this revision.Feb 15 2021, 5:11 PM
ezhulenev edited the summary of this revision.Feb 15 2021, 5:12 PM
ezhulenev added reviewers: tpopp, herhut.
mehdi_amini requested changes to this revision.Feb 15 2021, 5:24 PM

I don't think a pass is the right way to model this. If we want to support fast-math, we should model it through attributes on each op, following what LLVM does.

This revision now requires changes to proceed.Feb 15 2021, 5:24 PM
herhut added a subscriber: ftynse.Feb 16 2021, 12:45 AM

I don't think a pass is the right way to model this. If we want to support fast-math, we should model it through attributes on each op, following what LLVM does.

I think this is different than usual fast-math flags. This is not about special properties that allow for optimizations to apply that normally would not. These are different approximations for trigonometric functions, so in a way a different set of intrinsics or library implementations for math functions. For those, I think it makes sense to model them as lowering pattern that compilers can mix in if this is the implementation they want (and not a libm based implementation via function calls).

mlir/lib/Dialect/Math/Transforms/FastMathExpansion.cpp
39 ↗(On Diff #323851)

This adds yet another convenience builder infrastructure. What happened to EDSC, is that still the way to go? @ftynse do you know?

204 ↗(On Diff #323851)

Can you expose these via a populate method?

mlir/test/mlir-cpu-runner/fast-math.mlir
8 ↗(On Diff #323851)

Please drop the dump input.

I think this is different than usual fast-math flags.

Fast-math is probably a bad name for this; it is mostly about approximating functions with polynomials and other techniques that avoid library calls.

ezhulenev updated this revision to Diff 324129.Feb 16 2021, 4:17 PM
ezhulenev marked an inline comment as done.

Rename fast-math to math-polynomial-approximation pass

ezhulenev retitled this revision from [mlir] Add fast math expansion pass to math dialect to [mlir] Add math polynomial approximation pass.Feb 16 2021, 4:17 PM

I don't think a pass is the right way to model this. If we want to support fast-math, we should model it through attributes on each op, following what LLVM does.

I think this is different than usual fast-math flags. This is not about special properties that allow for optimizations to apply that normally would not. These are different approximations for trigonometric functions, so in a way a different set of intrinsics or library implementations for math functions. For those, I think it makes sense to model them as lowering pattern that compilers can mix in if this is the implementation they want (and not a libm based implementation via function calls).

You're right: the "fast" naming of the original revision threw me off here. Are these expansions always equivalent to libm right now?

Also, the pass only makes sense as a "test" for the patterns, so I'd rather see the pass in the test folder. A "real" compiler should have a concept of target and legality and load the patterns it needs.
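
The split being asked for here (patterns exposed through a populate function, with the pass reduced to a thin test driver) can be illustrated with a toy sketch. Everything below is hypothetical: ApproximationPattern, PatternList, populateApproximationPatterns and lowerTanh are stand-ins invented for this illustration, they operate on scalar floats rather than IR, and they are not the MLIR API or the code in this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a rewrite pattern: the name of the op it matches
// and a scalar function standing in for the expanded arithmetic.
struct ApproximationPattern {
  std::string opName;
  std::function<float(float)> expansion;
};

using PatternList = std::vector<ApproximationPattern>;

// Hypothetical "populate" entry point: the library only appends its patterns
// and leaves the decision to use them to the caller.
inline void populateApproximationPatterns(PatternList &patterns) {
  patterns.push_back({"tanh", [](float x) {
    // Low-order Pade [3/2] approximant, clamped to the range of tanh.
    const float x2 = x * x;
    const float y = x * (15.0f + x2) / (15.0f + 6.0f * x2);
    return std::max(-1.0f, std::min(1.0f, y));
  }});
}

// Stand-in for a compiler's lowering step: use an inline expansion if one was
// loaded for this op, otherwise fall back to the libm call.
inline float lowerTanh(const PatternList &patterns, float x) {
  for (const ApproximationPattern &p : patterns) {
    if (p.opName == "tanh") {
      assert(p.expansion && "pattern must carry an expansion");
      return p.expansion(x);
    }
  }
  return std::tanh(x);
}
```

A compiler that prefers libm simply never calls the populate function, while a test driver calls it unconditionally. That is the sense in which the pass is only a harness around the patterns.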

ezhulenev updated this revision to Diff 324236.Feb 17 2021, 1:55 AM

Register polynomial expansion as test pass

Are these expansions always equivalent to libm right now?

No, currently in Eigen they are usually different. And the plan for MLIR is to use https://www.sollya.org to build these math approximations going forward.

Also, the pass only makes sense as a "test" for the patterns, so I'd rather see the pass in the test folder. A "real" compiler should have a concept of target and legality and load the patterns it needs.

Done.

mehdi_amini accepted this revision.Feb 17 2021, 9:50 AM
This revision is now accepted and ready to land.Feb 17 2021, 9:50 AM
tpopp added inline comments.Feb 18 2021, 1:02 AM
mlir/lib/Dialect/Math/Transforms/PolynomialApproximation.cpp
10

Maybe include a link to Sollya if that's the desired way to create new approximations.

148

I'm guessing Sollya generates the code in this way? It would still be nice to follow the LLVM case style.
s/alpha_/alpha/
s/beta_/beta/

ezhulenev updated this revision to Diff 324807.Feb 18 2021, 4:25 PM
ezhulenev marked 2 inline comments as done.

Fix style warning

mravishankar added inline comments.
mlir/test/Dialect/Math/polynomial-approximation.mlir
7

Confused by the test. What is it supposed to generate instead?

ezhulenev marked an inline comment as done.Feb 19 2021, 12:41 PM
ezhulenev added inline comments.
mlir/lib/Dialect/Math/Transforms/PolynomialApproximation.cpp
10

I've discussed this today with Rasmus, and the plan is to create a tiny DSL with roughly the same operations as Eigen packet operations (an abstraction layer that hides SIMD details) and port the important approximations from Eigen. Maybe Sollya will be used to build some of them, but that is not clear yet.

148

Fixed all of that. This particular approximation is based on Eigen code (although it is used pretty much everywhere).

mlir/test/Dialect/Math/polynomial-approximation.mlir
7

It should generate the approximation built from std dialect operations. I didn't want to put a lot of checks in this test; correctness is tested in a separate mlir-cpu-runner test.

This revision was automatically updated to reflect the committed changes.
ezhulenev marked an inline comment as done.