If the matrix.multiply calls have the contract fast math flag, we can
use fmuladd. This als adds a command line option to force fmuladd
generation. We can retire this option once there is a clang-level
option.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Build result: FAILURE - Could not check out parent git hash "0e07e583a225ea05fa811c2ee2b64c757c8efe0e". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...
ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt
We should be able to use fmuladd instead of fmul/fadd, as this just
reduces the rounding error between operations.
I am concerned about the stability of the result in the presence of the FMA and non-FMA targets. We ought to generate the same result in IEEE fashion but with this we wouldn't. I think that this should still be driven by the fp-contract fp flag and perhaps we should have command-line option that enables this specifically for matmul?
I am concerned about the stability of the result in the presence of the FMA and non-FMA targets. We ought to generate the same result in IEEE fashion but with this we wouldn't. I think that this should still be driven by the fp-contract fp flag and perhaps we should have command-line option that enables this specifically for matmul?
Agreed! I've added a matrix-allow-contraction option. I'll add support for using fast-math flags as follow up
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
50 | Shouldn't this be using the fpflags from the intrinsic? I.e. I was thinking of a clang flag. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
50 | We can certainly add this on the clang level, although I am not sure how easy/hard it will be to add additional flags there. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
50 | Can we do both? I.e. propagate fp-contract from the intrinsic's fpflag and also allow the backend to globally enable this? I'd be fine with that. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
50 | Yep I've updated it to respect the FMF and also kept the option |
Shouldn't this be using the fpflags from the intrinsic? I.e. I was thinking of a clang flag.