This patch makes it so, floating-point instructions created in
LowerMatrixIntrinsics retain fast-math flags from instructions that are
higher up the chain.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
| Time | Test | |
|---|---|---|
| 10,190 ms | x64 debian > libarcher.races::lock-unrelated.c |
Event Timeline
Thanks for the patch! Could you add test coverage for lowering of the other FP instructions (fsub, fmul, fneg)? And maybe also with some variations of individual FMFs.
| llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
|---|---|---|
| 1161–1163 | I think it would be cleaner to just pass in the FMF object here? We don't need the instruction for anything else. Also, I think we don't need to separately pass in AllowContraction any longer? | |
| llvm/test/Transforms/LowerMatrixIntrinsics/preserve-existing-fast-math-flags.ll | ||
| 17 | Is there a reason we need the insert element instructions here, rather than passing in the second operand as vector to the function as well? | |
| llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
|---|---|---|
| 1161–1163 | +1 | |
Pass FastMathFlags directly into emitMatrixMultiply and remove the
AllowContraction operator from emitMatrixMultiply
Thanks for the update! A few more suggestions about dealing with the FMF object.
| llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
|---|---|---|
| 407 | Instead of returning None, could we just return a FastMathFlags object with all options set the false (which is the default) ? | |
| 1183 | I think this should be the other way around, if AllowContractEnabled we could set AllowContraction on FMF. This default could be set in getFastMathFlags? | |
| 1689 | Can we instead use the FMF object in those versions instead? AllowContraction option should probably also add the contract flag to those. | |
LGTM with the inline nits. Thanks!
| llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
|---|---|---|
| 410 | nit: drop {} | |
| llvm/test/Transforms/LowerMatrixIntrinsics/preserve-existing-fast-math-flags.ll | ||
| 179 | could you split up the test, possibly into separate ones for each multiply? Otherwise the IR to check for a single function seems unnecessarily big. | |
Instead of returning None, could we just return a FastMathFlags object with all options set the false (which is the default) ?