The current implementation of truncf %* : f32 to bf16 in the expand
ops pass is slow and does not match the truncsfbf2 / float2bf16
function in the execution utilities. This has caused divergences in
how floats are truncated between code that uses this pass and code
which relies on truncsfbf2 implementations, creating spurious
failures in conformance testing.
Details
- Reviewers
kuhar nicolasvasilache rsuderman
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I agree that this is the right decomposition. The original implementation matched this behavior as well, however it was reverted due to other parties requesting truncation of fp should include rounding. Approving so you can land, just be aware this may bring wider discussions.
The current implementation of truncf %* : f32 to bf16 in the expand ops pass is slow and does not match the truncsfbf2 / float2bf16 function in the execution utilities.
Can you elaborate on the actual numerical difference between the current expansion and the proposed one?
Can you link to these previous discussion?
- I couldn't find this discussion. @rsuderman, do you have pointers?
- The numerical difference, as far as I can tell, has to do with the exact details of the rounding, as can be seen in the updated tests. I don't know exactly how to characterize it, though.
@rsuderman do you have some info on the original implementation (it was yours right?) and the change in numerics here?
mlir/test/mlir-cpu-runner/expand-arith-ops.mlir | ||
---|---|---|
47 | 0x7f7fffff is 3.40282347E+38 ; so the new expansion is less precise than the previous one? |
mlir/test/mlir-cpu-runner/expand-arith-ops.mlir | ||
---|---|---|
47 | Actually it is 3.40282347E+38 in f32, I don't know what's expected for this conversion, does IEEE says anything? LLVM says "Results assume the round-to-nearest rounding mode" in general. |
On Discourse, you linked to the LLVM backend lowering, which basically just the 16bits right shift: https://github.com/llvm/llvm-project/blob/1783185790de29b24d3850d33d9a9d586e6bbd39/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L3230
Your expansion mimics truncsfbf2 , but we could also modify truncsfbf2 instead.
The implementation claims it comes from Eigen, but here is Eigen https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/arch/Default/BFloat16.h#L347-359 ; it matches the LLVM backend
The original implementation landed was a non-rounding truncation, here is the revert of the pattern:
https://github.com/llvm/llvm-project/commit/3bde144de32dc09a0b227f7afcff94f908ac6739
I do not believe the reverter was necessarily right, I believe it just changed their downstream behavior so they reverted.
mlir/test/mlir-cpu-runner/expand-arith-ops.mlir | ||
---|---|---|
47 |
I think you are right Mehdi and this should not change. The new behavior is not quite clamping to the expected behavior. |
Does anyone know where that code in ExecutionEngine came from, then? Because perhaps that needs to be corrected as well
0x7f7fffff is 3.40282347E+38 ; so the new expansion is less precise than the previous one?