This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Arith] Change F32 to BF16 truncation to match __truncsfbf2
Needs RevisionPublic

Authored by krzysz00 on Jul 26 2023, 1:17 PM.

Details

Summary

The current implementation of truncf %* : f32 to bf16 in the expand
ops pass is slow and does not match the truncsfbf2 / float2bf16
function in the execution utilities. This has caused divergences in
how floats are truncated between code that uses this pass and code
which relies on
truncsfbf2 implementations, creating spurious
failures in conformance testing.

Diff Detail

Event Timeline

krzysz00 created this revision.Jul 26 2023, 1:17 PM
Herald added a project: Restricted Project. · View Herald Transcript
krzysz00 requested review of this revision.Jul 26 2023, 1:17 PM
rsuderman accepted this revision.Aug 16 2023, 2:48 PM

I agree that this is the right decomposition. The original implementation matched this behavior as well, however it was reverted due to other parties requesting truncation of fp should include rounding. Approving so you can land, just be aware this may bring wider discussions.

This revision is now accepted and ready to land.Aug 16 2023, 2:48 PM

The current implementation of truncf %* : f32 to bf16 in the expand ops pass is slow and does not match the truncsfbf2 / float2bf16 function in the execution utilities.

Can you elaborate on the actual numerical difference between the current expansion and the proposed one?

I agree that this is the right decomposition. The original implementation matched this behavior as well, however it was reverted due to other parties requesting truncation of fp should include rounding.

Can you link to these previous discussion?

@mehdi_amini

  1. I couldn't find this discussion. @rsuderman, do you have pointers?
  1. The numerical difference, as far as I can tell, has to do with the exact details of the rounding, as can be seen in the updated tests. I don't know exactly how to characterize it, though.

@rsuderman do you have some info on the original implementation (it was yours right?) and the change in numerics here?

mlir/test/mlir-cpu-runner/expand-arith-ops.mlir
47

0x7f7fffff is 3.40282347E+38 ; so the new expansion is less precise than the previous one?

mehdi_amini added inline comments.Aug 26 2023, 3:12 PM
mlir/test/mlir-cpu-runner/expand-arith-ops.mlir
47

Actually it is 3.40282347E+38 in f32, I don't know what's expected for this conversion, does IEEE says anything?

LLVM says "Results assume the round-to-nearest rounding mode" in general.

On Discourse, you linked to the LLVM backend lowering, which basically just the 16bits right shift: https://github.com/llvm/llvm-project/blob/1783185790de29b24d3850d33d9a9d586e6bbd39/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L3230
Your expansion mimics truncsfbf2 , but we could also modify truncsfbf2 instead.
The implementation claims it comes from Eigen, but here is Eigen https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/arch/Default/BFloat16.h#L347-359 ; it matches the LLVM backend

@rsuderman do you have some info on the original implementation (it was yours right?) and the change in numerics here?

The original implementation landed was a non-rounding truncation, here is the revert of the pattern:
https://github.com/llvm/llvm-project/commit/3bde144de32dc09a0b227f7afcff94f908ac6739

I do not believe the reverter was necessarily right, I believe it just changed their downstream behavior so they reverted.

mlir/test/mlir-cpu-runner/expand-arith-ops.mlir
47

Actually it is 3.40282347E+38 in f32, I don't know what's expected for this conversion, does IEEE says anything?

LLVM says "Results assume the round-to-nearest rounding mode" in general.

I think you are right Mehdi and this should not change. The new behavior is not quite clamping to the expected behavior.

rsuderman requested changes to this revision.Aug 31 2023, 10:09 AM
This revision now requires changes to proceed.Aug 31 2023, 10:09 AM

Does anyone know where that code in ExecutionEngine came from, then? Because perhaps that needs to be corrected as well