This is an archive of the discontinued LLVM Phabricator instance.

Promote bf16 to f32 when the target doesn't support it
ClosedPublic

Authored by bkramer on Jun 3 2022, 2:24 AM.

Details

Summary

This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.
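
As a rough illustration of why casting from bf16 is "a simple operation": a bf16 value is the upper 16 bits of an IEEE binary32 value, so widening is a 16-bit left shift. The sketch below is not the code from this patch; `bf16_bits_to_float` is a hypothetical helper name, and it operates on a raw `uint16_t` bit pattern rather than a real bf16 type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of LLVM): widen a raw bf16 bit pattern to
// float. The sign, the 8 exponent bits and the 7 fraction bits of bf16 line
// up with the top 16 bits of binary32, so a left shift by 16 is enough.
inline float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float result;
  std::memcpy(&result, &wide, sizeof(result)); // bit cast without UB
  return result;
}
```

For example, `bf16_bits_to_float(0x3fc0)` yields 1.5f, because the widened pattern is 0x3fc00000.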

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:

  • Targets with bf16 conversion instructions can now make fp_to_bf16 legal
  • The software conversion to bf16 can be replaced by a trivial implementation under fast math.
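
To make the second bullet concrete, here is a minimal sketch contrasting a trivial truncating conversion with a round-to-nearest-even one. Both functions are hypothetical illustrations operating on raw bit patterns; neither is the actual `__truncsfbf2` implementation in compiler-rt, and the NaN handling shown is an assumption about what a careful conversion would do.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern.
inline std::uint32_t float_bits(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Trivial conversion: drop the low 16 bits (rounds toward zero).
// A NaN whose payload lives only in the low 16 bits would become an
// infinity here, which is why this is only plausible under fast math.
inline std::uint16_t float_to_bf16_truncate(float value) {
  return static_cast<std::uint16_t>(float_bits(value) >> 16);
}

// Round-to-nearest-even conversion (illustrative sketch).
inline std::uint16_t float_to_bf16_rne(float value) {
  std::uint32_t bits = float_bits(value);
  if ((bits & 0x7fffffffu) > 0x7f800000u) {
    // NaN: keep the sign and top payload bits, and set the quiet bit so the
    // result cannot collapse into an infinity.
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  std::uint32_t lsb = (bits >> 16) & 1u; // lowest bit that survives
  bits += 0x7fffu + lsb;                 // ties round to the even result
  return static_cast<std::uint16_t>(bits >> 16);
}
```

The two functions agree whenever the discarded 16 bits are zero and differ only in how they round the remainder.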

Event Timeline

bkramer created this revision. Jun 3 2022, 2:24 AM
Herald added a project: Restricted Project. Jun 3 2022, 2:24 AM
bkramer requested review of this revision. Jun 3 2022, 2:24 AM
Herald added projects: Restricted Project, Restricted Project. Jun 3 2022, 2:24 AM
Herald added subscribers: llvm-commits, Restricted Project.

Post-wwdc ping.

t.p.northover accepted this revision. Jun 15 2022, 1:11 AM

Looks like everything's in place and working to me.

This revision is now accepted and ready to land. Jun 15 2022, 1:11 AM
This revision was landed with ongoing or failed builds. Jun 15 2022, 4:01 AM
This revision was automatically updated to reflect the committed changes.
arsenm added a subscriber: arsenm. Dec 10 2022, 5:08 AM
arsenm added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2915–2918

Why can this just shift into the high bits? Why don't the mantissa bits need to be adjusted down to the low bits?

pengfei added inline comments. Dec 10 2022, 7:14 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2915–2918

Expanding a normal value doesn't require adjusting the mantissa bits. We do have concerns that DAZ and signaling NaNs are not respected, but BF16 is not an IEEE standard type, so there is no such strict rule for it AFAIK. GCC does it the same way.

lebedev.ri added a subscriber: lebedev.ri. Edited Dec 10 2022, 8:02 AM

FWIW, i agree with @arsenm, the legalization is wrong.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2915–2918

This answer makes no sense. This expansion is an active miscompile.
The proper way to lower it is https://godbolt.org/z/GzM3n7Tdc

FWIW, i agree with @arsenm, the legalization is wrong.

The lowering is correct. The mantissa of an IEEE number is normalized by shifting left, so that the leading 1 does not need to be stored.

Consider the number 1.5. In f32 it is stored as 0x3fc00000
sign = 0
exponent = 127
mantissa = 0x400000

1.5 in bfloat16 is 0x3fc0
sign = 0
exponent = 127
mantissa = 0x40
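
The field breakdown for 1.5 can be checked mechanically. The sketch below splits both encodings into sign, exponent and stored fraction; the struct and function names are hypothetical. Note that the 7-bit bf16 fraction field of 0x3fc0 is 0x40, which is the top 7 bits of the 23-bit f32 fraction 0x400000.

```cpp
#include <cstdint>

// Hypothetical helper type: the three fields of a binary floating-point word.
struct FloatFields {
  std::uint32_t sign;
  std::uint32_t exponent; // biased exponent, bias 127 in both formats
  std::uint32_t fraction; // stored fraction bits, without the implicit 1
};

// binary32 layout: 1 sign bit, 8 exponent bits, 23 fraction bits.
inline FloatFields split_f32(std::uint32_t bits) {
  return FloatFields{bits >> 31, (bits >> 23) & 0xffu, bits & 0x7fffffu};
}

// bf16 layout: 1 sign bit, 8 exponent bits, 7 fraction bits.
inline FloatFields split_bf16(std::uint16_t raw) {
  std::uint32_t bits = raw;
  return FloatFields{bits >> 15, (bits >> 7) & 0xffu, bits & 0x7fu};
}
```

With these helpers, `split_bf16(0x3fc0).fraction << 16` equals `split_f32(0x3fc00000).fraction`, and both exponent fields are 127, which is why the plain shift preserves the value.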

Ok, i forgot that bit (i even implemented similar widening elsewhere previously!).
So yes, this is identical except for subnormals

My apologies...

craig.topper added a comment. Edited Dec 10 2022, 1:04 PM

It should be the same even for subnormals. The exponents in float32 and bfloat16 are the same width and use the same bias. A subnormal in bfloat16 can't be normalized in float32. The exponent can't get any smaller.

It appears the code in RawSpeed assumes the difference in biases is greater than the width of the mantissa of the smaller type. If the number of shifts needed to normalize is greater than the difference in bias, the exponent will go negative, but the code doesn't check for that.

Craig is correct: a subnormal in bfloat16 is also a subnormal in fp32, provided DAZ is not enabled.
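
The subnormal claim can be illustrated with a small sketch. It assumes the default floating-point environment (no DAZ/FTZ) and uses a hypothetical `widen_bf16` helper on raw bit patterns; it is not the LegalizeDAG code under review.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen a raw bf16 bit pattern by shifting it into the
// high half of a binary32 word.
inline float widen_bf16(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float result;
  std::memcpy(&result, &wide, sizeof(result));
  return result;
}

// A bf16 subnormal has exponent field 0 and a nonzero fraction. The shift
// leaves the exponent field at 0, so the widened float is subnormal as well.
inline bool widened_is_subnormal(std::uint16_t bits) {
  return std::fpclassify(widen_bf16(bits)) == FP_SUBNORMAL;
}
```

The smallest positive bf16 subnormal, 0x0001, encodes 2^-133. The widened pattern 0x00010000 is the f32 subnormal with the same value, so no renormalization step is needed.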