This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][SelectionDAG] Add extendbfsf2 libcall and use it for bf16 extends with soft FP
Abandoned · Public

Authored by asb on May 25 2023, 6:04 AM.

Details

Summary

Previously, a bf16 extend with soft FP resulted in an assertion failure (reproducible on RISC-V with soft FP). The existing code path assumes a libcall is present, and adding the libcall seems like the easiest fix. This libcall _is_ provided by libgcc, which perhaps provides its own motivation for adding it here.

The legalisation code in LegalizeDAG lowers to an anyext and shift, which might be an alternative. However, this would be more invasive to support than just adding an extra case to the existing libcall lowering logic, and for these soft-float targets we likely don't care strongly about BF16 beyond wanting some basic support for completeness.

I'm not able to convince myself that the anyext+shift lowering is identical in all cases to the more elaborate extension performed by the libcall (and if so, why do the trunc and extend libcalls even exist?). I know @craig.topper was involved in a previous discussion on this, so I'd appreciate your view.

Event Timeline

asb created this revision. May 25 2023, 6:04 AM
Herald added a project: Restricted Project. May 25 2023, 6:04 AM
Herald added subscribers: luke, wingo, Enna1 and 24 others.
asb requested review of this revision. May 25 2023, 6:04 AM
Herald added a project: Restricted Project. May 25 2023, 6:04 AM

I'm not able to convince myself that the anyext+shift lowering is identical in all cases to the more elaborate extension performed by the libcall (and if so, why do the trunc and extend libcalls even exist?). I know @craig.topper was involved in a previous discussion on this, so I'd appreciate your view.

fp32 has more mantissa bits than bfloat16 (23 vs. 7), but they have the same number of exponent bits (8).

The trunc libcall exists because the extra bits of mantissa that exist in fp32 need to be rounded to convert to bfloat16. Also some f32 subnormal values can't be represented in bfloat16. So it can't be done as an integer truncate.
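To illustrate why truncation is more than an integer shift, here is a minimal sketch of an f32 to bfloat16 conversion using round-to-nearest-even on the raw bit patterns. The function names are hypothetical and this is not the compiler-rt or libgcc implementation; it assumes the default rounding mode and does not model floating-point exception flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round an f32 bit pattern to a bfloat16 bit pattern
   using round-to-nearest, ties-to-even. */
static uint16_t f32_bits_to_bf16_rne(uint32_t bits) {
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: the only non-zero mantissa bits may be in the discarded low
           half, so force the quiet bit to keep the result a NaN. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7FFF plus the lowest kept bit: values above the halfway point
       carry into the kept bits, and exact ties round to an even result. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}

/* Convenience wrapper taking a float; memcpy avoids aliasing problems. */
static uint16_t float_to_bf16_rne(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return f32_bits_to_bf16_rne(bits);
}
```

A plain `bits >> 16` would map 0x3F808001 to 0x3F80, whereas rounding gives 0x3F81. The smallest f32 subnormal (0x00000001) rounds to zero, which is one example of an f32 subnormal with no bfloat16 representation.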

For extend, we should just need to add 0s to the end of the mantissa. +0.0 and -0.0 are encoded as all 0s in the mantissa and exponent in both encodings. Infinity is encoded with a special exponent and an all-0 mantissa in both formats. NaN uses the same exponent as infinity but a non-zero mantissa; if the mantissa is already non-zero, adding more zeros doesn't change that. Adding zeros to the end of the mantissa for normals and denormals shouldn't change their value.
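The extension described above can be sketched as a 16-bit left shift of the bfloat16 bit pattern. This is an illustrative sketch only; `bf16_bits_to_float` is a hypothetical helper name, not an LLVM or compiler-rt function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float by appending
   16 zero bits to the mantissa. Sign and exponent keep their positions at
   the top of the word, so zeros, infinities, NaNs, normals and denormals
   all keep their meaning. */
static float bf16_bits_to_float(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f); /* reinterpret the bits as a float */
    return f;
}
```

For example, 0x3F80 (1.0 in bfloat16) becomes 0x3F800000 (1.0 in f32), and the smallest bfloat16 denormal 0x0001 becomes 0x00010000, which is the same value, 2^-133, as an f32 denormal.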

asb edited the summary of this revision. May 25 2023, 9:03 AM
asb added a comment. May 25 2023, 9:54 AM

For extend, we should just need to add 0s to the end of the mantissa. +0.0 and -0.0 are encoded as all 0s in the mantissa and exponent in both encodings. Infinity is encoded with a special exponent and an all-0 mantissa in both formats. NaN uses the same exponent as infinity but a non-zero mantissa; if the mantissa is already non-zero, adding more zeros doesn't change that. Adding zeros to the end of the mantissa for normals and denormals shouldn't change their value.

And then we'd just lose out on FE_INVALID being set if the input is a signalling NaN - it seems libgcc does have some support for setting these exception bits (on some platforms at least, with the right support hooks implemented) while compiler-rt has none. So I think that justifies the libcall for them. Thanks for helping clear that up.
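A small sketch of the signalling NaN difference, working purely on bit patterns. It assumes the common convention (used on RISC-V, among others) that the quiet bit is the most significant mantissa bit. The function names are hypothetical, and the `invalid` out-parameter only stands in for where an exception-aware conversion would raise FE_INVALID; no real exception flag is touched.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a bfloat16 bit pattern is a signalling NaN: exponent all ones,
   non-zero mantissa, quiet bit (mantissa MSB, bit 6) clear. */
static bool bf16_is_snan(uint16_t bits) {
    return (bits & 0x7F80u) == 0x7F80u &&
           (bits & 0x007Fu) != 0 &&
           (bits & 0x0040u) == 0;
}

/* Shift-only extension: the sNaN stays signalling and nothing is flagged. */
static uint32_t extend_by_shift(uint16_t bits) {
    return (uint32_t)bits << 16;
}

/* Extension modelled on an exception-aware conversion: an sNaN input is
   quieted (f32 quiet bit is bit 22) and *invalid reports that FE_INVALID
   would be raised. */
static uint32_t extend_quieting(uint16_t bits, bool *invalid) {
    uint32_t wide = (uint32_t)bits << 16;
    *invalid = bf16_is_snan(bits);
    if (*invalid)
        wide |= 0x00400000u;
    return wide;
}
```

For every input other than a signalling NaN the two functions return the same bits, which matches the point that the shift lowering only differs in the sNaN case.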

You would only need to worry about sNaNs with the constrained fptrunc.