This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Suppress some foldings of rounding to fp16
Needs ReviewPublic

Authored by wristow on Jan 10 2023, 11:47 AM.

Details

Summary

Folding of 'x86_fp80 -> float -> half' into a direct rounding of
'x86_fp80 -> half' is legal when "unsafe-fp-math" is true, but it has
been suppressed for efficiency reasons because it generally results in
a run-time call rather than native instructions. The same applies for
vector versions, but those foldings have not been suppressed. The same
efficiency aspect applies to folding of 'double -> float -> half', but
those foldings have not been suppressed (scalar or vector).

This patch suppresses the foldings for the 'double' source (scalar and
vector), and for the vector version of 'x86_fp80' (to match the scalar
approach).

Diff Detail

Event Timeline

wristow created this revision.Jan 10 2023, 11:47 AM
wristow requested review of this revision.Jan 10 2023, 11:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 11:47 AM

I intend to commit an NFC version of the updated test that shows the current behavior of main, with TODOs in it indicating the desired changes.

Sounds like the lowering should just be improved to do the opposite of this?

Sounds like the lowering should just be improved to do the opposite of this?

Thanks for the quick response @lebedev.ri.

So (focusing on the 'double' source) by that, do you mean splitting 'double -> half' into 'double -> float -> half'? That isn't technically safe, although it could be done legally in "unsafe-fp-math" mode. Ironically, that's what used to happen, but 02fe96b24018bb8ce65cb264e0621459507cf989 suppressed that feature. We could modify the change of that commit to no longer suppress it in "unsafe-fp-math" mode.

FTR, that commit 02fe96b24018bb8ce65cb264e0621459507cf989 caused us trouble, in that we encountered user-code that essentially did:

void test(double d, __fp16 *pFp16) {
  *pFp16 = __fp16(float(d));
}

and that previously worked fine even though dagcombine folded this into a single rounding, __fp16(d), because the lowering un-did that folding. With that commit, this folding results in a call to __truncdfhf2 for our target (and some others). And we don't supply that routine in our run-time lib. Seeing the comment in dagcombine:
"f80 to f16 always generates an expensive (and as yet, unimplemented) libcall to __truncxfhf2"
I was motivated to expand the check here to also handle the folding from double (and to have the code here also handle the vector versions).

Until your suggestion, I hadn't considered modifying the change of 02fe96b24018bb8ce65cb264e0621459507cf989 to allow it to do the transform when "unsafe-fp-math" is on. Thinking about it now, that should also generate better code (when "unsafe-fp-math" is on) for the case where the user has explicitly written a conversion from double to __fp16. So that seems better to me. I'll do that, unless others come back with an argument to have this done in dagcombine.

pengfei added inline comments.Jan 10 2023, 10:38 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16274–16275

This equals to VT.getScalarType() == MVT::f16.

llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
68

It's still efficient to in the AVX case. Calling to single run-time is better than vcvtsd2ss + __truncsfhf2.

I'll re-work this to handle it in the lowering, as @lebedev.ri suggested. And take the other comments into account then.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16274–16275

Thanks @pengfei. (I should have seen that.)

llvm/test/CodeGen/X86/fastmath-float-half-conversion.ll
68

Good point. When I re-work this to handle things in the lowering, I'll take that into account.