This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
ClosedPublic

Authored by craig.topper on Mar 15 2021, 4:38 PM.

Details

Summary

Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.

Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.

If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.

With this patch the value will always in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.

I've based this implementation off of how this is done on ARM.

I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.

This probably needs more tests to test the different ABIs,
running out of FPRs, running of GPRs, etc. I've done some spot
checking and everything looks ok so far. I'll add some directed
tests in the next day or two unless anyone has objections to this
patch.

Diff Detail

Event Timeline

craig.topper created this revision.Mar 15 2021, 4:38 PM
craig.topper requested review of this revision.Mar 15 2021, 4:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2021, 4:38 PM
Herald added a subscriber: MaskRay. · View Herald Transcript

I'm slightly concerned that the upper bits of of the FPR are garbage so this might be a valid float value or it might be NaN.

IIUC, psabi already has a rule for upper bits of the FPR?
When a floating-point argument narrower than FLEN bits is passed in a floating-point register, it is 1-extended (NaN-boxed) to FLEN bits.

NaN-box the value before putting it into a float register.

craig.topper edited the summary of this revision. (Show Details)Mar 22 2021, 12:13 PM

Add more exhaustive argument/return test.

Pass %s to FileCheck in RUN lines

I just found an issue is unrelated this patch, but related to fp16, RISC-V GCC using the traditional libgcc function name scheme like __extendhfdf2(__<op><srcT><dstT><N_OP>) rather than __gnu_f2h_ieee (__gnu_*2*_ieee).

Function used in GCC:

  • __extendhfsf2 for half -> float
  • __truncsfhf2 for float -> half
  • __extendhfdf2 for double -> half
  • __truncdfhf2 for half -> double
kito-cheng accepted this revision.Mar 29 2021, 6:24 PM

LGTM for NaN-boxing behavior.

This revision is now accepted and ready to land.Mar 29 2021, 6:24 PM
ldrumm added a subscriber: ldrumm.Mar 30 2021, 3:11 AM

I just found an issue is unrelated this patch, but related to fp16, RISC-V GCC using the traditional libgcc function name scheme like __extendhfdf2(__<op><srcT><dstT><N_OP>) rather than __gnu_f2h_ieee (__gnu_*2*_ieee).

Function used in GCC:

  • __extendhfsf2 for half -> float
  • __truncsfhf2 for float -> half
  • __extendhfdf2 for double -> half
  • __truncdfhf2 for half -> double

Yes. This was my finding as well. I taught llvm and compiler-rt about them in this patch (unmerged). Would this be useful to revisit?

kito-cheng added a comment.EditedMar 30 2021, 7:24 AM

I just found an issue is unrelated this patch, but related to fp16, RISC-V GCC using the traditional libgcc function name scheme like __extendhfdf2(__<op><srcT><dstT><N_OP>) rather than __gnu_f2h_ieee (__gnu_*2*_ieee).

Function used in GCC:

  • __extendhfsf2 for half -> float
  • __truncsfhf2 for float -> half
  • __extendhfdf2 for double -> half
  • __truncdfhf2 for half -> double

Yes. This was my finding as well. I taught llvm and compiler-rt about them in this patch (unmerged). Would this be useful to revisit?

Yeah, I believe that would be very useful, and I guess we also need __trunctfhf2 and __extendhftf2 like https://reviews.llvm.org/D86453, calling conversion function twice while convert half to double (fp64) or long double(fp128) seems not benefit anything.