Page MenuHomePhabricator

[RISCV] Select +0.0 immediate using fmv.{w,d}.x / fcvt.d.w
ClosedPublic

Authored by rogfer01 on Mar 6 2020, 2:25 AM.

Details

Summary

Floating point positive zero can be selected using fmv.w.x / fmv.d.x / fcvt.d.w and a zero source register.

Diff Detail

Event Timeline

rogfer01 created this revision.Mar 6 2020, 2:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2020, 2:25 AM

GCC seems to use fmv.s.x / fmv.d.x (the former of which now should be called fmv.w.x? ) instead of fcvt, not sure why. A quick Googling returned:

http://gcc.1065356.n8.nabble.com/New-Port-for-RISC-V-td1338959.html

+;; Floating-point constant +0.0, used for FCVT-based moves when FMV is
+;; not available in RV32.

Which would suggest that, at least at some point in the past, there was a reason for preferring FMV over FCVT? Weird. I have to look into that more carefully.

A quick check with the GCC 10 in Fedora Rawhide shows the following results for this small testcase

float zero_float(void)
{
  return 0;
}

double zero_double(void)
{
  return 0;
}
-march=rv32gc -mabi=ilp32d-march=rv64gc -mabi=lp64d
zero_floatfmv.s.xfmv.s.x
zero_doublefcvt.d.wfmv.d.x

Perhaps these are slightly better than just fcvt everywhere because they avoid the rounding mode required by fcvt.s.w / fcvt.s.l / fcvt.d.l?

I'm happy to use those fmv.{s,d}.x instead.

Perhaps these are slightly better than just fcvt everywhere because they avoid the rounding mode required by fcvt.s.w / fcvt.s.l / fcvt.d.l?

With the rounding mode being stateless (it's per instruction), I'm don't know what the issue would be with that.

I'm happy to use those fmv.{s,d}.x instead.

I haven't yet had the opportunity to better investigate this. At first glance it seems fine to follow the GCC lead, and it could always be changed later, so if you want to update the patch for that I think I would be fine with approving it. But ideally we wouldn't just cargo cult this, and actually investigate what motivated that choice :-)
Thanks for sharing the table Roger.

Perhaps these are slightly better than just fcvt everywhere because they avoid the rounding mode required by fcvt.s.w / fcvt.s.l / fcvt.d.l?

With the rounding mode being stateless (it's per instruction), I'm don't know what the issue would be with that.

Unless you fast-path register zero (or the value itself being zero), a naive implementation is going to have to do a full integer->float conversion, which is more likely to be a multi-cycle operation, regardless of rounding mode. Contrast that with an fmv, which is a special case of fsgnj, but the generic case is still just bit selection and concatenation, so it should always be single-cycle. Having said that, the Rocket schedule (and reading the code) indicates that both are 2-cycle operations, and for the Bluespec Piccolo/Flute cores both are 1-cycle operations, but the GCC schedule for the 7 series SiFive cores claims that fmv will be in the A pipe (address, i.e. loads/stores, but also any FP<->int given that already needs to be present on the load/store paths), and fcvt will be in the B pipe (branches, but also mul/div and and any other FP ops).

Perhaps these are slightly better than just fcvt everywhere because they avoid the rounding mode required by fcvt.s.w / fcvt.s.l / fcvt.d.l?

With the rounding mode being stateless (it's per instruction), I'm don't know what the issue would be with that.

Unless you fast-path register zero (or the value itself being zero), a naive implementation is going to have to do a full integer->float conversion, which is more likely to be a multi-cycle operation, regardless of rounding mode. Contrast that with an fmv, which is a special case of fsgnj, but the generic case is still just bit selection and concatenation, so it should always be single-cycle. Having said that, the Rocket schedule (and reading the code) indicates that both are 2-cycle operations, and for the Bluespec Piccolo/Flute cores both are 1-cycle operations, but the GCC schedule for the 7 series SiFive cores claims that fmv will be in the A pipe (address, i.e. loads/stores, but also any FP<->int given that already needs to be present on the load/store paths), and fcvt will be in the B pipe (branches, but also mul/div and and any other FP ops).

Actually I got confused, fmv.x.?and fmv.?.x aren't special cases of fsgnj (that's only true for the FP<->FP fmv's), but their own instructions outright. The rest of what I said still applies though.

Unless you fast-path register zero (or the value itself being zero), a naive implementation is going to have to do a full integer->float conversion, which is more likely to be a multi-cycle operation, regardless of rounding mode. Contrast that with an fmv, which is a special case of fsgnj, but the generic case is still just bit selection and concatenation, so it should always be single-cycle. Having said that, the Rocket schedule (and reading the code) indicates that both are 2-cycle operations, and for the Bluespec Piccolo/Flute cores both are 1-cycle operations, but the GCC schedule for the 7 series SiFive cores claims that fmv will be in the A pipe (address, i.e. loads/stores, but also any FP<->int given that already needs to be present on the load/store paths), and fcvt will be in the B pipe (branches, but also mul/div and and any other FP ops).

Actually I got confused, fmv.x.?and fmv.?.x aren't special cases of fsgnj (that's only true for the FP<->FP fmv's), but their own instructions outright. The rest of what I said still applies though.

Thanks for the detailed info James. So the GCC table is the ideal implementation, and probably what we should adopt, it seems?

rogfer01 updated this revision to Diff 249753.Mar 11 2020, 2:23 PM
rogfer01 retitled this revision from [RISCV] Select +0.0 immediate using fcvt.{d,s}.{x,w} fN, x0 to [RISCV] Select +0.0 immediate using fmv.{w,d}.x / fcvt.d.w.
rogfer01 edited the summary of this revision. (Show Details)

ChangeLog:

  • Use fmv.{w,d}.x / fcvt.d.w instead
luismarques accepted this revision.Mar 17 2020, 10:48 AM

LGTM!

llvm/lib/Target/RISCV/RISCVInstrInfoF.td
397–411

Make sure the whitespace new line introduced here is consistent with the rest of the file.

This revision is now accepted and ready to land.Mar 17 2020, 10:48 AM
rogfer01 marked an inline comment as done.Mar 20 2020, 1:29 AM

Thanks a lot for the review @luismarques !

llvm/lib/Target/RISCV/RISCVInstrInfoF.td
397–411

Sure I'll do.

rogfer01 updated this revision to Diff 251577.Mar 20 2020, 2:28 AM

ChangeLog:

  • Remove whitespace prior committing
This revision was automatically updated to reflect the committed changes.