This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Inline ceil/floor/trunc for float and double
AbandonedPublic

Authored by reames on Jul 26 2022, 11:46 AM.

Details

Summary

For ceil, floor, and trunc we have existing custom lowering for vectors. This change extends that custom lowering for the scalar cases. One slight subtlety with the scalar case is that we must use the full XLEN for the integer type, and that for doubles on riscv32 we can't use this as we can't round trip the full range through integer.

I would appreciate careful review here. I am by no means a floating point expert. My reasoning on the correctness of this basically comes down to matching what the vector code already does and the knowledge that rounding modes are shared between scalar and vector so if the existing code gets that right, then presumably so must the scalar version.

Diff Detail

Event Timeline

reames created this revision.Jul 26 2022, 11:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2022, 11:46 AM
reames requested review of this revision.Jul 26 2022, 11:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2022, 11:46 AM
craig.topper added a comment.EditedJul 26 2022, 12:05 PM

The code in libm is already special cased for RISC-V and is I think optimal. It uses a range check and static rounding mode on the generated instructions. Other than removing the call overhead, the rest of this code is worse than what is in libm.

The code in libm is already special cased for RISC-V and is I think optimal. It uses a range check and static rounding mode on the generated instructions. Other than removing the call overhead, the rest of this code is worse than what is in libm.

Do you have a pointer to the exact code you're looking at? I thought I'd looked at the right libm implementation, and it wasn't as fully optimized as your describing.

Assuming you're correct - you probably are - would a patch which "just" inlines the libm version be reasonable?

craig.topper added a comment.EditedJul 26 2022, 12:41 PM

The code in libm is already special cased for RISC-V and is I think optimal. It uses a range check and static rounding mode on the generated instructions. Other than removing the call overhead, the rest of this code is worse than what is in libm.

Do you have a pointer to the exact code you're looking at? I thought I'd looked at the right libm implementation, and it wasn't as fully optimized as your describing.

Assuming you're correct - you probably are - would a patch which "just" inlines the libm version be reasonable?

https://github.com/bminor/glibc/blob/master/sysdeps/riscv/rvf/s_floorf.c

I had forgotten that the libm version needs to be strict fp correct so it also saves/restore fflags and checks for snan. We can probably inline without those things.

asb added a comment.Aug 1 2022, 3:10 AM

I'd wondered about FROUND too (I'd looked a little at this chatting to an intern implementing the equivalent operation in JSC). Given we don't have an instruction for fround like AArch64 does, it's not an obvious win unless the gains from avoiding the libcall (and perhaps strict fp correctness) are worth it, or if inlining exposes more opportunities for optimisation.

reames abandoned this revision.Aug 15 2022, 9:14 AM

Plan to return to topic, but this particular patch isn't going anywhere.