Initial builtins for LoongArch.
Add loongarch64 to ALL_CRT_SUPPORTED_ARCH list.
Support fe_getround and fe_raise_inexact in builtins.
- Repository
- rG LLVM Github Monorepo
compiler-rt/cmake/Modules/CompilerRTUtils.cmake | ||
---|---|---|
189 | It's better not to use GRLEN here. It will be possible to run loongarch32 code on a loongarch64 processor. Just use "Unsupported pointer size", I think. | |
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
34 | I think we can simplify this by | |
34 | ... removing this default:, removing #else, and moving #endif before return CRT_FE_TONEAREST. | |
47 | "ri" does not make sense here. From GCC documentation about extended inline asm (I guess Clang will behave the same way):
But movgr2fcsr does not allow an immediate operand, so i should not be allowed. |
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
34 | Well, I'm not sure if the compiler is clever enough not to emit a warning here. If it believes we need default: please ignore my comment. |
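The simplification suggested for line 34 can be sketched as follows. This is a stand-alone illustration, not the patch itself: every sketch_* name, the SKETCH_HAVE_FPU switch (standing in for whatever preprocessor check guards the FPU path), and the rounding-mode encodings are assumptions. The FCSR value is passed in as a parameter instead of being read with movfcsr2gr, so the snippet builds on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for compiler-rt's CRT_FE_* values; names are hypothetical. */
typedef enum {
  SKETCH_FE_TONEAREST,
  SKETCH_FE_TOWARDZERO,
  SKETCH_FE_UPWARD,
  SKETCH_FE_DOWNWARD
} sketch_round_mode;

/* Assumed encoding of the FCSR rounding-mode field (bits 9:8). */
#define SKETCH_RM_TOWARDZERO 0x100u
#define SKETCH_RM_UPWARD     0x200u
#define SKETCH_RM_DOWNWARD   0x300u
#define SKETCH_RM_MASK       0x300u

/* Hypothetical switch: define as 0 to model a build without FP registers. */
#ifndef SKETCH_HAVE_FPU
#define SKETCH_HAVE_FPU 1
#endif

sketch_round_mode sketch_getround(uint32_t fcsr) {
#if SKETCH_HAVE_FPU
  switch (fcsr & SKETCH_RM_MASK) {
  case SKETCH_RM_TOWARDZERO:
    return SKETCH_FE_TOWARDZERO;
  case SKETCH_RM_UPWARD:
    return SKETCH_FE_UPWARD;
  case SKETCH_RM_DOWNWARD:
    return SKETCH_FE_DOWNWARD;
  }
#endif
  /* Shared fall-through: no default:, no #else, one return for both paths. */
  (void)fcsr;
  return SKETCH_FE_TONEAREST;
}
```

Because the switch operates on an integer and not on an enum, leaving out default: should not normally trigger a missing-case warning, which matches the observation later in the thread.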
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
42 | I'm not sure about the expected semantics of __fe_raise_inexact. On i386, it's
    __asm__ __volatile__ ("fdivs %1" : "+t" (f) : "m" (g));
And I got:
    $ cat ex.c
    #define _GNU_SOURCE
    #include <fenv.h>
    int main() {
      float f = 1.0f, g = 3.0f;
      feenableexcept(FE_INEXACT);
      __asm__ __volatile__ ("fdivs %1" : "+t" (f) : "m" (g));
      return 0;
    }
    $ cc ex.c -lm
    $ ./a.out
    floating point exception (core dumped)
On AArch64, it's
    uint64_t fpsr;
    __asm__ __volatile__("mrs %0, fpsr" : "=r" (fpsr));
    __asm__ __volatile__("msr fpsr, %0" : : "ri" (fpsr | AARCH64_INEXACT));
But I got the following result on an Apple M1:
    $ cat ex.c
    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdint.h>
    #include <stdio.h>
    int main() {
      uint64_t fpsr;
      if (feenableexcept(FE_INEXACT) != 0)
        perror("the CPU does not support trapping FP operation");
      __asm__ __volatile__("mrs %0, fpsr" : "=r" (fpsr));
      __asm__ __volatile__("msr fpsr, %0" : : "r" (fpsr | 0x10));
    }
    $ cc ex.c -lm
    $ ./a.out
Nothing happens. So which model should we use here? |
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
34 | Compiling with clang (default options) does not show any warnings. Do I need to add other options to test this? For reference, the i386 fp_mode.c has no default:, while the arm/aarch64/riscv versions do. | |
42 | On LoongArch:
    $ cat ex_gcc.c
    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdint.h>
    #include <stdio.h>
    int main() {
      int fcsr;
      __asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
      __asm__ __volatile__("movgr2fcsr $r0, %0" ::"ri" (fcsr | 0x10000));
      return 0;
    }
    $ cc ex.c -lm
    $ ./a.out
Nothing happens.
    $ cat ex_clang.c
    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdint.h>
    #include <stdio.h>
    int main() {
      int fcsr;
      __asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
      __asm__ __volatile__("movgr2fcsr $fcsr0, %0" ::"ri" (fcsr | 0x10000));
      return 0;
    }
    $ clang ex.c -lm
    $ ./a.out
Nothing happens. fdivs performs an actual floating-point division, while the aarch64/riscv/loongarch versions only set the relevant flag bits in the register, which explains the difference. |
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
42 | I can understand the logic. But if we just interpret the name __fe_raise_inexact, it should behave like feraiseexcept(FE_INEXACT). On LoongArch: #define _GNU_SOURCE #include <fenv.h> int main() { feenableexcept(FE_INEXACT); feraiseexcept(FE_INEXACT); } gives a SIGFPE. Maybe the name is not very precise then... But I can't find any documentation about the expected behavior of __fe_raise_inexact. |
@kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INEXACT) (really raising the exception), or simply set the INEXACT flag in the FP CSR?
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
42 | gcc:
clang:
Is it an issue that users have to write different asm for different compilers? |
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
42 | Regarding the $fcsrX vs $rX case, obviously it is binutils that needs to be fixed to accept $fcsrX in addition to the status quo. It's a blatant violation of the manual syntax they wrote themselves, gravely misleading users. We may accept GPR in place of FCSR for compatibility (like the __loongarch64 case discussed in D136413), but it's definitely not the kind of thing that should get preserved forever. |
This interface was introduced in D57143, but I can't see any comments describing its intention. I think we can keep the current implementation (like arm and riscv), except for the Cause case.
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
47 | Besides setting the Flags bits in fcsr, do we need to set the Cause bits? That is bit[24]. |
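For orientation, the bit positions mentioned in this thread (0x10000 for the inexact flag, bit[24] for Cause, 0x1 << 4 for V and bit 3 for Z in Enables) fit the following layout. This is a sketch under the assumption that FCSR0 holds Enables in bits 4:0, Flags in bits 20:16 and Cause in bits 28:24, each ordered I, U, O, Z, V from the low bit; the sketch_* names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Exception index within each 5-bit group (assumed order, low bit first). */
enum {
  SKETCH_EXC_I = 0, /* inexact */
  SKETCH_EXC_U = 1, /* underflow */
  SKETCH_EXC_O = 2, /* overflow */
  SKETCH_EXC_Z = 3, /* division by zero */
  SKETCH_EXC_V = 4  /* invalid operation */
};

/* Assumed position of the lowest bit of each group in FCSR0. */
enum {
  SKETCH_ENABLES_SHIFT = 0,
  SKETCH_FLAGS_SHIFT = 16,
  SKETCH_CAUSE_SHIFT = 24
};

/* Single-bit mask for one exception in one group. */
uint32_t sketch_fcsr_bit(int group_shift, int exc) {
  assert(exc >= SKETCH_EXC_I && exc <= SKETCH_EXC_V);
  return (uint32_t)1 << (group_shift + exc);
}

/* FCSR value with the inexact Flags bit set and everything else untouched. */
uint32_t sketch_set_inexact_flag(uint32_t fcsr) {
  return fcsr | sketch_fcsr_bit(SKETCH_FLAGS_SHIFT, SKETCH_EXC_I);
}
```

Under this layout, setting Flags and setting Cause are two independent single-bit updates, so the question above comes down to whether the second mask should also be OR-ed in.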
On the option of simply setting the flag in the FP CSR: IMHO, just setting Flags, even if Cause is also set, will not raise an actual exception.
Take the Division by Zero (Z) operation as an example:
    $ cat ex.c
    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdint.h>
    #include <stdio.h>
    int main() {
      int fcsr;
    #if __clang__
      __asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
      __asm__ __volatile__("movgr2fcsr $fcsr0, %0" :: "r" (fcsr | 0x1 << 4));
    #else
      __asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
      __asm__ __volatile__("movgr2fcsr $r0, %0" :: "r" (fcsr | 0x1 << 4));
    #endif
      /* Division by Zero */
      __asm__ __volatile__(
        "movgr2fr.d $f0, $zero\n\t"
        "fdiv.d $f1, $f0, $f0\n\t");
      return 0;
    }
    $ clang ex.c -lm
    $ ./a.out
    Floating point exception (core dumped)
If we really need to raise the exception, we will need to set the corresponding Enables bits.
BTW, why does the Z operation here need to actually enable V (Invalid Operation), instead of enabling bit 3?
Do we need to use __clang__ to differentiate between $fcsrX and $rX in fp_mode.c to avoid build errors with gcc, similar to the one in ex.c above?
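If such a split were adopted, it could be confined to a single macro so the asm templates stay identical otherwise. This is only a sketch of the idea raised in the question: SKETCH_FCSR0 and the two string constants are hypothetical names, and the choice of "$r0" for non-clang compilers reflects the assembler behaviour reported in this thread, which may differ in other toolchain versions.

```c
#include <assert.h>
#include <string.h>

/* Register spelling accepted by each toolchain, per the examples above. */
#if defined(__clang__)
#define SKETCH_FCSR0 "$fcsr0"
#else
#define SKETCH_FCSR0 "$r0"
#endif

/* Adjacent string literals are concatenated by the compiler, so the same
   text could be passed directly as an __asm__ template. */
const char sketch_read_fcsr[] = "movfcsr2gr %0, " SKETCH_FCSR0;
const char sketch_write_fcsr[] = "movgr2fcsr " SKETCH_FCSR0 ", %0";
```

The strings are kept as plain data here so the snippet builds on any host; in fp_mode.c the concatenated literal would appear inside the __asm__ __volatile__ statement.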
No, "raising the exception" is not "raising SIGFPE". "2.0 / 3.0" raises the INEXACT exception, but does not raise SIGFPE unless INEXACT is enabled. If an enabled exception is raised, SIGFPE is raised.
A program can enable an exception by calling feenableexcept. Sometimes we use feenableexcept in programming contests to detect floating-point related bugs in our programs.
- Modify the content of the message.
- Remove the i constraint.
- Add subtf3_test.c and addtf3_test.c tests in LoongArch (RISCV can also add it).
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
15–16 | This is not really meaningful... OR-ing non-bitflags together does NOT typically make sense. I believe the only reason it currently works is because LOONGARCH_DOWNWARD is effectively acting as the mask. Let's drop this and just use $fcsr3 for the FCSR moves regarding the rounding mode. This way some code could be simplified as well. | |
compiler-rt/test/builtins/Unit/addtf3_test.c | ||
69–70 | Could this be defined(__loongarch_hard_float) instead? (On an unrelated note, I spent several seconds wondering why riscv isn't here. I haven't dug deeper though.) |
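The remark on lines 15–16 of fp_mode.c, that OR-ing the rounding-mode values only works by accident, can be illustrated with plain integer arithmetic. The encodings below are assumptions (a two-bit field at bits 9:8, with downward being 0x300), and all SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed rounding-mode encodings: field values, not independent bit flags. */
#define SKETCH_TONEAREST  0x000u
#define SKETCH_TOWARDZERO 0x100u
#define SKETCH_UPWARD     0x200u
#define SKETCH_DOWNWARD   0x300u

/* OR of all values: equals 0x300 only because DOWNWARD has both bits set. */
#define SKETCH_MASK_BY_OR \
  (SKETCH_TONEAREST | SKETCH_TOWARDZERO | SKETCH_UPWARD | SKETCH_DOWNWARD)

/* Explicit field mask: two bits starting at bit 8. */
#define SKETCH_MASK_BY_FIELD (0x3u << 8)

/* Extracts the rounding-mode field from a full FCSR value. */
uint32_t sketch_rounding_field(uint32_t fcsr) {
  return fcsr & SKETCH_MASK_BY_FIELD;
}
```

Both masks have the same value here, but only the second states the intent. TOWARDZERO | UPWARD is also numerically equal to DOWNWARD, which shows why treating these values as OR-able flags is misleading. Reading the rounding mode through $fcsr3, as suggested, would presumably make the mask unnecessary.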
compiler-rt/lib/builtins/loongarch/fp_mode.c | ||
---|---|---|
42 |
Agree with this point. | |
compiler-rt/test/builtins/Unit/addtf3_test.c | ||
69–70 |
In the __loongarch_soft_float case, __loongarch_frlen may still be nonzero: the code can read the hardware floating-point registers; it just does not pass call arguments in them.
riscv should also add it, but it was probably missed at the time. |