This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][builtins] Support builtins for LoongArch
ClosedPublic

Authored by tangyouling on Oct 20 2022, 5:06 AM.

Details

Summary

Initial builtins for LoongArch.
Add loongarch64 to ALL_CRT_SUPPORTED_ARCH list.
Support fe_getround and fe_raise_inexact in builtins.

Diff Detail

Event Timeline

tangyouling created this revision.Oct 20 2022, 5:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 5:06 AM
tangyouling requested review of this revision.Oct 20 2022, 5:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 5:06 AM
Herald added subscribers: Restricted Project, pcwang-thead, aheejin. · View Herald Transcript
xry111 added inline comments.Oct 20 2022, 5:28 AM
compiler-rt/cmake/Modules/CompilerRTUtils.cmake
189

It's better not to use GRLEN here. It will be possible to run loongarch32 code on a loongarch64 processor.

Just use "Unsupported pointer size", I think.

compiler-rt/lib/builtins/loongarch/fp_mode.c
33

I think we can simplify this by

33

... removing this default:, and removing #else, and move #endif before return CRT_FE_TONEAREST.

46

"ri" does not make sense here. From GCC documentation about extended inline asm (I guess Clang will behave the same way):

When you list more than one possible location (for example, "=rm"), the compiler chooses the most efficient one based on the current context. If you list as many alternates as the asm statement allows, you permit the optimizers to produce the best possible code.

But movgr2fcsr does not allow an immediate operand, so i should not be allowed.

tangyouling added inline comments.Oct 20 2022, 5:49 AM
compiler-rt/cmake/Modules/CompilerRTUtils.cmake
189

OK.

compiler-rt/lib/builtins/loongarch/fp_mode.c
33

OK

46

Thanks, I will modify it.

xry111 added inline comments.Oct 20 2022, 6:02 AM
compiler-rt/lib/builtins/loongarch/fp_mode.c
33

Well, I'm not sure if the compiler is clever enough not to emit a warning here. If it believes we need default: please ignore my comment.

xry111 added inline comments.Oct 20 2022, 6:10 AM
compiler-rt/lib/builtins/loongarch/fp_mode.c
41

I'm not sure about the expected semantics of __fe_raise_inexact. On i386, it's

__asm__ __volatile__ ("fdivs %1" : "+t" (f) : "m" (g));

And I got:

$ cat ex.c
#define _GNU_SOURCE
#include <fenv.h>
int main()
{
	float f = 1.0f, g = 3.0f;
	feenableexcept(FE_INEXACT);
	__asm__ __volatile__ ("fdivs %1" : "+t" (f) : "m" (g));
	return 0;
}
$ cc ex.c -lm
$ ./a.out
floating point exception (core dumped)

On AArch64, it's

uint64_t fpsr;
__asm__ __volatile__("mrs  %0, fpsr" : "=r" (fpsr));
__asm__ __volatile__("msr  fpsr, %0" : : "ri" (fpsr | AARCH64_INEXACT));

But I got the following result on an Apple M1:

$ cat ex.c
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	uint64_t fpsr;
	if (feenableexcept(FE_INEXACT) != 0)
		perror("the CPU does not support trapping FP operation");
	__asm__ __volatile__("mrs %0, fpsr" : "=r" (fpsr));
	__asm__ __volatile__("msr fpsr, %0" : : "r" (fpsr | 0x10));
}
$ cc ex.c -lm
$ ./a.out

Nothing happens.

So which model should we use here?

tangyouling added inline comments.Oct 20 2022, 7:25 PM
compiler-rt/lib/builtins/loongarch/fp_mode.c
33

Compiling with clang (default option) does not show warnings, I wonder if I need to add other options to test?

i386 without default:, arm/aarch64/riscv with default: in fp_mode.c

41

On LoongArch:

$ cat ex_gcc.c 
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	int fcsr;

	__asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $r0, %0" ::"ri" (fcsr | 0x10000));

	return 0;
}
$ cc ex.c -lm
$ ./a.out

Nothing happens.

$ cat ex_clang.c 
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	int fcsr;

	__asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $fcsr0, %0" ::"ri" (fcsr | 0x10000));

	return 0;
}
$ clang ex.c -lm
$ ./a.out

Nothing happens.
Same phenomenon as aarch64 (guessing riscv works the same way).

fdivs does the actual floating point division, while aarch64/riscv/loongarch only sets the relevant register flags, thus making this difference possible.

xry111 added inline comments.Oct 20 2022, 11:22 PM
compiler-rt/lib/builtins/loongarch/fp_mode.c
41

I can understand the logic. But it we just interpret the name __fe_raise_inexact, it should behave like feraiseexcept(FE_INEXACT). On LoongArch:

#define _GNU_SOURCE
#include <fenv.h>

int main()
{
	feenableexcept(FE_INEXACT);
	feraiseexcept(FE_INEXACT);
}

gives a SIGFPE.

Maybe the name is not very precise then... But I can't find any documentation about the expected behavior of __fe_raise_inexact.

xry111 added a subscriber: kongyi.

@kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INVALID) (really raising the exception), or simply set the INVALID flag in the FP CSR?

SixWeining added inline comments.Oct 21 2022, 1:21 AM
compiler-rt/lib/builtins/loongarch/fp_mode.c
41

gcc:

asm volatile("movfcsr2gr %0, $r0" : "=r" (fcsr));
asm volatile("movgr2fcsr $r0, %0" ::"ri" (fcsr | 0x10000));

clang:

asm volatile("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
asm volatile("movgr2fcsr $fcsr0, %0" ::"ri" (fcsr | 0x10000));

Is this an issue that user has to write different asm for different compilers?

xen0n added inline comments.Oct 21 2022, 1:26 AM
compiler-rt/lib/builtins/loongarch/fp_mode.c
41

Regarding the $fcsrX vs $rX case, obviously it is binutils that needs to be fixed to accept $fcsrX in addition to the status quo. It's blatant violation of the manual syntax they wrote themselves, gravely misleading users. We may accept GPR in place of FCSR for compatibility (like the __loongarch64 case discussed in D136413), but it's definitely not the kind of thing that should get preserved forever.

@kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INVALID) (really raising the exception), or simply set the INVALID flag in the FP CSR?

This interface was introduced in D57143, but I can't see any comments describing its intention. I think we can keep current implementation (like arm and riscv) except the Cause case.

compiler-rt/lib/builtins/loongarch/fp_mode.c
46

Beside setting the Flags bits in fcsr, do we need to set the Cause bits? That is the bit[24].

@kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INVALID) (really raising the exception), or simply set the INVALID flag in the FP CSR?

This interface was introduced in D57143, but I can't see any comments describing its intention. I think we can keep current implementation (like arm and riscv) except the Cause case.

If simply set the INVALID flag in the FP CSR. IMHO, just setting Flags, even if Cause is set, will not raise an actual exception.

Take the Division by Zero (Z) operation as an example:

$ cat ex.c 
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	int fcsr;

#if __clang__
	__asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $fcsr0, %0" :: "r" (fcsr | 0x1 << 4));
#else
	__asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $r0, %0" :: "r" (fcsr | 0x1 << 4));
#endif

	/* Division by Zero */
	__asm__ __volatile__(
			     "movgr2fr.d $f0, $zero\n\t"
			     "fdiv.d $f1, $f0, $f0\n\t");

	return 0;
}

$ clang ex.c -lm
$ ./a.out 
Floating point exception (core dumped)

If we really need to raise the exception, we will need to enable the Enables bits.
BTW, why does the Z operation here need to actually enable V(Invaild Operation), instead of enabling bit 3?

Do we need to use __clang__ to differentiate between $fcsrX and $rX in fp_mode.c to avoid build errors with gcc, similar to the one in ex.c above?

@kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INVALID) (really raising the exception), or simply set the INVALID flag in the FP CSR?

This interface was introduced in D57143, but I can't see any comments describing its intention. I think we can keep current implementation (like arm and riscv) except the Cause case.

If simply set the INVALID flag in the FP CSR. IMHO, just setting Flags, even if Cause is set, will not raise an actual exception.

Take the Division by Zero (Z) operation as an example:

$ cat ex.c 
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	int fcsr;

#if __clang__
	__asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $fcsr0, %0" :: "r" (fcsr | 0x1 << 4));
#else
	__asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $r0, %0" :: "r" (fcsr | 0x1 << 4));
#endif

	/* Division by Zero */
	__asm__ __volatile__(
			     "movgr2fr.d $f0, $zero\n\t"
			     "fdiv.d $f1, $f0, $f0\n\t");

	return 0;
}

$ clang ex.c -lm
$ ./a.out 
Floating point exception (core dumped)

If we really need to raise the exception, we will need to enable the Enables bits.

No, "raising the exception" is not "raising SIGFPE". "2.0 / 3.0" raises the INEXACT exception, but does not raise SIGFPE unless INEXACT is enabled. If an enabled exception is raised, SIGFPE is raised.

A program can enable an exception by calling feenableexcept. Sometimes we use feenableexcept in programming contests to detect floating-point related bugs in our programs.

SixWeining accepted this revision.Oct 27 2022, 10:30 PM

The current impl LGTM since __fe_raise_inexact is not well documented.

compiler-rt/lib/builtins/loongarch/fp_mode.c
46

As @xry111 said, i should be removed.

This revision is now accepted and ready to land.Oct 27 2022, 10:30 PM
tangyouling updated this revision to Diff 471406.EditedOct 27 2022, 11:36 PM
  • Modify the content of message.
  • Remove the i.
  • Add subtf3_test.c and addtf3_test.c tests in LoongArch (RISCV can also add it).
tangyouling marked an inline comment as done and an inline comment as not done.

Remove the extra space.

MaskRay accepted this revision.Oct 28 2022, 11:54 PM

Happy when @xen0n or @xry111 is happy.

compiler-rt/lib/builtins/loongarch/fp_mode.c
24

asm volatile

26
45

asm volatile

xen0n added inline comments.Oct 29 2022, 1:52 AM
compiler-rt/lib/builtins/loongarch/fp_mode.c
16–17

This is not really meaningful... OR-ing non-bitflags together does NOT typically make sense. I believe the only reason it currently works is because LOONGARCH_DOWNWARD is effectively acting as the mask.

Let's drop this and just use $fcsr3 for the FCSR moves regarding the rounding mode. This way some code could be simplified as well.

compiler-rt/test/builtins/Unit/addtf3_test.c
69–70 ↗(On Diff #471414)

Could be defined(__loongarch_hard_float) instead?

(On an unrelated note, I'm wondering why riscv isn't here for several seconds. I haven't dug deeper though.)

tangyouling added inline comments.Oct 29 2022, 2:26 AM
compiler-rt/lib/builtins/loongarch/fp_mode.c
41

Regarding the $fcsrX vs $rX case, obviously it is binutils that needs to be fixed to accept $fcsrX in addition to the status quo. It's blatant violation of the manual syntax they wrote themselves, gravely misleading users. We may accept GPR in place of FCSR for compatibility (like the __loongarch64 case discussed in D136413), but it's definitely not the kind of thing that should get preserved forever.

Agree with this point.

compiler-rt/test/builtins/Unit/addtf3_test.c
69–70 ↗(On Diff #471414)

Could be defined(__loongarch_hard_float) instead?

In the case of __loongarch_soft_float, __loongarch_frlen may not be 0, but it can still read the hardware floating point registers, just don't pass the call parameters through the floating point registers.

(On an unrelated note, I'm wondering why riscv isn't here for several seconds. I haven't dug deeper though.)

riscv should also add it, but it was probably missed at the time.

This revision was landed with ongoing or failed builds.Nov 1 2022, 5:10 AM
This revision was automatically updated to reflect the committed changes.