This is an archive of the discontinued LLVM Phabricator instance.


Differential D136338

[compiler-rt][builtins] Support builtins for LoongArch
Closed · Public

Authored by tangyouling on Oct 20 2022, 5:06 AM.


Details

Reviewers

SixWeining
wangleiat
xen0n
xry111
MaskRay
kongyi

Commits

rG6e6704b0dc2c: [compiler-rt][builtins] Support builtins for LoongArch

Summary

Initial builtins for LoongArch.
Add loongarch64 to ALL_CRT_SUPPORTED_ARCH list.
Support fe_getround and fe_raise_inexact in builtins.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tangyouling created this revision. Oct 20 2022, 5:06 AM

Herald added a project: Restricted Project. Oct 20 2022, 5:06 AM

Herald added subscribers: Enna1, StephenFan, s.egerton and 2 others.

tangyouling requested review of this revision. Oct 20 2022, 5:06 AM

Herald added a project: Restricted Project. Oct 20 2022, 5:06 AM

Herald added subscribers: Restricted Project, pcwang-thead, aheejin.

Harbormaster completed remote builds in B193192: Diff 469169. Oct 20 2022, 5:25 AM

xry111 added inline comments. Oct 20 2022, 5:28 AM

compiler-rt/cmake/Modules/CompilerRTUtils.cmake
189	It's better not to use `GRLEN` here. It will be possible to run loongarch32 code on a loongarch64 processor. Just use "Unsupported pointer size", I think.
compiler-rt/lib/builtins/loongarch/fp_mode.c
34	I think we can simplify this by
34	... removing this `default:`, and removing `#else`, and move `#endif` before `return CRT_FE_TONEAREST`.
47	`"ri"` does not make sense here. From the GCC documentation on extended inline asm (I guess Clang behaves the same way): 'When you list more than one possible location (for example, "=rm"), the compiler chooses the most efficient one based on the current context. If you list as many alternates as the asm statement allows, you permit the optimizers to produce the best possible code.' But `movgr2fcsr` does not accept an immediate operand, so `i` should not be allowed.

tangyouling added inline comments. Oct 20 2022, 5:49 AM

compiler-rt/cmake/Modules/CompilerRTUtils.cmake
189	OK.
compiler-rt/lib/builtins/loongarch/fp_mode.c
34	OK
47	Thanks, I will modify it.

xry111 added inline comments. Oct 20 2022, 6:02 AM

compiler-rt/lib/builtins/loongarch/fp_mode.c
34	Well, I'm not sure if the compiler is clever enough not to emit a warning here. If it believes we need `default:`, please ignore my comment.

xry111 added inline comments. Oct 20 2022, 6:10 AM

compiler-rt/lib/builtins/loongarch/fp_mode.c

I'm not sure about the expected semantics of __fe_raise_inexact. On i386, it's

__asm__ __volatile__ ("fdivs %1" : "+t" (f) : "m" (g));

And I got:

$ cat ex.c
#define _GNU_SOURCE
#include <fenv.h>
int main()
{
	float f = 1.0f, g = 3.0f;
	feenableexcept(FE_INEXACT);
	__asm__ __volatile__ ("fdivs %1" : "+t" (f) : "m" (g));
	return 0;
}
$ cc ex.c -lm
$ ./a.out
floating point exception (core dumped)

On AArch64, it's

uint64_t fpsr;
__asm__ __volatile__("mrs  %0, fpsr" : "=r" (fpsr));
__asm__ __volatile__("msr  fpsr, %0" : : "ri" (fpsr | AARCH64_INEXACT));

But I got the following result on an Apple M1:

$ cat ex.c
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	uint64_t fpsr;
	if (feenableexcept(FE_INEXACT) != 0)
		perror("the CPU does not support trapping FP operation");
	__asm__ __volatile__("mrs %0, fpsr" : "=r" (fpsr));
	__asm__ __volatile__("msr fpsr, %0" : : "r" (fpsr | 0x10));
}
$ cc ex.c -lm
$ ./a.out

Nothing happens.

So which model should we use here?

tangyouling added inline comments. Oct 20 2022, 7:25 PM

compiler-rt/lib/builtins/loongarch/fp_mode.c

Compiling with clang (default options) does not show any warnings. Should I add other options to test this?

For reference, i386's fp_mode.c has no `default:`, while the arm/aarch64/riscv versions do.
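A minimal sketch of the restructuring xry111 suggests (no `default:`, with `#endif` moved above the final `return`). The `sk_*` names, the plain `unsigned` FCSR argument and the `have_fpu` parameter are hypothetical: `have_fpu` stands in for the `#if __loongarch_frlen != 0` guard so the sketch runs on any host, and the enum stands in for the `CRT_FE_*` values from `../fp_mode.h`. Because control falls through to the final `return`, no `default:` is needed for correctness; as far as we know, a missing `default:` is only diagnosed by opt-in flags such as GCC's `-Wswitch-default`, which `-Wall` does not enable.

```c
/* Hypothetical stand-ins for the CRT_FE_* values declared in ../fp_mode.h. */
typedef enum {
  SK_FE_TONEAREST,
  SK_FE_DOWNWARD,
  SK_FE_UPWARD,
  SK_FE_TOWARDZERO
} sk_round_mode;

/* RM encodings from the diff (FCSR0 bits [9:8]). */
#define SK_TOWARDZERO 0x0100u
#define SK_UPWARD 0x0200u
#define SK_DOWNWARD 0x0300u
#define SK_RMODE_MASK 0x0300u

/* Hypothetical pure helper: decodes an FCSR0 value supplied by the caller.
   In the real builtin, fcsr would come from movfcsr2gr. */
static sk_round_mode sk_getround(unsigned fcsr, int have_fpu) {
  if (have_fpu) {
    switch (fcsr & SK_RMODE_MASK) {
    case SK_TOWARDZERO:
      return SK_FE_TOWARDZERO;
    case SK_DOWNWARD:
      return SK_FE_DOWNWARD;
    case SK_UPWARD:
      return SK_FE_UPWARD;
    }
  }
  /* Reached for round-to-nearest (RM == 0) and when there is no FPU,
     mirroring "move #endif before return CRT_FE_TONEAREST". */
  return SK_FE_TONEAREST;
}
```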

On LoongArch:

$ cat ex_gcc.c 
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	int fcsr;

	__asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $r0, %0" ::"ri" (fcsr | 0x10000));

	return 0;
}
$ cc ex_gcc.c -lm
$ ./a.out

Nothing happens.

$ cat ex_clang.c 
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	int fcsr;

	__asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $fcsr0, %0" ::"ri" (fcsr | 0x10000));

	return 0;
}
$ clang ex_clang.c -lm
$ ./a.out

Nothing happens.
This is the same behavior as on AArch64 (I'd guess RISC-V behaves the same way).

The difference is that `fdivs` performs an actual floating-point division, whereas the aarch64/riscv/loongarch implementations only set the relevant flag bits in the FP status register.
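For contrast, a sketch of the i386-style approach in portable C: perform a real floating-point operation whose result is not exactly representable, so the FPU itself signals INEXACT (and would trap if INEXACT were enabled). The function name is hypothetical and this is not what the LoongArch patch does; the `volatile` qualifiers only keep the compiler from folding the division at compile time.

```c
/* Hypothetical helper: raise INEXACT the i386 way, via a real division.
   1.0f / 3.0f is not exactly representable in binary floating point, so
   the division sets the INEXACT flag; if INEXACT traps were enabled (for
   example with glibc's feenableexcept), this is where SIGFPE would occur. */
float sk_raise_inexact_by_division(void) {
  volatile float f = 1.0f;
  volatile float g = 3.0f; /* volatile: prevent compile-time folding */
  f = f / g;
  return f;
}
```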

xry111 added inline comments. Oct 20 2022, 11:22 PM

compiler-rt/lib/builtins/loongarch/fp_mode.c
42	I can understand the logic. But if we just interpret the name `__fe_raise_inexact`, it should behave like `feraiseexcept(FE_INEXACT)`. On LoongArch:

#define _GNU_SOURCE
#include <fenv.h>

int main()
{
	feenableexcept(FE_INEXACT);
	feraiseexcept(FE_INEXACT);
}

gives a SIGFPE. Maybe the name is not very precise then... But I can't find any documentation about the expected behavior of `__fe_raise_inexact`.

@kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INEXACT) (really raising the exception), or simply set the INEXACT flag in the FP CSR?

SixWeining added inline comments. Oct 21 2022, 1:21 AM

compiler-rt/lib/builtins/loongarch/fp_mode.c
42	gcc:

__asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
__asm__ __volatile__("movgr2fcsr $r0, %0" ::"ri" (fcsr | 0x10000));

clang:

__asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
__asm__ __volatile__("movgr2fcsr $fcsr0, %0" ::"ri" (fcsr | 0x10000));

Does this mean users have to write different asm for different compilers?

xen0n added inline comments. Oct 21 2022, 1:26 AM

compiler-rt/lib/builtins/loongarch/fp_mode.c
42	Regarding the `$fcsrX` vs `$rX` case, obviously it is binutils that needs to be fixed to accept `$fcsrX` in addition to the status quo. It's a blatant violation of the manual syntax they wrote themselves, gravely misleading users. We may accept GPR in place of FCSR for compatibility (like the `__loongarch64` case discussed in D136413), but it's definitely not the kind of thing that should get preserved forever.

SixWeining added a comment.

In D136338#3873455, @xry111 wrote:
> @kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INEXACT) (really raising the exception), or simply set the INEXACT flag in the FP CSR?

This interface was introduced in D57143, but I can't see any comments describing its intention. I think we can keep the current implementation (like arm and riscv), except for the Cause case.

compiler-rt/lib/builtins/loongarch/fp_mode.c
47	Besides setting the `Flags` bits in fcsr, do we need to set the `Cause` bits, i.e. `bit[24]`?
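To make the bit positions in this thread concrete, a sketch of the FCSR0 layout as we understand the LoongArch reference manual (worth double-checking there): Enables in bits [4:0], RM in [9:8], Flags in [20:16] and Cause in [28:24], with each exception field ordered I, U, O, Z, V from its low bit. The `SK_*`/`sk_*` names are hypothetical. Under this layout, `LOONGARCH_INEXACT` (0x10000) is Flags.I, the `bit[24]` asked about is Cause.I, and the `0x1 << 4` in tangyouling's example is Enables.V.

```c
#include <stdint.h>

/* Assumed FCSR0 field offsets (Enables [4:0], Flags [20:16], Cause [28:24]). */
#define SK_ENABLES_SHIFT 0
#define SK_FLAGS_SHIFT 16
#define SK_CAUSE_SHIFT 24

/* Per-exception bit within each field; same order in Enables, Flags, Cause. */
enum sk_exc {
  SK_EXC_I = 1 << 0, /* inexact */
  SK_EXC_U = 1 << 1, /* underflow */
  SK_EXC_O = 1 << 2, /* overflow */
  SK_EXC_Z = 1 << 3, /* division by zero */
  SK_EXC_V = 1 << 4  /* invalid operation */
};

/* Position of exception e in each FCSR0 field. */
static inline uint32_t sk_enable_bit(enum sk_exc e) {
  return (uint32_t)e << SK_ENABLES_SHIFT;
}

static inline uint32_t sk_flag_bit(enum sk_exc e) {
  return (uint32_t)e << SK_FLAGS_SHIFT;
}

static inline uint32_t sk_cause_bit(enum sk_exc e) {
  return (uint32_t)e << SK_CAUSE_SHIFT;
}
```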

tangyouling added a comment.

In D136338#3878346, @SixWeining wrote:
> In D136338#3873455, @xry111 wrote:
>> @kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INEXACT) (really raising the exception), or simply set the INEXACT flag in the FP CSR?
>
> This interface was introduced in D57143, but I can't see any comments describing its intention. I think we can keep the current implementation (like arm and riscv), except for the Cause case.

As for simply setting the INEXACT flag in the FP CSR: IMHO, setting Flags alone, even with Cause also set, will not raise an actual exception.

Take the Division by Zero (Z) operation as an example:

$ cat ex.c 
#define _GNU_SOURCE
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
	int fcsr;

#if __clang__
	__asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $fcsr0, %0" :: "r" (fcsr | 0x1 << 4));
#else
	__asm__ __volatile__("movfcsr2gr %0, $r0" : "=r" (fcsr));
	__asm__ __volatile__("movgr2fcsr $r0, %0" :: "r" (fcsr | 0x1 << 4));
#endif

	/* Division by Zero */
	__asm__ __volatile__(
			     "movgr2fr.d $f0, $zero\n\t"
			     "fdiv.d $f1, $f0, $f0\n\t");

	return 0;
}

$ clang ex.c -lm
$ ./a.out 
Floating point exception (core dumped)

If we really need to raise the exception, we will need to enable the Enables bits.
BTW, why does the Z operation here require enabling V (Invalid Operation) rather than bit 3?
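On the closing question, assuming standard IEEE 754 semantics: 0.0 / 0.0 signals invalid operation (the result is a NaN), while division by zero is signaled only when a finite non-zero dividend is divided by zero. Since `fdiv.d $f1, $f0, $f0` divides zero by zero, it is Enables.V (bit 4), not Enables.Z (bit 3), that makes it trap. A small classification sketch, limited to finite operands and to the two zero-divisor cases; `sk_div_enable_bit` and the helper names are hypothetical, and the bit positions assume the FCSR0 Enables layout implied by the `0x1 << 4` example above.

```c
#include <float.h>

/* Assumed FCSR0 Enables positions (consistent with 0x1 << 4 enabling V). */
#define SK_ENABLE_Z (1u << 3) /* division by zero */
#define SK_ENABLE_V (1u << 4) /* invalid operation */

/* For finite x and y == 0, returns the Enables bit of the IEEE 754
   exception that x / y signals; returns 0 when y != 0 (overflow,
   underflow and inexact are outside the scope of this sketch). */
static unsigned sk_div_enable_bit(double x, double y) {
  if (y != 0.0)
    return 0;
  /* 0 / 0 is an invalid operation; non-zero / 0 is division by zero. */
  return (x == 0.0) ? SK_ENABLE_V : SK_ENABLE_Z;
}

/* Result checks that avoid libm: NaN is the only value unequal to itself,
   and only an infinity lies beyond DBL_MAX in magnitude. */
static int sk_is_nan(double v) { return v != v; }
static int sk_is_inf(double v) { return v > DBL_MAX || v < -DBL_MAX; }
```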

Do we need to use __clang__ to differentiate between $fcsrX and $rX in fp_mode.c to avoid build errors with gcc, similar to the one in ex.c above?
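One possible shape for the `__clang__` split, combined with the `"r"`-only constraint agreed on earlier; this is a sketch, not the committed code, and `sk_fe_raise_inexact` / `SK_LOONGARCH_INEXACT` are hypothetical names. It assumes, as reported in this thread, that Clang accepts `$fcsr0` while the GNU assembler of the time only accepted `$r0` for FCSR operands. On other targets, or without an FPU, the asm is compiled out and the function simply returns 0.

```c
/* Flags.I in FCSR0, the same value as LOONGARCH_INEXACT in the diff. */
#define SK_LOONGARCH_INEXACT 0x10000

/* Sketch: set the sticky INEXACT flag with a read-modify-write of FCSR0.
   No floating-point operation is performed. The "r" constraint is used
   because movgr2fcsr takes a register operand, not an immediate. */
int sk_fe_raise_inexact(void) {
#if defined(__loongarch__) && __loongarch_frlen != 0
  int fcsr;
#  if defined(__clang__)
  __asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r"(fcsr));
  __asm__ __volatile__("movgr2fcsr $fcsr0, %0" : : "r"(fcsr | SK_LOONGARCH_INEXACT));
#  else
  /* Assumption from this thread: GNU as at the time only accepted $rN
     as the FCSR operand spelling. */
  __asm__ __volatile__("movfcsr2gr %0, $r0" : "=r"(fcsr));
  __asm__ __volatile__("movgr2fcsr $r0, %0" : : "r"(fcsr | SK_LOONGARCH_INEXACT));
#  endif
#endif
  return 0;
}
```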

In D136338#3884362, @tangyouling wrote:
> In D136338#3878346, @SixWeining wrote:
>> In D136338#3873455, @xry111 wrote:
>>> @kongyi: Should __fe_raise_inexact behave exactly like feraiseexcept(FE_INEXACT) (really raising the exception), or simply set the INEXACT flag in the FP CSR?
>>
>> This interface was introduced in D57143, but I can't see any comments describing its intention. I think we can keep the current implementation (like arm and riscv), except for the Cause case.
>
> As for simply setting the INEXACT flag in the FP CSR: IMHO, setting Flags alone, even with Cause also set, will not raise an actual exception.

> If we really need to raise the exception, we will need to enable the Enables bits.

No, "raising the exception" is not "raising SIGFPE". "2.0 / 3.0" raises the INEXACT exception, but does not raise SIGFPE unless INEXACT is enabled. If an enabled exception is raised, SIGFPE is raised.

A program can enable an exception by calling feenableexcept. Sometimes we use feenableexcept in programming contests to detect floating-point related bugs in our programs.
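xry111's distinction can be captured by a toy model (not a description of any particular hardware): raising an exception always sets its sticky flag, and only an enabled exception traps. The `sk_*` names are hypothetical, the bit positions follow the Enables/Flags positions used in this thread, and `trapped` merely stands in for SIGFPE delivery. `sk_set_flag_only` models a plain flag write such as the patch's `__fe_raise_inexact`, which, per the discussion above, sets the flag without trapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy FCSR0-like state: Enables in bits [4:0], Flags in bits [20:16]. */
typedef struct {
  uint32_t fcsr;
  bool trapped; /* stands in for SIGFPE delivery */
} sk_fpu;

#define SK_FLAGS_SHIFT 16

/* fenv-style raise: set the sticky flag, then trap only if enabled.
   exc is a single bit in [4:0] (e.g. 1 for INEXACT). */
static void sk_raise(sk_fpu *fpu, uint32_t exc) {
  fpu->fcsr |= exc << SK_FLAGS_SHIFT;
  if (fpu->fcsr & exc)
    fpu->trapped = true;
}

/* Flag-only update: set the sticky flag and never trap. */
static void sk_set_flag_only(sk_fpu *fpu, uint32_t exc) {
  fpu->fcsr |= exc << SK_FLAGS_SHIFT;
}
```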

The current impl LGTM since __fe_raise_inexact is not well documented.

compiler-rt/lib/builtins/loongarch/fp_mode.c
47	As @xry111 said, `i` should be removed.

This revision is now accepted and ready to land. Oct 27 2022, 10:30 PM

Update the commit message.
Remove the `i` constraint.
Add LoongArch to the rounding-mode tests in subtf3_test.c and addtf3_test.c (RISC-V could be added as well).

Remove the extra space.

Harbormaster completed remote builds in B194838: Diff 471414. Oct 28 2022, 12:19 AM

tangyouling added a child revision: D136921: [builtins][LoongArch] Port __clear_cache to LoongArch Linux. Oct 28 2022, 12:32 AM

tangyouling added a child revision: D137002: [crt][LoongArch] Support LoongArch when building without init_array. Oct 28 2022, 7:41 PM

Happy when @xen0n or @xry111 is happy.

compiler-rt/lib/builtins/loongarch/fp_mode.c
24	`asm volatile`
26	Suggested edit: replace `switch (fcsr) {` with `switch (fcsr & LOONGARCH_RMODE_MASK) {`.
45	`asm volatile`

xen0n added inline comments. Oct 29 2022, 1:52 AM

compiler-rt/lib/builtins/loongarch/fp_mode.c
16–17	This is not really meaningful... OR-ing non-bitflags together does NOT typically make sense. I believe the only reason it currently works is because `LOONGARCH_DOWNWARD` is effectively acting as the mask. Let's drop this and just use `$fcsr3` for the FCSR moves regarding the rounding mode. This way some code could be simplified as well.
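xen0n's point about the mask, sketched: the RM field is two bits wide at bits [9:8], so the mask can be derived from the field position and width instead of by OR-ing the four encodings. The `_Static_assert` documents that the OR only happens to equal that mask because `LOONGARCH_DOWNWARD` sets both field bits. The `SK_*` names and `sk_rm_field` are hypothetical, and xen0n's actual suggestion (reading `$fcsr3`) is not modeled here.

```c
#include <stdint.h>

/* RM encodings copied from the diff. */
#define LOONGARCH_TONEAREST 0x0000
#define LOONGARCH_TOWARDZERO 0x0100
#define LOONGARCH_UPWARD 0x0200
#define LOONGARCH_DOWNWARD 0x0300

/* Mask derived from the field position and width (2 bits at bit 8). */
#define SK_RM_SHIFT 8
#define SK_RM_MASK (UINT32_C(0x3) << SK_RM_SHIFT)

/* The diff's OR-based mask coincides with SK_RM_MASK only because the
   DOWNWARD encoding has both field bits set. */
_Static_assert((LOONGARCH_TONEAREST | LOONGARCH_TOWARDZERO | LOONGARCH_UPWARD |
                LOONGARCH_DOWNWARD) == SK_RM_MASK,
               "OR of encodings happens to equal the RM field mask");

/* Extracts the 2-bit RM value (0 nearest, 1 toward zero, 2 up, 3 down). */
static inline uint32_t sk_rm_field(uint32_t fcsr) {
  return (fcsr & SK_RM_MASK) >> SK_RM_SHIFT;
}
```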
compiler-rt/test/builtins/Unit/addtf3_test.c
69–71	Could be `defined(__loongarch_hard_float)` instead? (On an unrelated note, I'm wondering why riscv isn't here for several seconds. I haven't dug deeper though.)

tangyouling added inline comments. Oct 29 2022, 2:26 AM

compiler-rt/lib/builtins/loongarch/fp_mode.c
42	> Regarding the `$fcsrX` vs `$rX` case, obviously it is binutils that needs to be fixed to accept `$fcsrX` in addition to the status quo. It's a blatant violation of the manual syntax they wrote themselves, gravely misleading users. We may accept GPR in place of FCSR for compatibility (like the `__loongarch64` case discussed in D136413), but it's definitely not the kind of thing that should get preserved forever.

Agree with this point.

compiler-rt/test/builtins/Unit/addtf3_test.c
69–71	> Could be `defined(__loongarch_hard_float)` instead?

In the `__loongarch_soft_float` case, `__loongarch_frlen` may still be non-zero: the code can read the hardware floating-point registers; it just doesn't pass call arguments in floating-point registers.

> (On an unrelated note, I'm wondering why riscv isn't here for several seconds. I haven't dug deeper though.)

RISC-V should also be added; it was probably just missed at the time.

This revision was landed with ongoing or failed builds. Nov 1 2022, 5:10 AM

Closed by commit rG6e6704b0dc2c: [compiler-rt][builtins] Support builtins for LoongArch (authored by tangyouling, committed by SixWeining).

This revision was automatically updated to reflect the committed changes.

SixWeining added a commit: rG6e6704b0dc2c: [compiler-rt][builtins] Support builtins for LoongArch.

Revision Contents

Path                                               Size
compiler-rt/cmake/Modules/CompilerRTUtils.cmake    9 lines
compiler-rt/cmake/base-config-ix.cmake             2 lines
compiler-rt/cmake/builtin-config-ix.cmake          3 lines
compiler-rt/cmake/crt-config-ix.cmake              3 lines
compiler-rt/lib/builtins/CMakeLists.txt            8 lines
compiler-rt/lib/builtins/loongarch/fp_mode.c       49 lines
compiler-rt/test/builtins/Unit/addtf3_test.c       3 lines
compiler-rt/test/builtins/Unit/subtf3_test.c       3 lines

Diff 472260

compiler-rt/cmake/Modules/CompilerRTUtils.cmake

@@ (145 lines not shown) @@
 endmacro()

 macro(detect_target_arch)
 check_symbol_exists(__arm__ "" __ARM)
 check_symbol_exists(__AVR__ "" __AVR)
 check_symbol_exists(__aarch64__ "" __AARCH64)
 check_symbol_exists(__x86_64__ "" __X86_64)
 check_symbol_exists(__i386__ "" __I386)
+check_symbol_exists(__loongarch__ "" __LOONGARCH)
 check_symbol_exists(__mips__ "" __MIPS)
 check_symbol_exists(__mips64__ "" __MIPS64)
 check_symbol_exists(__powerpc__ "" __PPC)
 check_symbol_exists(__powerpc64__ "" __PPC64)
 check_symbol_exists(__powerpc64le__ "" __PPC64LE)
 check_symbol_exists(__riscv "" __RISCV)
 check_symbol_exists(__s390x__ "" __S390X)
 check_symbol_exists(__sparc "" __SPARC)
@@ (12 lines not shown) @@ if(CMAKE_SIZEOF_VOID_P EQUAL "4")
 add_default_target_arch(x32)
 elseif(CMAKE_SIZEOF_VOID_P EQUAL "8")
 add_default_target_arch(x86_64)
 else()
 message(FATAL_ERROR "Unsupported pointer size for X86_64")
 endif()
 elseif(__I386)
 add_default_target_arch(i386)
+elseif(__LOONGARCH)
+if(CMAKE_SIZEOF_VOID_P EQUAL "4")
+add_default_target_arch(loongarch32)
+elseif(CMAKE_SIZEOF_VOID_P EQUAL "8")
+add_default_target_arch(loongarch64)
+else()
+message(FATAL_ERROR "Unsupported pointer size for LoongArch")
+endif()
 elseif(__MIPS64) # must be checked before __MIPS
 add_default_target_arch(mips64)
 elseif(__MIPS)
 add_default_target_arch(mips)
 elseif(__PPC64) # must be checked before __PPC
 add_default_target_arch(powerpc64)
 elseif(__PPC64LE)
 add_default_target_arch(powerpc64le)
@@ (453 lines not shown) @@

compiler-rt/cmake/base-config-ix.cmake

@@ (199 lines not shown) @@ elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "i[2-6]86|x86|amd64")
 test_target_arch(i386 __i386__ "-m32")
 else()
 if (CMAKE_SIZEOF_VOID_P EQUAL 4)
 test_target_arch(i386 "" "")
 else()
 test_target_arch(x86_64 "" "")
 endif()
 endif()
+elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "loongarch64")
+test_target_arch(loongarch64 "" "")
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc64le|ppc64le")
 test_target_arch(powerpc64le "" "-m64")
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "powerpc")
 test_target_arch(powerpc "" "-m32")
 test_target_arch(powerpc64 "" "-m64")
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "s390x")
 test_target_arch(s390x "" "")
 elseif("${COMPILER_RT_DEFAULT_TARGET_ARCH}" MATCHES "sparc")
@@ (45 lines not shown) @@

compiler-rt/cmake/builtin-config-ix.cmake

@@ (44 lines not shown) @@
 ")

 set(ARM64 aarch64)
 set(ARM32 arm armhf armv6m armv7m armv7em armv7 armv7s armv7k armv8m.main armv8.1m.main)
 set(AVR avr)
 set(HEXAGON hexagon)
 set(X86 i386)
 set(X86_64 x86_64)
+set(LOONGARCH64 loongarch64)
 set(MIPS32 mips mipsel)
 set(MIPS64 mips64 mips64el)
 set(PPC32 powerpc powerpcspe)
 set(PPC64 powerpc64 powerpc64le)
 set(RISCV32 riscv32)
 set(RISCV64 riscv64)
 set(SPARC sparc)
 set(SPARCV9 sparcv9)
 set(WASM32 wasm32)
 set(WASM64 wasm64)
 set(VE ve)

 if(APPLE)
 set(ARM64 arm64 arm64e)
 set(ARM32 armv7 armv7k armv7s)
 set(X86_64 x86_64 x86_64h)
 endif()

 set(ALL_BUILTIN_SUPPORTED_ARCH
 ${X86} ${X86_64} ${ARM32} ${ARM64} ${AVR}
 ${HEXAGON} ${MIPS32} ${MIPS64} ${PPC32} ${PPC64}
 ${RISCV32} ${RISCV64} ${SPARC} ${SPARCV9}
-${WASM32} ${WASM64} ${VE})
+${WASM32} ${WASM64} ${VE} ${LOONGARCH64})

 include(CompilerRTUtils)
 include(CompilerRTDarwinUtils)

 if(APPLE)

 find_darwin_sdk_dir(DARWIN_osx_SYSROOT macosx)
 find_darwin_sdk_dir(DARWIN_iossim_SYSROOT iphonesimulator)
@@ (149 lines not shown) @@

compiler-rt/cmake/crt-config-ix.cmake

@@ (17 lines not shown) @@ else()
 set(OS_NAME "${CMAKE_SYSTEM_NAME}")
 endif()

 set(ARM64 aarch64)
 set(ARM32 arm armhf)
 set(HEXAGON hexagon)
 set(X86 i386)
 set(X86_64 x86_64)
+set(LOONGARCH64 loongarch64)
 set(PPC32 powerpc powerpcspe)
 set(PPC64 powerpc64 powerpc64le)
 set(RISCV32 riscv32)
 set(RISCV64 riscv64)
 set(VE ve)

 set(ALL_CRT_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${PPC32}
-${PPC64} ${RISCV32} ${RISCV64} ${VE} ${HEXAGON})
+${PPC64} ${RISCV32} ${RISCV64} ${VE} ${HEXAGON} ${LOONGARCH64})

 include(CompilerRTUtils)

 if(NOT APPLE)
 if(COMPILER_RT_CRT_STANDALONE_BUILD)
 test_targets()
 endif()
 # Architectures supported by compiler-rt crt library.
@@ (9 lines not shown) @@

compiler-rt/lib/builtins/CMakeLists.txt

@@ (614 lines not shown) @@ set(hexagon_SOURCES
 hexagon/udivmodsi4.S
 hexagon/udivsi3.S
 hexagon/umoddi3.S
 hexagon/umodsi3.S
 ${GENERIC_SOURCES}
 ${GENERIC_TF_SOURCES}
 )

+set(loongarch_SOURCES
+loongarch/fp_mode.c
+${GENERIC_SOURCES}
+${GENERIC_TF_SOURCES}
+)
+set(loongarch64_SOURCES
+${loongarch_SOURCES}
+)

 set(mips_SOURCES ${GENERIC_SOURCES})
 set(mipsel_SOURCES ${mips_SOURCES})
 set(mips64_SOURCES ${GENERIC_TF_SOURCES}
 ${mips_SOURCES})
 set(mips64el_SOURCES ${GENERIC_TF_SOURCES}
 ${mips_SOURCES})

@@ (206 lines not shown) @@

compiler-rt/lib/builtins/loongarch/fp_mode.c

This file was added.

//=== lib/builtins/loongarch/fp_mode.c - Floating-point mode utilities -*- C -*-===//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//
#include "../fp_mode.h"

#define LOONGARCH_TONEAREST 0x0000
#define LOONGARCH_TOWARDZERO 0x0100
#define LOONGARCH_UPWARD 0x0200
#define LOONGARCH_DOWNWARD 0x0300

#define LOONGARCH_RMODE_MASK (LOONGARCH_TONEAREST | LOONGARCH_TOWARDZERO | \
                              LOONGARCH_UPWARD | LOONGARCH_DOWNWARD)

#define LOONGARCH_INEXACT 0x10000

CRT_FE_ROUND_MODE __fe_getround(void) {
#if __loongarch_frlen != 0
  int fcsr;
  __asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
  fcsr &= LOONGARCH_RMODE_MASK;

  switch (fcsr) {
  case LOONGARCH_TOWARDZERO:

MaskRay (inline comment, suggested edit):
   fcsr &= LOONGARCH_RMODE_MASK;
-  switch (fcsr) {
+  switch (fcsr & LOONGARCH_RMODE_MASK) {
   case LOONGARCH_TOWARDZERO:

    return CRT_FE_TOWARDZERO;
  case LOONGARCH_DOWNWARD:
    return CRT_FE_DOWNWARD;
  case LOONGARCH_UPWARD:
    return CRT_FE_UPWARD;
  case LOONGARCH_TONEAREST:
  default:
    return CRT_FE_TONEAREST;

  }
#else
  return CRT_FE_TONEAREST;
#endif
}

int __fe_raise_inexact(void) {
#if __loongarch_frlen != 0

  int fcsr;
  __asm__ __volatile__("movfcsr2gr %0, $fcsr0" : "=r" (fcsr));
  __asm__ __volatile__(
      "movgr2fcsr $fcsr0, %0" :: "r" (fcsr | LOONGARCH_INEXACT));
#endif

  return 0;
}

compiler-rt/test/builtins/Unit/addtf3_test.c

@@ (60 lines not shown) @@ #if __LDBL_MANT_DIG__ == 113
 // any + any
 if (test__addtf3(0x1.23456734245345543849abcdefp+5L,
 0x1.edcba52449872455634654321fp-1L,
 UINT64_C(0x40042afc95c8b579),
 UINT64_C(0x61e58dd6c51eb77c)))
 return 1;

 #if (defined(__arm__) || defined(__aarch64__)) && defined(__ARM_FP) || \
-defined(i386) || defined(__x86_64__)
+defined(i386) || defined(__x86_64__) || (defined(__loongarch__) && \
+__loongarch_frlen != 0)
 // Rounding mode tests on supported architectures
 const long double m = 1234.0L, n = 0.01L;

 fesetround(FE_UPWARD);
 if (test__addtf3(m, n,
 UINT64_C(0x40093480a3d70a3d),
 UINT64_C(0x70a3d70a3d70a3d8)))
 return 1;

@@ (26 lines not shown) @@

compiler-rt/test/builtins/Unit/subtf3_test.c

@@ (53 lines not shown) @@ #if __LDBL_MANT_DIG__ == 113
 // any - any
 if (test__subtf3(0x1.234567829a3bcdef5678ade36734p+5L,
 0x1.ee9d7c52354a6936ab8d7654321fp-1L,
 UINT64_C(0x40041b8af1915166),
 UINT64_C(0xa44a7bca780a166c)))
 return 1;

 #if (defined(__arm__) || defined(__aarch64__)) && defined(__ARM_FP) || \
-defined(i386) || defined(__x86_64__)
+defined(i386) || defined(__x86_64__) || (defined(__loongarch__) && \
+__loongarch_frlen != 0)
 // Rounding mode tests on supported architectures
 const long double m = 1234.02L, n = 0.01L;

 fesetround(FE_UPWARD);
 if (test__subtf3(m, n,
 UINT64_C(0x40093480a3d70a3d),
 UINT64_C(0x70a3d70a3d70a3d7)))
 return 1;
@@ (26 lines not shown) @@
