This is an archive of the discontinued LLVM Phabricator instance.

Differential D116088

[compiler-rt] Implement ARM atomic operations for architectures without SMP support
Needs Review · Public

Authored by kpdev42 on Dec 20 2021, 10:41 PM.

Details

Reviewers

mstorsjo
samsonov
dsanders
pcc
phosek
efriedma

Commits

rG910a642c0a5b: [compiler-rt] Implement ARM atomic operations for architectures without SMP…

Summary

ARMv5 and older architectures do not support SMP and have no atomic instructions. They are still in use in the IoT world, where one currently has to stick to libgcc.

~~~

OS Laboratory. Huawei RRI. Saint-Petersburg

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kpdev42 created this revision. · Dec 20 2021, 10:41 PM

Herald added subscribers: kristof.beyls, mgorny, dberris. · Dec 20 2021, 10:41 PM

kpdev42 requested review of this revision. · Dec 20 2021, 10:41 PM

Herald added a project: Restricted Project. · Dec 20 2021, 10:41 PM

Herald added a subscriber: Restricted Project.

Harbormaster completed remote builds in B140198: Diff 395597. · Dec 20 2021, 11:01 PM

kpdev42 added a subscriber: cfe-commits. · Dec 29 2021, 1:35 AM

Ping

I think this looks reasonable. I haven't tried it, but I guess we should proceed with it instead of holding it back, as there's nobody else reviewing it.

This revision is now accepted and ready to land. · Jan 19 2022, 3:25 AM

Correct me if I'm wrong, but I don't think these stubs are async-signal-safe, nor will they work on preemptive multitasking systems?

In D116088#3254400, @joerg wrote:

Correct me if I'm wrong, but I don't think these stubs are async-signal-safe, nor will they work on preemptive multitasking systems?

Those stubs are basically CAS loops (https://en.wikipedia.org/wiki/Compare-and-swap), which are not much different from their SMP counterparts, except that memory sync ops are not used. This should work normally in case of preemption (the preempting thread will spend its quota in busy-wait). A signal can be a problem if it preempts a thread executing an atomic op, but I wonder whether the SMP code handles this.

This revision was landed with ongoing or failed builds. · Feb 16 2022, 11:11 PM

Closed by commit rG910a642c0a5b: [compiler-rt] Implement ARM atomic operations for architectures without SMP… (authored by kpdev42).

This revision was automatically updated to reflect the committed changes.

kpdev42 added a commit: rG910a642c0a5b: [compiler-rt] Implement ARM atomic operations for architectures without SMP….

I'm concerned that providing these is going to cause issues. The provided implementations are not atomic. Blindly assuming that the user is compiling for a target that doesn't have pre-emptible threads seems like a bad idea.

How do you expect users to use these methods, anyway? clang shouldn't be generating calls to these methods. (It looks like it actually does in some cases on ARM, but that's not intended behavior.)

For users who want to pretend threads don't exist, we should provide a compiler flag. -mthread-model doesn't quite work right for this at the moment, but it would make sense to fix it, I think.

efriedma mentioned this in D120026: [ARM] Fix ARM backend to correctly use atomic expansion routines. · Feb 17 2022, 2:04 AM

Posted D120026.

efriedma added a reverting change: rG0389f2edf7c2: Revert "[compiler-rt] Implement ARM atomic operations for architectures…. · Feb 17 2022, 2:20 AM

Reverted in 0389f2edf7

This revision is now accepted and ready to land. · Feb 17 2022, 2:21 AM

efriedma requested changes to this revision. · Feb 17 2022, 2:22 AM

This revision now requires changes to proceed. · Feb 17 2022, 2:22 AM

lkail added a subscriber: lkail. · Feb 17 2022, 2:24 AM

efriedma mentioned this in rG2f497ec3a005: [ARM] Fix ARM backend to correctly use atomic expansion routines. · Mar 18 2022, 12:44 PM

D120026 is merged now, which addresses the issue of the compiler generating __sync calls where it isn't supposed to.

Does anyone want to continue discussing what changes to compiler-rt would be appropriate? I didn't mean to completely shut down discussion with my comment.

Herald added a project: Restricted Project. · Mar 18 2022, 2:00 PM

In D116088#3393350, @efriedma wrote:

D120026 is merged now, which addresses the issue of the compiler generating __sync calls where it isn't supposed to.

Does anyone want to continue discussing what changes to compiler-rt would be appropriate? I didn't mean to completely shut down discussion with my comment.

@efriedma

Imagine we have the following piece of code in the program:

volatile int G;
int foo() { return __sync_add_and_fetch(&G, 1); }

Now we want to build and run this on an armv5 platform. At the moment the only option we have is to use libgcc. Unfortunately this has one big disadvantage: we are limited to Linux, because the call to __sync_add_and_fetch boils down to a Linux kernel user helper. We want this to work on other platforms as well, and this is what compiler-rt is good for.

However, the sync ops in compiler-rt use memory barriers, which don't work on armv5: any attempt to use a memory barrier there results in SIGILL. As armv5 doesn't support SMP (but still supports preemptive multitasking), it is possible, in our opinion, to implement sync ops as a compare-and-swap loop without memory barriers. What's your opinion on this?

On a target that doesn't support SMP, you don't need memory barriers, sure. (I think we'd want a CMake flag to explicitly assert that you're only going to run the code on chips without SMP.)

That doesn't really solve your issue, though. To implement atomic compare-and-swap or rmw operations, you need to ensure your code doesn't get interrupted in the middle. If your system supports multithreading, and a thread can be preempted at any time, you need one of the following:

A natively atomic operation, like strex.
A way to temporarily turn off interrupts, so the thread can't be preempted for a short time.
Kernel-assisted operations, like the Linux kernel provides.
A separate lock.

None of these options work on armv5 without some sort of kernel assistance. (Well, technically, you can implement a spinlock with swp, but that's very inefficient.)
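The "separate lock" option from the list above can be sketched in portable C. This is only an illustration, not compiler-rt code: the names sketch_lock and sketch_fetch_and_add_4 are hypothetical, and the C11 atomic_flag stands in for the atomic exchange that swp would provide on ARMv5. Whether atomic_flag itself lowers to something usable on an ARMv5 target is an assumption not checked here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global lock guarding every emulated atomic operation. */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

/* Hypothetical helper: adds val to *ptr under the lock, returns the old value. */
uint32_t sketch_fetch_and_add_4(volatile uint32_t *ptr, uint32_t val) {
  assert(ptr != 0);
  /* Acquire: test-and-set is an atomic exchange, the role swp would play. */
  while (atomic_flag_test_and_set_explicit(&sketch_lock, memory_order_acquire)) {
    /* Busy-wait until the current holder releases the lock. */
  }
  uint32_t old = *ptr; /* plain load, safe only because the lock is held */
  *ptr = old + val;    /* plain store */
  /* Release the lock so other threads can proceed. */
  atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
  return old;
}
```

A lock of this kind is not async-signal-safe: a signal handler that interrupts the lock holder and calls the same function would spin forever. On a uniprocessor, a waiting thread also burns its time slice until the holder is scheduled again, which matches the "very inefficient" remark above.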

At the moment, in case of compiler-rt, __sync_add_and_fetch boils down to
__sync_add_and_fetch_N, where N is the size of data being fetched (4 for int).
The implementation of __sync_fetch_and_add_N does approximately the following:

1. Sets memory barrier
2. Calls atomic load from memory location
3. Modifies data
4. Calls atomic store to memory location
5. Checks that operation is consistent, if not goes to step 2.
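The steps listed above can be sketched in C with the GCC/Clang __atomic builtins. This is a rough illustration rather than the compiler-rt implementation (which is written in assembly with ldrex/strex); the function name sketch_cas_fetch_and_add_4 is hypothetical. The point to notice is that the final store is conditional: it only happens if the location still holds the value that was loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the load / modify / conditional-store loop. */
uint32_t sketch_cas_fetch_and_add_4(uint32_t *ptr, uint32_t val) {
  assert(ptr != 0);
  /* Load the current value. */
  uint32_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
  uint32_t desired;
  do {
    /* Modify the data in registers. */
    desired = old + val;
    /* Store only if *ptr still equals old; on failure, old is refreshed
       with the current contents and the loop retries. */
  } while (!__atomic_compare_exchange_n(ptr, &old, desired, /*weak=*/1,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return old;
}
```

The loop is only correct because the compare and the store form one indivisible step. Replacing them with an ordinary store followed by a check removes exactly that guarantee.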

IMO, performance-wise there is not much difference (if any) between this and
modifying the data while holding a spinlock.

No code in compiler-rt disables interrupts, so it can and will be interrupted in the middle
by a different thread; however, I don't see any problem with this.

Now if we are on a platform which doesn't support SMP we can use ordinary memory operations instead
of atomic ones, can't we?

In D116088#3534100, @kpdev42 wrote:

1. Sets memory barrier
2. Calls atomic load from memory location
3. Modifies data
4. Calls atomic store to memory location
5. Checks that operation is consistent, if not goes to step 2.

Now if we are on a platform which doesn't support SMP we can use ordinary memory operations instead
of atomic ones, can't we?

Even on a non-SMP processor, there's a bit of magic in ldrex/strex: ldrex sets a hidden "lock" bit, and strex checks it. If there's a context switch between the load and the store, the switch will clear that bit. So when the code continues to execute, the store fails.
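The behaviour described above can be modelled with a toy simulation in C. Everything here is hypothetical (sim_monitor, sim_ldrex, sim_strex, sim_context_switch, sim_increment); it is not how the hardware is implemented, only a sketch of the observable effect: a context switch between the exclusive load and the exclusive store makes the store fail, so the loop retries with fresh data.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the hidden "lock" bit set by ldrex and checked by strex. */
typedef struct { int armed; } sim_monitor;

/* Model of ldrex: arm the monitor and return the current value. */
uint32_t sim_ldrex(sim_monitor *m, const uint32_t *addr) {
  m->armed = 1;
  return *addr;
}

/* Model of strex: returns 0 on success, 1 if the monitor was cleared. */
int sim_strex(sim_monitor *m, uint32_t *addr, uint32_t value) {
  if (!m->armed) return 1;
  *addr = value;
  m->armed = 0;
  return 0;
}

/* Model of a context switch: it clears the monitor. */
void sim_context_switch(sim_monitor *m) { m->armed = 0; }

/* Increment *addr once. If interrupt_first_try is nonzero, another thread
   adds 10 between the first load and store. Returns the number of attempts. */
int sim_increment(uint32_t *addr, int interrupt_first_try) {
  assert(addr != 0);
  sim_monitor m = {0};
  int attempts = 0;
  for (;;) {
    uint32_t old = sim_ldrex(&m, addr);
    attempts++;
    if (interrupt_first_try && attempts == 1) {
      *addr += 10;            /* the other thread's update */
      sim_context_switch(&m); /* switching threads clears the monitor */
    }
    if (sim_strex(&m, addr, old + 1) == 0) return attempts;
  }
}
```

In the interrupted case the first store is rejected, so the other thread's update is preserved and the increment is applied on the second attempt.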

Well, after some investigation it turned out that:

ARMv5 has DMB instruction in the form of mcr p15, #0, <Rd>, c7, c10, #5
There is SWP instruction (deprecated on ARMv6), which does atomic exchange of 32-bit values

I've reimplemented the sync ops using these primitives, PTAL.
Theoretically this should work on ARMv6 and higher, though I didn't check this.

Harbormaster completed remote builds in B166603: Diff 432478. · May 27 2022, 12:58 AM

ldr+op+swp still isn't atomic. For each point in the code, please try the exercise of "what if my code is interrupted here"?

The only way to use swp to implement general atomic operations is to use a separate spinlock.
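The "what if my code is interrupted here" exercise can be carried out with a deterministic, single-threaded simulation in C. The names sim_swp and sim_ldr_op_swp are hypothetical, and the interleaving is scripted by hand instead of using real threads. The sketch follows the ldr + op + swp loop for "add 1" and injects another thread's increments between the load and the swap.

```c
#include <assert.h>
#include <stdint.h>

/* Model of swp: unconditionally store new_val and return the old contents. */
static uint32_t sim_swp(uint32_t *addr, uint32_t new_val) {
  uint32_t prev = *addr;
  *addr = new_val;
  return prev;
}

/* Thread A adds 1 using ldr + op + swp. Between A's first ldr and swp,
   thread B completes other_increments full increments. Returns the final
   value of the shared variable. */
uint32_t sim_ldr_op_swp(uint32_t start, uint32_t other_increments) {
  uint32_t g = start;

  uint32_t a_old = g;         /* A: ldr */
  uint32_t a_new = a_old + 1; /* A: op  */

  g += other_increments;      /* interrupt: B runs to completion */

  /* A resumes: swp writes a_new even though g has changed meanwhile. */
  uint32_t seen = sim_swp(&g, a_new);
  while (seen != a_old) {     /* A: cmp + bne, retry */
    a_old = g;
    a_new = a_old + 1;
    seen = sim_swp(&g, a_new);
  }
  return g;
}
```

With start 0 and two interrupting increments, three increments were requested in total but the function returns 2: the unconditional swp overwrites B's result before the comparison can detect the conflict, and the retry then builds on the clobbered value.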

Revision Contents

Path                                               Size
compiler-rt/cmake/Modules/CompilerRTUtils.cmake    10 lines
compiler-rt/cmake/config-ix.cmake                  5 lines
compiler-rt/lib/builtins/CMakeLists.txt            1 line
compiler-rt/lib/builtins/arm/sync-ops.h            40 lines

Diff 432478

compiler-rt/cmake/Modules/CompilerRTUtils.cmake

@@ 104 lines not shown @@ if("${def}" STREQUAL "")
 return()
 endif()
 cmake_push_check_state()
 set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} ${argstring}")
 check_symbol_exists(${def} "" ${out_var})
 cmake_pop_check_state()
 endfunction()

+macro(test_arm_smp_support arch cflags_var)
+if (${arch} STREQUAL "arm")
+try_compile(HAS_${arch}_SMP ${CMAKE_BINARY_DIR}
+${ARM_SMP_CHECK_SRC} COMPILE_DEFINITIONS "${CMAKE_C_FLAGS} ${_TARGET_${arch}_CFLAGS}")
+if (HAS_${arch}_SMP)
+list(APPEND ${cflags_var} -DCOMPILER_RT_HAS_SMP_SUPPORT)
+endif()
+endif()
+endmacro()
+
 # test_target_arch(<arch> <def> <target flags...>)
 # Checks if architecture is supported: runs host compiler with provided
 # flags to verify that:
 # 1) <def> is defined (if non-empty)
 # 2) simple file can be successfully built.
 # If successful, saves target flags for this architecture.
 macro(test_target_arch arch def)
 set(TARGET_${arch}_CFLAGS ${ARGN})
@@ 481 lines not shown @@

compiler-rt/cmake/config-ix.cmake

@@ 195 lines not shown @@

 # Try to compile a very simple source file to ensure we can target the given
 # platform. We use the results of these tests to build only the various target
 # runtime libraries supported by our current compilers cross-compiling
 # abilities.
 set(SIMPLE_SOURCE ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/simple.cc)
 file(WRITE ${SIMPLE_SOURCE} "#include <stdlib.h>\n#include <stdio.h>\nint main() { printf(\"hello, world\"); }\n")

+# Check if we have SMP support for particular ARM architecture
+# If not use stubs instead of real atomic operations - see sync-ops.h
+set(ARM_SMP_CHECK_SRC ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/arm-barrier.cc)
+file(WRITE ${ARM_SMP_CHECK_SRC} "int main() { asm(\"dmb\"); return 0; }")
+
 # Detect whether the current target platform is 32-bit or 64-bit, and setup
 # the correct commandline flags needed to attempt to target 32-bit and 64-bit.
 # AVR and MSP430 are omitted since they have 16-bit pointers.
 if (NOT CMAKE_SIZEOF_VOID_P EQUAL 4 AND
 NOT CMAKE_SIZEOF_VOID_P EQUAL 8 AND
 NOT ${arch} MATCHES "avr\|msp430")
 message(FATAL_ERROR "Please use architecture with 4 or 8 byte pointers.")
 endif()
@@ 642 lines not shown @@

compiler-rt/lib/builtins/CMakeLists.txt

@@ 746 lines not shown @@ if (CAN_TARGET_${arch})
 filter_builtin_sources(${arch}_SOURCES ${arch})

 # Needed for clear_cache on debug mode, due to r7's usage in inline asm.
 # Release mode already sets it via -O2/3, Debug mode doesn't.
 if (${arch} STREQUAL "armhf")
 list(APPEND BUILTIN_CFLAGS_${arch} -fomit-frame-pointer -DCOMPILER_RT_ARMHF_TARGET)
 endif()

+test_arm_smp_support(${arch} BUILTIN_CFLAGS_${arch})
 # For RISCV32, we must force enable int128 for compiling long
 # double routines.
 if("${arch}" STREQUAL "riscv32")
 list(APPEND BUILTIN_CFLAGS_${arch} -fforce-enable-int128)
 endif()

 if(arch STREQUAL "aarch64")
 add_custom_target(
@@ 63 lines not shown @@

compiler-rt/lib/builtins/arm/sync-ops.h

@@ 10 lines not shown @@
 // ARM and Thumb-2 versions of the functions.
 //
 //===----------------------------------------------------------------------===//

 #include "../assembly.h"

 #if __ARM_ARCH >= 7
 #define DMB dmb
-#elif __ARM_ARCH >= 6
+#elif __ARM_ARCH >= 5
 #define DMB mcr p15, #0, r0, c7, c10, #5
 #else
 #error DMB is only supported on ARMv6+
 #endif

+#ifdef COMPILER_RT_HAS_SMP_SUPPORT
+
 #define SYNC_OP_4(op) \
 .p2align 2; \
 .syntax unified; \
 DEFINE_COMPILERRT_FUNCTION(__sync_fetch_and_##op) \
 DMB; \
 mov r12, r0; \
 LOCAL_LABEL(tryatomic_##op) : ldrex r0, [r12]; \
 op(r2, r0, r1); \
@@ 13 lines not shown @@ #define SYNC_OP_8(op) \
 LOCAL_LABEL(tryatomic_##op) : ldrexd r0, r1, [r12]; \
 op(r4, r5, r0, r1, r2, r3); \
 strexd r6, r4, r5, [r12]; \
 cmp r6, #0; \
 bne LOCAL_LABEL(tryatomic_##op); \
 DMB; \
 pop { r4, r5, r6, pc }

+#else
+
+#define SYNC_OP_4(op) \
+.p2align 2; \
+DEFINE_COMPILERRT_FUNCTION(__sync_fetch_and_##op) \
+DMB; \
+mov r12, r0; \
+LOCAL_LABEL(tryatomic_##op) : ldr r0, [r12]; \
+op(r2, r0, r1); \
+swp r3, r2, [r12]; \
+cmp r3, r0; \
+bne LOCAL_LABEL(tryatomic_##op); \
+DMB; \
+bx lr
+
+#define SYNC_OP_8(op) \
+.p2align 2; \
+DEFINE_COMPILERRT_FUNCTION(__sync_fetch_and_##op) \
+push{r4, r5, r6, lr}; \
+DMB; \
+mov r12, r0; \
+LOCAL_LABEL(tryatomic_##op) : ldm r12, {r0, r1}; \
+op(r4, r5, r0, r1, r2, r3); \
+swp r6, r4, [r12]; \
+cmp r6, r0; \
+bne LOCAL_LABEL(tryatomic_##op); \
+add r12, r12, #4; \
+swp r6, r5, [r12]; \
+sub r12, r12, #4; \
+cmp r6, r1; \
+bne LOCAL_LABEL(tryatomic_##op); \
+DMB; \
+pop { r4, r5, r6, pc }
+
+#endif
+
 #define MINMAX_4(rD, rN, rM, cmp_kind) \
 cmp rN, rM; \
 mov rD, rM; \
 it cmp_kind; \
 mov##cmp_kind rD, rN

 #define MINMAX_8(rD_LO, rD_HI, rN_LO, rN_HI, rM_LO, rM_HI, cmp_kind) \
 cmp rN_LO, rM_LO; \
 sbcs rN_HI, rM_HI; \
 mov rD_LO, rM_LO; \
 mov rD_HI, rM_HI; \
 itt cmp_kind; \
 mov##cmp_kind rD_LO, rN_LO; \
 mov##cmp_kind rD_HI, rN_HI