This is an archive of the discontinued LLVM Phabricator instance.

Differential D147238

[libc] Support suspending threads during RPC spin loops
ClosedPublic

Authored by jhuber6 on Mar 30 2023, 7:58 AM.

Details

Reviewers

JonChesterfield
tra
jdoerfert
tianshilei1992
sivachandra

Commits

rGdff3909c3ed9: [libc] Support suspending threads during RPC spin loops

Summary

The RPC interface relies on waiting on atomic signals to coordinate
which side of the protocol is in control of the shared buffer. The GPU client
supports briefly suspending the executing thread group. This is used by the
thread scheduler to identify which thread groups can be switched out so that
others may execute. This allows us to ensure that other threads get a chance
to make forward progress while these threads wait on the atomic signal.

This is currently only relevant on the client-side. We could use an
alternative implementation on the server that uses the standard
nanosleep on supported hosts.
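
The server-side alternative mentioned above is not part of this patch. As a rough illustration only, a host-side wait loop that sleeps between polls could be sketched as follows, assuming standard C++ atomics and `std::this_thread::sleep_for` in place of a direct `nanosleep` call. The names `host_sleep_briefly` and `wait_for_signal` and the 100 ns interval are hypothetical and were chosen for this sketch.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical host-side counterpart to the GPU sleep: give up the CPU for a
// short, fixed interval between polls. The 100 ns figure is an arbitrary
// placeholder, not a value taken from this patch.
inline void host_sleep_briefly() {
  std::this_thread::sleep_for(std::chrono::nanoseconds(100));
}

// Poll `inbox` until it equals `expected`, sleeping between loads, then issue
// an acquire fence so later reads of the shared buffer are ordered after the
// observed signal. Returns the number of times the loop had to sleep.
inline uint64_t wait_for_signal(const std::atomic<uint32_t> &inbox,
                                uint32_t expected) {
  uint64_t sleeps = 0;
  while (inbox.load(std::memory_order_relaxed) != expected) {
    host_sleep_briefly();
    ++sleeps;
  }
  std::atomic_thread_fence(std::memory_order_acquire);
  return sleeps;
}
```

If the signal is already set when `wait_for_signal` is called, the loop body never runs and the function returns 0 without sleeping.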

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Mar 30 2023, 7:58 AM

Herald added projects: Restricted Project, Restricted Project. Mar 30 2023, 7:58 AM

Herald added subscribers: libc-commits, ecnelises, tschuett.

jhuber6 requested review of this revision. Mar 30 2023, 7:58 AM

It looks like the two platforms are using different sleep duration? Other than that, LGTM.

This revision is now accepted and ready to land. Mar 30 2023, 8:00 AM

In D147238#4233699, @tianshilei1992 wrote:

It looks like the two platforms are using different sleep duration? Other than that, LGTM.

The documentation is a little different. The AMDGCN implementation states that a value of 2 will sleep between 64 and 128 cycles so assuming a 2 GHz clock it'll sleep ~60 ns in the worst case. These numbers are just guesses, we could probably refine them. I think PCI(e) atomic access is in the order of microseconds.
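
The estimate in the comment above can be checked with a short calculation. This sketch takes the 64 to 128 cycle range and the 2 GHz clock from the comment as given; both are the commenter's assumptions rather than measured values, and `cycles_to_ns` is a name made up for this example.

```cpp
#include <cstdint>

// Convert a cycle count to nanoseconds for a clock frequency given in GHz.
// One cycle at f GHz lasts 1/f nanoseconds.
constexpr double cycles_to_ns(uint64_t cycles, double clock_ghz) {
  return static_cast<double>(cycles) / clock_ghz;
}

// Bounds quoted in the comment for a sleep argument of 2, at an assumed
// 2 GHz clock.
constexpr double min_sleep_ns = cycles_to_ns(64, 2.0);  // 32 ns
constexpr double max_sleep_ns = cycles_to_ns(128, 2.0); // 64 ns
```

The upper bound of 64 ns agrees with the "~60 ns in the worst case" figure, and it is well below the microsecond scale quoted for PCI(e) atomic access.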

Harbormaster completed remote builds in B222756: Diff 509683. Mar 30 2023, 8:05 AM

jhuber6 edited the summary of this revision. Mar 30 2023, 8:06 AM

jhuber6 edited the summary of this revision. Mar 30 2023, 8:55 AM

Thanks. Couple of nits above.

Inline comments (JonChesterfield):

libc/src/__support/RPC/rpc.h, lines 105–106 (first wait loop in Client::run): this probably gets rotated, might be better written do_while sleep before or after load? the fence above probably takes time

libc/src/__support/RPC/rpc_util.h, line 21 (the nanosleep asm statement): probably worth leaving a comment here about how the magic numbers were chosen
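
The question about sleeping before or after the load can be made concrete with a small comparison. This is a sketch under stated assumptions, not the change that was committed: `SleepCounter`, `wait_sleep_then_load` and `wait_load_then_sleep` are hypothetical names, and the counter stands in for `sleep_briefly()` so the two loop shapes can be compared on any host.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for the patch's sleep_briefly(): it only counts calls.
struct SleepCounter {
  uint32_t calls = 0;
  void sleep_briefly() { ++calls; }
};

// Shape used in the patch: the cached flag is 0 on entry, so the loop always
// sleeps once before its first load.
inline uint32_t wait_sleep_then_load(const std::atomic<uint32_t> &inbox,
                                     SleepCounter &s) {
  uint32_t in = 0;
  while (!in) {
    s.sleep_briefly();
    in = inbox.load(std::memory_order_relaxed);
  }
  std::atomic_thread_fence(std::memory_order_acquire);
  return in;
}

// One possible do-while form: load first and sleep only while the flag is
// still clear.
inline uint32_t wait_load_then_sleep(const std::atomic<uint32_t> &inbox,
                                     SleepCounter &s) {
  uint32_t in;
  do {
    in = inbox.load(std::memory_order_relaxed);
    if (!in)
      s.sleep_briefly();
  } while (!in);
  std::atomic_thread_fence(std::memory_order_acquire);
  return in;
}
```

When the server has already responded by the time the loop starts, the first form still sleeps once and the second does not sleep at all. Which ordering is preferable on real hardware depends on how long the preceding fence takes, which is the open point raised in the comment.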

Closed by commit rGdff3909c3ed9: [libc] Support suspending threads during RPC spin loops (authored by jhuber6). Mar 30 2023, 9:40 AM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rGdff3909c3ed9: [libc] Support suspending threads during RPC spin loops.

Revision Contents

Path	Size
libc/src/__support/RPC/CMakeLists.txt	1 line
libc/src/__support/RPC/rpc.h	9 lines
libc/src/__support/RPC/rpc_util.h	32 lines

Diff 509706

libc/src/__support/RPC/CMakeLists.txt

 add_header_library(
   rpc
   HDRS
     rpc.h
+    rpc_util.h
   DEPENDS
     libc.src.__support.common
     libc.src.__support.CPP.atomic
 )

 add_object_library(
   rpc_client
   SRCS
     rpc_client.cpp
   HDRS
     rpc_client.h
   DEPENDS
     .rpc
 )

libc/src/__support/RPC/rpc.h

[12 lines not shown]
 // signals to indicate which side, the client or the server is in ownership of
 // the buffer.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_LIBC_SRC_SUPPORT_RPC_RPC_H
 #define LLVM_LIBC_SRC_SUPPORT_RPC_RPC_H

+#include "rpc_util.h"
 #include "src/__support/CPP/atomic.h"

 #include <stdint.h>

 namespace __llvm_libc {
 namespace rpc {

 /// A list of opcodes that we use to invoke certain actions on the server. We
[67 lines not shown]
 template <typename F, typename U> LIBC_INLINE void Client::run(F fill, U use) {
   if (!in & !out) {
     fill(buffer);
     atomic_thread_fence(cpp::MemoryOrder::RELEASE);
     outbox->store(1, cpp::MemoryOrder::RELAXED);
     out = 1;
   }
   // Wait for the server to work on the buffer and respond.
   if (!in & out) {
-    while (!in)
+    while (!in) {
+      sleep_briefly();
       in = inbox->load(cpp::MemoryOrder::RELAXED);
+    }
     atomic_thread_fence(cpp::MemoryOrder::ACQUIRE);
   }
   // Apply \p use to the buffer and signal the server.
   if (in & out) {
     use(buffer);
     atomic_thread_fence(cpp::MemoryOrder::RELEASE);
     outbox->store(0, cpp::MemoryOrder::RELAXED);
     out = 0;
   }
   // Wait for the server to signal the end of the protocol.
   if (in & !out) {
-    while (in)
+    while (in) {
+      sleep_briefly();
       in = inbox->load(cpp::MemoryOrder::RELAXED);
+    }
     atomic_thread_fence(cpp::MemoryOrder::ACQUIRE);
   }
 }

 /// Run the RPC server protocol to communicate with the client. This is
 /// non-blocking and only checks the server a single time. We perform the
 /// following high level actions to complete a communication:
 /// - Query if the inbox is 1 and exit if there is no work to do.
[39 lines not shown]
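
The four phases of Client::run shown above can be traced with a single-threaded model. This is only a sketch: the server code is not shown in this diff, so `server_step` is a hypothetical stand-in that mirrors the client's flag, and `Mailboxes` and `client_run` are likewise names made up for the example. Because everything runs on one thread, the atomics and fences of the real code are omitted, and calling `server_step` inside the wait loops plays the role of the other side making progress while the client sleeps.

```cpp
#include <cstdint>

// Model of the two one-bit mailboxes and the shared buffer.
struct Mailboxes {
  uint32_t client_to_server = 0; // client's outbox, server's inbox
  uint32_t server_to_client = 0; // server's outbox, client's inbox
  int buffer = 0;
};

// Hypothetical non-blocking server step: do the work when the client raises
// its flag, and lower our flag again once the client has lowered its own.
inline void server_step(Mailboxes &m) {
  if (m.client_to_server == 1 && m.server_to_client == 0) {
    m.buffer *= 2; // stand-in for the requested work
    m.server_to_client = 1;
  } else if (m.client_to_server == 0 && m.server_to_client == 1) {
    m.server_to_client = 0;
  }
}

// Client side, following the four phases of Client::run.
inline int client_run(Mailboxes &m, int request) {
  // Phase 1: fill the buffer and signal the server.
  m.buffer = request;
  m.client_to_server = 1;
  // Phase 2: wait for the server to work on the buffer and respond.
  while (m.server_to_client == 0)
    server_step(m);
  // Phase 3: use the buffer and signal the server.
  int result = m.buffer;
  m.client_to_server = 0;
  // Phase 4: wait for the server to signal the end of the protocol.
  while (m.server_to_client == 1)
    server_step(m);
  return result;
}
```

After `client_run` returns, both flags are back at 0, so the same mailboxes can carry the next call.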

libc/src/__support/RPC/rpc_util.h

This file was added.

//===-- Shared memory RPC client / server utilities -------------- C++ --===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_LIBC_SRC_SUPPORT_RPC_RPC_UTILS_H
#define LLVM_LIBC_SRC_SUPPORT_RPC_RPC_UTILS_H

#include "src/__support/macros/attributes.h"
#include "src/__support/macros/properties/architectures.h"

namespace __llvm_libc {
namespace rpc {

/// Suspend the thread briefly to assist the thread scheduler during busy loops.
LIBC_INLINE void sleep_briefly() {
#if defined(LIBC_TARGET_ARCH_IS_NVPTX) && __CUDA_ARCH__ >= 700
  asm("nanosleep.u32 64;" ::: "memory");
#elif defined(LIBC_TARGET_ARCH_IS_AMDGPU)
  __builtin_amdgcn_s_sleep(2);
#else
  // Simply do nothing if sleeping isn't supported on this platform.
#endif
}

} // namespace rpc
} // namespace __llvm_libc

#endif