This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fix initializer not working on AMDGPU
ClosedPublic

Authored by jhuber6 on Nov 15 2021, 8:06 PM.

Details

Reviewers

jdoerfert
JonChesterfield

Commits

rG374cd0fb6102: [OpenMP] Fix initializer not working on AMDGPU

Summary

The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.
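
The pattern described in the summary can be sketched on the host as follows. This is an illustration only, not the DeviceRTL code: the namespace `trace_sketch`, the `TraceEntry` type, the `Out` string sink and the `runKernel` helper are all hypothetical stand-ins, and ordinary host memory replaces the team-shared allocation. The counter is declared without an initializer and is reset by an explicit `init()` call at start-up.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

namespace trace_sketch {

// Nesting depth of the trace. Declared without "= 0" on purpose: the sketch
// assumes, as the summary says of AMDGPU, that the initializer cannot be used.
static uint32_t Level;

// Hypothetical sink standing in for the device-side printf.
static std::ostringstream Out;

// RAII tracer: prints an indented line on entry, restores the depth on exit.
struct TraceEntry {
  explicit TraceEntry(const char *Function) {
    for (uint32_t I = 0; I < Level; ++I)
      Out << ' ';
    Out << "Entering " << Function << '\n';
    ++Level;
  }
  ~TraceEntry() { --Level; }

  // Explicit reset, called from start-up code instead of an initializer.
  static void init() { Level = 0; }
};

void inner() { TraceEntry Entry("inner"); }

void outer() {
  TraceEntry Entry("outer");
  inner();
}

// Models one kernel launch: reset the state, run traced code, return the log.
std::string runKernel() {
  Out.str("");
  TraceEntry::init();
  outer();
  return Out.str();
}

} // namespace trace_sketch
```

Setting `trace_sketch::Level` to a nonzero value before calling `runKernel()` imitates stale memory. Because `init()` runs first, the returned log still starts at indentation zero.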

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Nov 15 2021, 8:06 PM

Herald added subscribers: guansong, t-tye, tpr and 3 others. Nov 15 2021, 8:06 PM

jhuber6 requested review of this revision. Nov 15 2021, 8:06 PM

Herald added a project: Restricted Project. Nov 15 2021, 8:06 PM

Herald added subscribers: openmp-commits, sstefan1, wdng.

Harbormaster completed remote builds in B134420: Diff 387483. Nov 15 2021, 8:12 PM

I had put the debug init call in the wrong init function.

Harbormaster completed remote builds in B134421: Diff 387484. Nov 15 2021, 8:19 PM

LG, though racy, see comment.

openmp/libomptarget/DeviceRTL/src/State.cpp, lines 370–372: Put it under this guard.

jdoerfert accepted this revision. Nov 15 2021, 8:22 PM

This revision is now accepted and ready to land. Nov 15 2021, 8:22 PM

There's a recent (AMDGPU-specific, I think) LLVM pass that gathers ctors/dtors into one or more kernels that expect to be executed with one wavefront, possibly with one thread active at the entry point. I'm aware of two bugs in it, one fixed and one with a patch outstanding. These kernels aren't actually run by the AMDGPU plugin yet, but probably will be shortly. At least the constructors will be; I'm not clear how we would know when to run the destructors. At that point we could optionally revert this. Looks good for now.

Moving one-off initialisation to a global ctor, once said ctor is actually executed by the plugin, is probably a minor performance win: the work is done once instead of once per kernel.

Closed by commit rG374cd0fb6102: [OpenMP] Fix initializer not working on AMDGPU (authored by jhuber6). Nov 16 2021, 5:17 AM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG374cd0fb6102: [OpenMP] Fix initializer not working on AMDGPU.

Revision Contents

Path                                             Size
openmp/libomptarget/DeviceRTL/include/Debug.h    2 lines
openmp/libomptarget/DeviceRTL/src/Debug.cpp      4 lines
openmp/libomptarget/DeviceRTL/src/State.cpp      4 lines

Diff 387582

openmp/libomptarget/DeviceRTL/include/Debug.h

 #define FunctionTracingRAII() \
   DebugEntryRAII Entry(__FILE__, __LINE__, __PRETTY_FUNCTION__);

 /// An RAII class for handling entries to debug locations. The current location
 /// and function will be printed on entry. Nested levels increase the
 /// indentation shown in the debugging output.
 struct DebugEntryRAII {
   DebugEntryRAII(const char *File, const unsigned Line, const char *Function);
   ~DebugEntryRAII();
+
+  static void init();
 };

 #endif

openmp/libomptarget/DeviceRTL/src/Debug.cpp

 #pragma omp end declare variant

 int32_t __llvm_omp_vprintf(const char *Format, void *Arguments, uint32_t Size) {
   return impl::omp_vprintf(Format, Arguments, Size);
 }
 }

 /// Current indentation level for the function trace. Only accessed by thread 0.
-static uint32_t Level = 0;
+static uint32_t Level;
 #pragma omp allocate(Level) allocator(omp_pteam_mem_alloc)

 DebugEntryRAII::DebugEntryRAII(const char *File, const unsigned Line,
                                const char *Function) {
   if (config::isDebugMode(config::DebugKind::FunctionTracing) &&
       mapping::getThreadIdInBlock() == 0 && mapping::getBlockId() == 0) {

     for (int I = 0; I < Level; ++I)
       PRINTF("%s", " ");

     PRINTF("%s:%u: Thread %u Entering %s\n", File, Line,
            mapping::getThreadIdInBlock(), Function);
     Level++;
   }
 }

 DebugEntryRAII::~DebugEntryRAII() {
   if (config::isDebugMode(config::DebugKind::FunctionTracing) &&
       mapping::getThreadIdInBlock() == 0 && mapping::getBlockId() == 0)
     Level--;
 }

+void DebugEntryRAII::init() { Level = 0; }
+
 #pragma omp end declare target

openmp/libomptarget/DeviceRTL/src/State.cpp

 void *&state::lookupPtr(ValueKind Kind, bool IsReadonly) {
   [...]
   default:
     break;
   }
   __builtin_unreachable();
 }

 void state::init(bool IsSPMD) {
   SharedMemorySmartStack.init(IsSPMD);
-  if (mapping::isInitialThreadInLevel0(IsSPMD))
+  if (mapping::isInitialThreadInLevel0(IsSPMD)) {
     TeamState.init(IsSPMD);
+    DebugEntryRAII::init();
+  }

jdoerfert: Put it under this guard.

   ThreadStates[mapping::getThreadIdInBlock()] = nullptr;
 }

 void state::enterDataEnvironment() {
   unsigned TId = mapping::getThreadIdInBlock();
   ThreadStateTy *NewThreadState =
       static_cast<ThreadStateTy *>(__kmpc_alloc_shared(sizeof(ThreadStateTy)));
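
The shape of the guarded reset in `state::init` can be sketched on the host as below. This is a simplified model under stated assumptions, not the runtime's implementation: `guard_sketch`, `TeamSim`, `initThread` and `simulateInit` are hypothetical names, thread 0 stands in for whatever `mapping::isInitialThreadInLevel0` selects, and the threads are simulated one after another rather than run concurrently. One plausible reading of the "racy" remark in the review is that an unguarded reset would let every thread write the shared counter; with the guard, only one thread does.

```cpp
#include <cstdint>
#include <vector>

namespace guard_sketch {

// Hypothetical host-side model of one team; not the DeviceRTL data structures.
struct TeamSim {
  explicit TeamSim(uint32_t NumThreads) : SlotCleared(NumThreads, false) {}

  uint32_t Level = 0xBADu;       // pretend shared memory starts with stale data
  uint32_t ResetCount = 0;       // number of threads that wrote Level
  std::vector<bool> SlotCleared; // stand-in for the per-thread state slots
};

// Team-wide state is reset by one designated thread; per-thread state is
// reset by every thread.
void initThread(TeamSim &Team, uint32_t ThreadId) {
  const bool IsInitialThread = (ThreadId == 0); // assumption: thread 0 is chosen
  if (IsInitialThread) {
    Team.Level = 0;
    ++Team.ResetCount;
  }
  Team.SlotCleared[ThreadId] = true;
}

// Runs initThread for each simulated thread in turn (no real concurrency).
TeamSim simulateInit(uint32_t NumThreads) {
  TeamSim Team(NumThreads);
  for (uint32_t T = 0; T < NumThreads; ++T)
    initThread(Team, T);
  return Team;
}

} // namespace guard_sketch
```

Calling `guard_sketch::simulateInit(4)` leaves `Level` at zero with a `ResetCount` of one, while all four per-thread slots are cleared.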