This is an archive of the discontinued LLVM Phabricator instance.


Differential D83269

[OpenMP] Identify GPU kernels (aka. OpenMP target regions)
ClosedPublic

Authored by jdoerfert on Jul 6 2020, 6:07 PM.

Details

Reviewers

jhuber6
fghanim
JonChesterfield
grokos
AndreyChurbanov
ye-luo
tianshilei1992
ggeorgakoudis
sstefan1
baziotis

Commits

rGe8039ad4def0: [OpenMP] Identify GPU kernels (aka. OpenMP target regions)

Summary

We now identify GPU kernels, that is entry points into the GPU code.
These kernels (can) correspond to OpenMP target regions. With this patch
we identify and on request print them via remarks.
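To make the idea concrete, here is a minimal, self-contained sketch of the kernel scan. It does not use LLVM types: `Annotation`, `identifyKernelNames`, and `exampleAnnotations` are hypothetical stand-ins, with an empty function name modeling a null first operand of an `!nvvm.annotations` entry. The sample data mirrors the annotation list in the test file of this revision.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one operand of !nvvm.annotations:
// (function, kind string, value). An empty Function models a null operand.
struct Annotation {
  std::string Function;
  std::string Kind;
  int Value;
};

// Collect the names of all functions whose annotation kind is "kernel".
std::set<std::string>
identifyKernelNames(const std::vector<Annotation> &Annotations) {
  std::set<std::string> Kernels;
  for (const Annotation &A : Annotations) {
    if (A.Kind != "kernel")
      continue; // e.g. "align" or other non-kernel entries
    if (A.Function.empty())
      continue; // no function attached to this entry
    Kernels.insert(A.Function); // set semantics: repeated entries collapse
  }
  return Kernels;
}

// Sample data in the order !{!2, !0, !1, !3, !1, !2} from the test file.
std::vector<Annotation> exampleAnnotations() {
  Annotation Align{"", "align", 1};
  Annotation Kernel1{"kernel1", "kernel", 1};
  Annotation NonKernel{"non_kernel", "non_kernel", 1};
  Annotation Kernel2{"kernel2", "kernel", 1};
  return {Align, Kernel1, NonKernel, Kernel2, NonKernel, Align};
}
```

Calling `identifyKernelNames(exampleAnnotations())` yields only `kernel1` and `kernel2`; `non_kernel` and the `align` entry are skipped.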

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jdoerfert created this revision. Jul 6 2020, 6:07 PM

Herald added a reviewer: sstefan1. Jul 6 2020, 6:07 PM

Herald added a reviewer: baziotis.

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, okura, bbn and 5 others.

Harbormaster failed remote builds in B63107: Diff 275873! Jul 6 2020, 6:12 PM

okura removed a subscriber: okura. Jul 6 2020, 9:47 PM

LGTM but I would like others to take a look.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
1186	This line of change looks not related to this patch.

jdoerfert marked an inline comment as done. Jul 7 2020, 8:46 AM

jdoerfert added inline comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
1186	It is. I needed to get rid of the return statements and I wanted to keep the "early exit" out of the if-cascade. Entrance: `else`.
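The "Entrance: `else`" remark can be illustrated with a small X-macro sketch. This is not the code from the patch: `RUNTIME_FUNCTIONS`, `ModuleInfo`, and `containsRuntimeCall` are hypothetical names, and the real pass expands `OMP_RTL` over `OMPKinds.def` instead of a hand-written list. The point is that each macro expansion starts with `else`, so the whole cascade hangs off the leading early-exit `if`, and because the branches assign instead of returning, control still reaches the code after the cascade.

```cpp
#include <set>
#include <string>

// Hypothetical list of runtime function names (X-macro style).
#define RUNTIME_FUNCTIONS(X)                                                   \
  X("__kmpc_fork_call")                                                        \
  X("__kmpc_barrier")                                                          \
  X("omp_get_thread_num")

// Hypothetical per-module state: a cached answer plus a scan counter.
struct ModuleInfo {
  bool Known = false;
  bool Found = false;
  int KernelScans = 0; // how often the (modeled) kernel scan ran
  void set(bool F) {
    Known = true;
    Found = F;
  }
};

bool containsRuntimeCall(const std::set<std::string> &Declared,
                         ModuleInfo &Info) {
  if (Info.Known)
    return Info.Found; // early exit, kept outside the generated cascade
// Every expansion begins with `else`, chaining onto the `if` above.
#define CHECK_DECLARED(Name)                                                   \
  else if (Declared.count(Name)) Info.set(true);
  RUNTIME_FUNCTIONS(CHECK_DECLARED)
#undef CHECK_DECLARED

  // Reached on the first query only; a `return` inside the cascade would
  // have skipped this one-time work.
  if (Info.Known && Info.Found) {
    ++Info.KernelScans; // stands in for identifying the kernels
    return true;
  }
  Info.set(false);
  return false;
}
```

With this shape, a second query on the same `ModuleInfo` takes the early exit and does not repeat the one-time work.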

I think there's slightly more code here than is necessary.

Specifically, I think identifyKernels should return SmallPtrSetImpl<Kernel> instead of populating a member variable which can later be accessed. With a rename, proposing:
SmallPtrSetImpl<Kernel> getKernels(Module &M){/*roughly contents of current identifyKernels */}

The cache then stores the set by value instead of by reference. Less state lying around, can't accidentally add multiple copies of the name to a single set. Depending on the control flow we might look up the metadata more than once, but that seems fine given it usually goes in a cache.

Thoughts?

jdoerfert added a child revision: D83271: [OpenMP] Replace function pointer uses in GPU state machine. Jul 10 2020, 8:13 AM

jdoerfert added a parent revision: D83270: [OpenMP] Compute a proper module slice for the CGSCCC pass.

In D83269#2137745, @JonChesterfield wrote:

I think there's slightly more code here than is necessary.

Specifically, I think identifyKernels should return SmallPtrSetImpl<Kernel> instead of populating a member variable which can later be accessed. With a rename, proposing:
SmallPtrSetImpl<Kernel> getKernels(Module &M){/*roughly contents of current identifyKernels */}

The cache then stores the set by value instead of by reference. Less state lying around, can't accidentally add multiple copies of the name to a single set. Depending on the control flow we might look up the metadata more than once, but that seems fine given it usually goes in a cache.

Thoughts?

We will end up looking at it once per SCC in the program, per invocation of the pass. I would prefer to cache module-wide information explicitly, and this was the "smallest" solution for that for now.
I can recompute instead, but nvvm.annotations has ~100 (non-kernel) entries from the device runtime that we would have to go through every time.
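A rough sketch of the trade-off discussed here, using plain standard-library types rather than LLVM ones. `Module`, `Annotation`, `scanKernels`, `KernelCache`, and `makeModule` are all hypothetical; `EntriesVisited` is instrumentation added only to count work. Recomputing walks every annotation entry on each query, while the stateful cache walks them once and hands out a reference afterwards.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Annotation {
  std::string Function;
  std::string Kind;
};

struct Module {
  std::vector<Annotation> Annotations;
  mutable std::size_t EntriesVisited = 0; // instrumentation only
};

// Stateless variant: walks all annotation entries on every call.
std::set<std::string> scanKernels(const Module &M) {
  std::set<std::string> Kernels;
  for (const Annotation &A : M.Annotations) {
    ++M.EntriesVisited;
    if (A.Kind == "kernel" && !A.Function.empty())
      Kernels.insert(A.Function);
  }
  return Kernels;
}

// Stateful variant: scans once, then returns the stored set by reference.
class KernelCache {
  std::set<std::string> Kernels;
  bool Computed = false;

public:
  const std::set<std::string> &get(const Module &M) {
    if (!Computed) {
      Kernels = scanKernels(M);
      Computed = true;
    }
    return Kernels;
  }
};

// Build a module with one kernel and N non-kernel entries, loosely modeled
// on the "~100 entries from the device runtime" mentioned in the thread.
Module makeModule(std::size_t NumNonKernelEntries) {
  Module M;
  M.Annotations.push_back(Annotation{"kernel1", "kernel"});
  for (std::size_t I = 0; I < NumNonKernelEntries; ++I)
    M.Annotations.push_back(Annotation{"runtime_fn", "align"});
  return M;
}
```

With 100 non-kernel entries, three stateless queries visit 303 entries, whereas any number of cached queries visit 101.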

Fair enough, stateful it is then.

This revision is now accepted and ready to land. Jul 10 2020, 8:28 AM

Closed by commit rGe8039ad4def0: [OpenMP] Identify GPU kernels (aka. OpenMP target regions) (authored by jdoerfert). Jul 10 2020, 11:50 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                          Size
llvm/include/llvm/Transforms/IPO/OpenMPOpt.h                  12 lines
llvm/lib/Transforms/IPO/OpenMPOpt.cpp                         127 lines
llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll   27 lines

Diff 277220

llvm/include/llvm/Transforms/IPO/OpenMPOpt.h

[11 lines not shown]
  #include "llvm/Analysis/CGSCCPassManager.h"
  #include "llvm/Analysis/LazyCallGraph.h"
  #include "llvm/IR/PassManager.h"

  namespace llvm {

  namespace omp {

+ /// Summary of a kernel (=entry point for target offloading).
+ using Kernel = Function *;
+
  /// Helper to remember if the module contains OpenMP (runtime calls), to be used
  /// foremost with containsOpenMP.
  struct OpenMPInModule {
    OpenMPInModule &operator=(bool Found) {
      if (Found)
        Value = OpenMPInModule::OpenMP::FOUND;
      else
        Value = OpenMPInModule::OpenMP::NOT_FOUND;
      return *this;
    }
    bool isKnown() { return Value != OpenMP::UNKNOWN; }
    operator bool() { return Value != OpenMP::NOT_FOUND; }

+   /// Return the known kernels (=GPU entry points) in the module.
+   SmallPtrSetImpl<Kernel> &getKernels() { return Kernels; }
+
+   /// Identify kernels in the module and populate the Kernels set.
+   void identifyKernels(Module &M);
+
  private:
    enum class OpenMP { FOUND, NOT_FOUND, UNKNOWN } Value = OpenMP::UNKNOWN;
+
+   /// Collection of known kernels (=GPU entry points) in the module.
+   SmallPtrSet<Kernel, 8> Kernels;
  };

  /// Helper to determine if \p M contains OpenMP (runtime calls).
  bool containsOpenMP(Module &M, OpenMPInModule &OMPInModule);

  } // namespace omp

  /// OpenMP optimizations pass.
[12 lines not shown]
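One detail of `OpenMPInModule` that is easy to miss is that the boolean conversion treats the unknown state as "may contain OpenMP", since it only compares against `NOT_FOUND`. The following standalone sketch restates that tri-state behavior without LLVM dependencies; `TriState` and its enumerator names are hypothetical, and the kernel set is left out.

```cpp
// Hypothetical tri-state flag modeled on OpenMPInModule in the header above.
struct TriState {
  TriState &operator=(bool Found) {
    Value = Found ? State::Found : State::NotFound;
    return *this;
  }
  // True once an answer has been recorded.
  bool isKnown() const { return Value != State::Unknown; }
  // Conservative: only a recorded "not found" converts to false.
  operator bool() const { return Value != State::NotFound; }

private:
  enum class State { Found, NotFound, Unknown } Value = State::Unknown;
};
```

A default-constructed `TriState` therefore reports `isKnown() == false` but still converts to `true`, which is why callers check `isKnown()` first.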

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

[33 lines not shown]

  static cl::opt<bool> DisableOpenMPOptimizations(
      "openmp-opt-disable", cl::ZeroOrMore,
      cl::desc("Disable OpenMP specific optimizations."), cl::Hidden,
      cl::init(false));

  static cl::opt<bool> PrintICVValues("openmp-print-icv-values", cl::init(false),
                                      cl::Hidden);
+ static cl::opt<bool> PrintOpenMPKernels("openmp-print-gpu-kernels",
+                                         cl::init(false), cl::Hidden);

  STATISTIC(NumOpenMPRuntimeCallsDeduplicated,
            "Number of OpenMP runtime calls deduplicated");
  STATISTIC(NumOpenMPParallelRegionsDeleted,
            "Number of OpenMP parallel regions deleted");
  STATISTIC(NumOpenMPRuntimeFunctionsIdentified,
            "Number of OpenMP runtime functions identified");
  STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified,
            "Number of OpenMP runtime function uses identified");
+ STATISTIC(NumOpenMPTargetRegionKernels,
+           "Number of OpenMP target region entry points (=kernels) identified");

  #if !defined(NDEBUG)
  static constexpr auto TAG = "[" DEBUG_TYPE "]";
  #endif

  /// Helper struct to store tracked ICV values at specif instructions.
  struct ICVValue {
    Instruction *Inst;
[35 lines not shown]

  struct AAICVTracker;

  /// OpenMP specific information. For now, stores RFIs and ICVs also needed for
  /// Attributor runs.
  struct OMPInformationCache : public InformationCache {
    OMPInformationCache(Module &M, AnalysisGetter &AG,
                        BumpPtrAllocator &Allocator, SetVector<Function *> *CGSCC,
-                       SmallPtrSetImpl<Function *> &ModuleSlice)
+                       SmallPtrSetImpl<Function *> &ModuleSlice,
+                       SmallPtrSetImpl<Kernel> &Kernels)
        : InformationCache(M, AG, Allocator, CGSCC), ModuleSlice(ModuleSlice),
-         OMPBuilder(M) {
+         OMPBuilder(M), Kernels(Kernels) {
      OMPBuilder.initialize();
      initializeRuntimeFunctions();
      initializeInternalControlVars();
    }

    /// Generic information that describes an internal control variable.
    struct InternalControlVarInfo {
      /// The kind, as described by InternalControlVar enum.
[281 lines not shown; enclosing context: if (declMatchesRTFTypes(F, OMPBuilder._ReturnType, ArgsTypes)) { \]
                          << " different functions.\n";                        \
        });                                                                    \
      }                                                                        \
    }
  #include "llvm/Frontend/OpenMP/OMPKinds.def"

      // TODO: We should attach the attributes defined in OMPKinds.def.
    }
+
+   /// Collection of known kernels (\see Kernel) in the module.
+   SmallPtrSetImpl<Kernel> &Kernels;
  };

  struct OpenMPOpt {

    using OptimizationRemarkGetter =
        function_ref<OptimizationRemarkEmitter &(Function *)>;

    OpenMPOpt(SmallVectorImpl<Function *> &SCC, CallGraphUpdater &CGUpdater,
              OptimizationRemarkGetter OREGetter,
              OMPInformationCache &OMPInfoCache, Attributor &A)
        : M(*(*SCC.begin())->getParent()), SCC(SCC), CGUpdater(CGUpdater),
          OREGetter(OREGetter), OMPInfoCache(OMPInfoCache), A(A) {}

    /// Run all OpenMP optimizations on the underlying SCC/ModuleSlice.
    bool run() {
      if (SCC.empty())
        return false;

      bool Changed = false;

      LLVM_DEBUG(dbgs() << TAG << "Run on SCC with " << SCC.size()
                        << " functions in a slice with "
                        << OMPInfoCache.ModuleSlice.size() << " functions\n");

+     if (PrintICVValues)
+       printICVs();
+     if (PrintOpenMPKernels)
+       printKernels();
+
+     Changed |= runAttributor();
+
+     // Recollect uses, in case Attributor deleted any.
+     OMPInfoCache.recollectUses();
+
+     Changed |= deduplicateRuntimeCalls();
+     Changed |= deleteParallelRegions();
+
+     return Changed;
+   }
+
    /// Print initial ICV values for testing.
    /// FIXME: This should be done from the Attributor once it is added.
-   if (PrintICVValues) {
+   void printICVs() const {
      InternalControlVar ICVs[] = {ICV_nthreads, ICV_active_levels, ICV_cancel};

      for (Function *F : OMPInfoCache.ModuleSlice) {
        for (auto ICV : ICVs) {
          auto ICVInfo = OMPInfoCache.ICVs[ICV];
          auto Remark = [&](OptimizationRemark OR) {
            return OR << "OpenMP ICV " << ore::NV("OpenMPICV", ICVInfo.Name)
                      << " Value: "
                      << (ICVInfo.InitValue
                              ? ICVInfo.InitValue->getValue().toString(10, true)
                              : "IMPLEMENTATION_DEFINED");
          };

          emitRemarkOnFunction(F, "OpenMPICVTracker", Remark);
        }
      }
    }

-   Changed |= runAttributor();
-
-   // Recollect uses, in case Attributor deleted any.
-   OMPInfoCache.recollectUses();
-
-   Changed |= deduplicateRuntimeCalls();
-   Changed |= deleteParallelRegions();
-
-   return Changed;
+   /// Print OpenMP GPU kernels for testing.
+   void printKernels() const {
+     for (Function *F : SCC) {
+       if (!OMPInfoCache.Kernels.count(F))
+         continue;
+
+       auto Remark = [&](OptimizationRemark OR) {
+         return OR << "OpenMP GPU kernel "
+                   << ore::NV("OpenMPGPUKernel", F->getName()) << "\n";
+       };
+
+       emitRemarkOnFunction(F, "OpenMPGPU", Remark);
+     }
    }

    /// Return the call if \p U is a callee use in a regular call. If \p RFI is
    /// given it has to be the callee or a nullptr is returned.
    static CallInst *getCallIfRegularCall(
        Use &U, OMPInformationCache::RuntimeFunctionInfo *RFI = nullptr) {
      CallInst *CI = dyn_cast<CallInst>(U.getUser());
      if (CI && CI->isCallee(&U) && !CI->hasOperandBundles() &&
[306 lines not shown; enclosing context: private:]
    /// - OptimizationRemarkAnalysis to provide additional information about an
    ///   optimization attempt
    ///
    /// The remark is built using a callback function provided by the caller that
    /// takes a RemarkKind as input and returns a RemarkKind.
    template <typename RemarkKind,
              typename RemarkCallBack = function_ref<RemarkKind(RemarkKind &&)>>
    void emitRemark(Instruction *Inst, StringRef RemarkName,
-                   RemarkCallBack &&RemarkCB) {
+                   RemarkCallBack &&RemarkCB) const {
      Function *F = Inst->getParent()->getParent();
      auto &ORE = OREGetter(F);

      ORE.emit(
          [&]() { return RemarkCB(RemarkKind(DEBUG_TYPE, RemarkName, Inst)); });
    }

    /// Emit a remark on a function. Since only OptimizationRemark is supporting
    /// this, it can't be made generic.
-   void emitRemarkOnFunction(
-       Function *F, StringRef RemarkName,
-       function_ref<OptimizationRemark(OptimizationRemark &&)> &&RemarkCB) {
+   void
+   emitRemarkOnFunction(Function *F, StringRef RemarkName,
+                        function_ref<OptimizationRemark(OptimizationRemark &&)>
+                            &&RemarkCB) const {
      auto &ORE = OREGetter(F);

      ORE.emit([&]() {
        return RemarkCB(OptimizationRemark(DEBUG_TYPE, RemarkName, F));
      });
    }

    /// The underlying module.
[240 lines not shown; enclosing context: PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C,]
    };

    CallGraphUpdater CGUpdater;
    CGUpdater.initialize(CG, C, AM, UR);

    SetVector<Function *> Functions(SCC.begin(), SCC.end());
    BumpPtrAllocator Allocator;
    OMPInformationCache InfoCache(*(Functions.back()->getParent()), AG, Allocator,
-                                 /*CGSCC*/ &Functions, ModuleSlice);
+                                 /*CGSCC*/ &Functions, ModuleSlice,
+                                 OMPInModule.getKernels());

    Attributor A(Functions, InfoCache, CGUpdater);

    // TODO: Compute the module slice we are allowed to look at.
    OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache, A);
    bool Changed = OMPOpt.run();
    (void)Changed;
    return PreservedAnalyses::all();
[48 lines not shown; enclosing context: auto OREGetter = [&OREMap](Function *F) -> OptimizationRemarkEmitter & {]
        if (!ORE)
          ORE = std::make_unique<OptimizationRemarkEmitter>(F);
        return *ORE;
      };

      AnalysisGetter AG;
      SetVector<Function *> Functions(SCC.begin(), SCC.end());
      BumpPtrAllocator Allocator;
-     OMPInformationCache InfoCache(*(Functions.back()->getParent()), AG,
-                                   Allocator,
-                                   /*CGSCC*/ &Functions, ModuleSlice);
+     OMPInformationCache InfoCache(
+         *(Functions.back()->getParent()), AG, Allocator,
+         /*CGSCC*/ &Functions, ModuleSlice, OMPInModule.getKernels());

      Attributor A(Functions, InfoCache, CGUpdater);

      // TODO: Compute the module slice we are allowed to look at.
      OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache, A);
      return OMPOpt.run();
    }

    bool doFinalization(CallGraph &CG) override { return CGUpdater.finalize(); }
  };

  } // end anonymous namespace

+ void OpenMPInModule::identifyKernels(Module &M) {
+
+   NamedMDNode *MD = M.getOrInsertNamedMetadata("nvvm.annotations");
+   if (!MD)
+     return;
+
+   for (auto *Op : MD->operands()) {
+     if (Op->getNumOperands() < 2)
+       continue;
+     MDString *KindID = dyn_cast<MDString>(Op->getOperand(1));
+     if (!KindID || KindID->getString() != "kernel")
+       continue;
+
+     Function *KernelFn =
+         mdconst::dyn_extract_or_null<Function>(Op->getOperand(0));
+     if (!KernelFn)
+       continue;
+
+     ++NumOpenMPTargetRegionKernels;
+
+     Kernels.insert(KernelFn);
+   }
+ }
+
  bool llvm::omp::containsOpenMP(Module &M, OpenMPInModule &OMPInModule) {
    if (OMPInModule.isKnown())
      return OMPInModule;

  #define OMP_RTL(_Enum, _Name, ...)                                             \
-   if (M.getFunction(_Name))                                                    \
-     return OMPInModule = true;
+   else if (M.getFunction(_Name)) OMPInModule = true;
      Inline comment, tianshilei1992: This line of change looks not related to this patch.
      Inline comment, jdoerfert (marked done): It is. I needed to get rid of the return statements and I wanted to keep the "early exit" out of the if-cascade. Entrance: `else`.
  #include "llvm/Frontend/OpenMP/OMPKinds.def"

+   // Identify kernels once. TODO: We should split the OMPInformationCache into a
+   // module and an SCC part. The kernel information, among other things, could
+   // go into the module part.
+   if (OMPInModule.isKnown() && OMPInModule) {
+     OMPInModule.identifyKernels(M);
+     return true;
+   }
+
    return OMPInModule = false;
  }

  char OpenMPOptLegacyPass::ID = 0;

  INITIALIZE_PASS_BEGIN(OpenMPOptLegacyPass, "openmpopt",
                        "OpenMP specific optimizations", false, false)
  INITIALIZE_PASS_DEPENDENCY(CallGraphWrapperPass)
  INITIALIZE_PASS_END(OpenMPOptLegacyPass, "openmpopt",
                      "OpenMP specific optimizations", false, false)

  Pass *llvm::createOpenMPOptLegacyPass() { return new OpenMPOptLegacyPass(); }

llvm/test/Transforms/OpenMP/gpu_kernel_detection_remarks.ll

This file was added.

+ ; RUN: opt -passes=openmpopt -pass-remarks=openmp-opt -openmp-print-gpu-kernels -disable-output < %s 2>&1 | FileCheck %s --implicit-check-not=non_kernel
+ ; RUN: opt -openmpopt -pass-remarks=openmp-opt -openmp-print-gpu-kernels -disable-output < %s 2>&1 | FileCheck %s --implicit-check-not=non_kernel
+
+ ; CHECK-DAG: remark: <unknown>:0:0: OpenMP GPU kernel kernel1
+ ; CHECK-DAG: remark: <unknown>:0:0: OpenMP GPU kernel kernel2
+
+ define void @kernel1() {
+   ret void
+ }
+
+ define void @kernel2() {
+   ret void
+ }
+
+ define void @non_kernel() {
+   ret void
+ }
+
+ ; Needed to trigger the openmp-opt pass
+ declare dso_local void @__kmpc_kernel_prepare_parallel(i8*)
+
+ !nvvm.annotations = !{!2, !0, !1, !3, !1, !2}
+
+ !0 = !{void ()* @kernel1, !"kernel", i32 1}
+ !1 = !{void ()* @non_kernel, !"non_kernel", i32 1}
+ !2 = !{null, !"align", i32 1}
+ !3 = !{void ()* @kernel2, !"kernel", i32 1}
