This is an archive of the discontinued LLVM Phabricator instance.

Differential D35703

[GPGPU] Add support for NVIDIA libdevice
ClosedPublic

Authored by grosser on Jul 20 2017, 3:44 PM.

Details

Reviewers

bollu
singam-sanjay

Commits

rG8fc6cdfb1cbd: [GPGPU] Add support for NVIDIA libdevice
rPLO309560: [GPGPU] Add support for NVIDIA libdevice
rL309560: [GPGPU] Add support for NVIDIA libdevice

Summary

This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp* and cos functions. Other functions
will be enabled as needed.
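The mapping described in the summary amounts to a name lookup: a call to one of the supported math functions is accepted inside a kernel and later redirected to the libdevice function of the same name with an `__nv_` prefix. The standalone sketch below mirrors that idea without LLVM. `libDeviceName` and `isValidInKernel` are hypothetical helpers, and the function list is copied from the `CUDALibDeviceFunctions` set in the diff below.

```cpp
#include <cassert>
#include <set>
#include <string>

// Function list copied from CUDALibDeviceFunctions in the committed diff.
const std::set<std::string> LibDeviceFunctions = {
    "exp",  "expf",  "expl",     "cos",       "cosf",
    "sqrt", "sqrtf", "copysign", "copysignf", "copysignl"};

// Hypothetical helper: return the libdevice name for a supported function
// ("expf" -> "__nv_expf"), or an empty string if there is no replacement.
std::string libDeviceName(const std::string &Name) {
  if (LibDeviceFunctions.count(Name))
    return "__nv_" + Name;
  return "";
}

// Hypothetical stand-in for isValidFunctionInKernel: accept a few LLVM
// intrinsics everywhere, and libdevice-backed functions only when the
// caller allows it (the patch enables this for NVPTX64 only).
bool isValidInKernel(const std::string &Name, bool IsIntrinsic,
                     bool AllowLibDevice) {
  if (AllowLibDevice && !libDeviceName(Name).empty())
    return true;
  auto StartsWith = [&Name](const std::string &Prefix) {
    return Name.compare(0, Prefix.size(), Prefix) == 0;
  };
  return IsIntrinsic &&
         (StartsWith("llvm.sqrt") || StartsWith("llvm.fabs") ||
          StartsWith("llvm.copysign"));
}
```

With `AllowLibDevice` set to false, a plain `expf` call is still rejected, which corresponds to how the diff treats non-NVPTX targets.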

Diff Detail

Repository: rL LLVM

Event Timeline

grosser created this revision. Jul 20 2017, 3:44 PM

Herald added subscribers: kbarton, mgorny, nemanjai. Jul 20 2017, 3:44 PM

bollu added inline comments. Jul 20 2017, 3:54 PM

lib/CodeGen/PPCGCodeGeneration.cpp
1314 ↗	(On Diff #107597)	Consider changing to `SmallSet`? That feels correct in terms of semantics (a `set` of functions).
1418 ↗	(On Diff #107597)	`const bool`? :)
2131 ↗	(On Diff #107597)	Consider moving the computation of `RequiresLibDevice` into a pure function?
2138 ↗	(On Diff #107597)	I believe we can `assert` at this point, since this is not a "mis-compile" in the strictest sense of the word?
2148 ↗	(On Diff #107597)	nit: `trible` -> `triple`.

tra added a subscriber: tra. Jul 20 2017, 3:59 PM

tra added inline comments.

lib/CodeGen/PPCGCodeGeneration.cpp
107–111 ↗	(On Diff #107597)	This is something that is useful for all NVPTX users and should probably live there, and it should not have any hardcoded path: it's too easy to end up silently picking the wrong library otherwise. Hardcoded compute_20 is also problematic because it should depend on the particular GPU arch we're compiling for. Considering that LLVM has no idea about the CUDA SDK location, this is something that should always be explicitly specified: either a base path plus a libdevice name derived from the GPU arch, or a complete path to a specific libdevice variant (i.e. it's completely up to the user to provide the correct libdevice).
3020 ↗	(On Diff #107597)	Bllock -> Block
test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69 ↗	(On Diff #107597)	Can the test be reduced to just an expf() call?
test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc
1–6 ↗	(On Diff #107597)	This file should be under Inputs/ directory (see NVPTX tests for example) and have .ll extension.
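tra's suggestion of deriving the library name from the GPU architecture, with no built-in default, could look roughly like the sketch below. It is illustrative only: `libDevicePath` is a hypothetical helper, it assumes the pre-CUDA-9 file naming `libdevice.compute_XX.10.bc`, and the three-entry table is an example rather than an authoritative sm-to-compute mapping (the real correspondence depends on the CUDA release).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: build a libdevice path from an explicitly supplied
// base directory and target GPU (e.g. "sm_35"). Returns "" when no path can
// be derived, so the caller can report an error instead of silently
// falling back to a guessed library.
std::string libDevicePath(const std::string &BaseDir,
                          const std::string &GPUArch) {
  // No built-in default: the user must say where the CUDA SDK lives.
  if (BaseDir.empty())
    return "";

  // Illustrative entries only; assumes the pre-CUDA-9 naming scheme.
  static const std::map<std::string, std::string> ComputeVariant = {
      {"sm_20", "compute_20"},
      {"sm_30", "compute_30"},
      {"sm_35", "compute_35"}};

  auto It = ComputeVariant.find(GPUArch);
  if (It == ComputeVariant.end())
    return "";
  return BaseDir + "/libdevice." + It->second + ".10.bc";
}
```

Returning an empty result for an unknown architecture, rather than picking the nearest file, reflects the review concern that a wrong libdevice should fail loudly instead of being used silently.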

bollu added a reviewer: singam-sanjay. Jul 21 2017, 7:06 AM

Is there some way to test this without having libdevice? The tests break on my Mac.

bollu added inline comments. Jul 24 2017, 3:18 AM

lib/CodeGen/PPCGCodeGeneration.cpp
1314 ↗	(On Diff #107597)	Could you also please add `sqrt` to the list? It is present in the `COSMO` kernel.

singam-sanjay added inline comments. Jul 24 2017, 8:09 AM

lib/CodeGen/PPCGCodeGeneration.cpp
107 ↗	(On Diff #107597)	Would it be better to call this CUDALibDevice or CULibDevice instead, since this applies only to NVPTX?
109 ↗	(On Diff #107597)	Consider changing this to "/usr/local/cuda/nvvm/libdevice/libdevice.compute_20.10.bc". That would work on most Linux platforms by default. I'm not sure if PTX code for a compute capability 2.x device would run on any newer device. Is it possible to initialize this after figuring out the CC of device 0? Also, I heard that CUDA SDK 8 would be the last to support CC 2.x; CUDA 9 supports all CCs from 3.0.
609 ↗	(On Diff #107597)	Would addCULibDevice be a better name?

Address review comments

Thank you for all the good reviews. I tried to address all of them.

Best,
Tobias

lib/CodeGen/PPCGCodeGeneration.cpp
107 ↗	(On Diff #107597)	I use CUDALibDevice
107–111 ↗	(On Diff #107597)	@tra: Thank you for your comment. This is the very first commit to introduce this feature, and we are currently in early beta testing. The library location is supposed to be provided by the user by setting the path with polly-acc-libdevice. I set the option to a very basic default, and for now I expect the user to adjust it. In the future we can add some generic infrastructure to LLVM to derive this path automatically. If a specific fixed path is too confusing, I can also use an empty default and always prompt for a path.
109 ↗	(On Diff #107597)	Changed. Using /usr/local/cuda is indeed a good idea. I would like to start with the oldest library; we can later add support for different library versions. I don't think we can query device 0, as we might compile on a different platform than the one where we run the final code. However, we can make this depend on polly-acc-cuda-version.
609 ↗	(On Diff #107597)	Changed.
1314 ↗	(On Diff #107597)	Done!
1314 ↗	(On Diff #107597)	I use a std::set. That should be good enough for now.
2131 ↗	(On Diff #107597)	Done.
2138 ↗	(On Diff #107597)	I use report_fatal_error as suggested by Michael.
2148 ↗	(On Diff #107597)	Done!
3020 ↗	(On Diff #107597)	Fixed in r308715.
test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69 ↗	(On Diff #107597)	Unlikely. This is a test for Polly-ACC, where we auto-offload to CUDA. For this we need at least some parallelism, which means some loop.
test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc
1–6 ↗	(On Diff #107597)	Very good idea. I adopted it.

How is this different from passing libdevice to either of the -mlink-cuda-bitcode or -mlink-bitcode-file options?

lib/CodeGen/PPCGCodeGeneration.cpp
2309 ↗	(On Diff #107971)	Could this be avoided by implementing Triple::isCompatibleWith() for nvptx?

tra added inline comments. Jul 24 2017, 3:51 PM

lib/CodeGen/PPCGCodeGeneration.cpp
113 ↗	(On Diff #107971)	The variable appears to be unused. On a side note, please consider that there's already a way to specify GPU variant for NVPTX back-end. This option is either going to be redundant or you'll need a good explanation for what's supposed to happen when its value conflicts with whatever GPU variant NVPTX back-end thinks it's supposed to generate the code for.
118 ↗	(On Diff #107971)	This also appears to be unused. Please remove them from this patch and re-introduce them in subsequent patches that really need them.
2277 ↗	(On Diff #107971)	!empty()
2288 ↗	(On Diff #107971)	What's supposed to happen in this case? Consider adding a diagnostic message.
2293 ↗	(On Diff #107971)	But why are you still printing the libdevice name on stderr?
2323 ↗	(On Diff #107971)	I'm curious -- when would it be OK to proceed if verifier has failed? Should this option be other way around -- and make LLVM fail by default on verifier failure? That would be my expectation of normal behavior. One could conceivably use this option for debugging purposes to force LLVM to proceed even when verifier has failed, but in general I believe that by default the errors should be reported as early as possible.
107–111 ↗	(On Diff #107597)	I believe no default would be a better option in this case, as it minimizes the possibility of things going wrong silently.
test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69 ↗	(On Diff #107597)	It may be worth reconsidering your approach and making this functionality generic to NVPTX so it can benefit all users of the NVPTX back-end. The functionality is generic enough to benefit CUDA (and possibly OpenCL), which currently fail miserably if any standard library functions sneak into the IR.
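For the failure path tra asks about at line 2288, one option is to validate the configured path once and produce a single diagnostic that names both the path and the option to change, rather than unconditionally echoing the path to stderr. The sketch below is a hypothetical, LLVM-free illustration (`checkLibDevicePath` is not part of the patch); it only checks that the file can be opened, so a file that opens but fails to parse would still need its own message.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical check: return "" if the libdevice file at Path can be opened,
// otherwise a diagnostic that tells the user which option to fix.
std::string checkLibDevicePath(const std::string &Path) {
  if (Path.empty())
    return "no libdevice path given; set -polly-acc-libdevice";
  std::ifstream In(Path.c_str(), std::ios::binary);
  if (!In)
    return "could not open libdevice at '" + Path +
           "'; set -polly-acc-libdevice to a valid libdevice file";
  return ""; // the file exists and is readable
}
```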

Other than nits, LGTM

lib/CodeGen/PPCGCodeGeneration.cpp
1341 ↗	(On Diff #107971)	this can be `const std::set<std::string> ...` ? I do not see us mutating this, and it would be nice to communicate this fact.
test/lit.site.cfg.in
37 ↗	(On Diff #107971)	Can we be more explicit and mention that these are for the `libdevice` tests?

This revision is now accepted and ready to land. Jul 25 2017, 12:50 AM

bollu added inline comments. Jul 28 2017, 5:13 AM

lib/CodeGen/PPCGCodeGeneration.cpp
1341 ↗	(On Diff #107971)	Could you also add `copysign`, please?

Closed by commit rL309560: [GPGPU] Add support for NVIDIA libdevice (authored by grosser). Jul 31 2017, 7:04 AM

This revision was automatically updated to reflect the committed changes.

grosser marked an inline comment as done.

Revision Contents (path, size)

- polly/trunk/lib/CMakeLists.txt (2 lines)
- polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp (110 lines)
- polly/trunk/test/GPGPU/Inputs/libdevice-functions-copied-into-kernel_libdevice.ll (3 lines)
- polly/trunk/test/GPGPU/libdevice-functions-copied-into-kernel.ll (74 lines)
- polly/trunk/test/lit.site.cfg.in (5 lines)

Diff 108918

polly/trunk/lib/CMakeLists.txt

... 101 lines not shown ...	target_link_libraries(Polly
LLVMCore		LLVMCore
LLVMScalarOpts		LLVMScalarOpts
LLVMInstCombine		LLVMInstCombine
LLVMTransformUtils		LLVMTransformUtils
LLVMAnalysis		LLVMAnalysis
LLVMipo		LLVMipo
LLVMMC		LLVMMC
LLVMPasses		LLVMPasses
		LLVMLinker
		LLVMIRReader
${nvptx_libs}		${nvptx_libs}
# The libraries below are required for darwin: http://PR26392		# The libraries below are required for darwin: http://PR26392
LLVMBitReader		LLVMBitReader
LLVMMCParser		LLVMMCParser
LLVMObject		LLVMObject
LLVMProfileData		LLVMProfileData
LLVMTarget		LLVMTarget
LLVMVectorize		LLVMVectorize
... 39 lines not shown ...

polly/trunk/lib/CodeGen/PPCGCodeGeneration.cpp

... 25 lines not shown ...
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"		#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"		#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
		#include "llvm/IRReader/IRReader.h"
		#include "llvm/Linker/Linker.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"		#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"

#include "isl/union_map.h"		#include "isl/union_map.h"

... 55 lines not shown ...
static cl::opt<bool>		static cl::opt<bool>
FailOnVerifyModuleFailure("polly-acc-fail-on-verify-module-failure",		FailOnVerifyModuleFailure("polly-acc-fail-on-verify-module-failure",
cl::desc("Fail and generate a backtrace if"		cl::desc("Fail and generate a backtrace if"
" verifyModule fails on the GPU "		" verifyModule fails on the GPU "
" kernel module."),		" kernel module."),
cl::Hidden, cl::init(false), cl::ZeroOrMore,		cl::Hidden, cl::init(false), cl::ZeroOrMore,
cl::cat(PollyCategory));		cl::cat(PollyCategory));

		static cl::opt<std::string> CUDALibDevice(
		"polly-acc-libdevice", cl::desc("Path to CUDA libdevice"), cl::Hidden,
		cl::init("/usr/local/cuda/nvvm/libdevice/libdevice.compute_20.10.ll"),
		cl::ZeroOrMore, cl::cat(PollyCategory));

static cl::opt<std::string>		static cl::opt<std::string>
CudaVersion("polly-acc-cuda-version",		CudaVersion("polly-acc-cuda-version",
cl::desc("The CUDA version to compile for"), cl::Hidden,		cl::desc("The CUDA version to compile for"), cl::Hidden,
cl::init("sm_30"), cl::ZeroOrMore, cl::cat(PollyCategory));		cl::init("sm_30"), cl::ZeroOrMore, cl::cat(PollyCategory));

static cl::opt<int>		static cl::opt<int>
MinCompute("polly-acc-mincompute",		MinCompute("polly-acc-mincompute",
cl::desc("Minimal number of compute statements to run on GPU."),		cl::desc("Minimal number of compute statements to run on GPU."),
... 487 lines not shown ...	private:
/// @param F The function to remove references to.		/// @param F The function to remove references to.
void clearScalarEvolution(Function *F);		void clearScalarEvolution(Function *F);

/// Remove references from loop info to the kernel function @p F.		/// Remove references from loop info to the kernel function @p F.
///		///
/// @param F The function to remove references to.		/// @param F The function to remove references to.
void clearLoops(Function *F);		void clearLoops(Function *F);

		/// Check if the scop requires to be linked with CUDA's libdevice.
		bool requiresCUDALibDevice();

		/// Link with the NVIDIA libdevice library (if needed and available).
		void addCUDALibDevice();

/// Finalize the generation of the kernel function.		/// Finalize the generation of the kernel function.
///		///
/// Free the LLVM-IR module corresponding to the kernel and -- if requested --		/// Free the LLVM-IR module corresponding to the kernel and -- if requested --
/// dump its IR to stderr.		/// dump its IR to stderr.
///		///
/// @returns The Assembly string of the kernel.		/// @returns The Assembly string of the kernel.
std::string finalizeKernelFunction();		std::string finalizeKernelFunction();

... 703 lines not shown ...	isl_bool collectReferencesInGPUStmt(__isl_keep isl_ast_node *Node, void *User) {
auto Stmt = (ScopStmt *)KernelStmt->u.d.stmt->stmt;		auto Stmt = (ScopStmt *)KernelStmt->u.d.stmt->stmt;
isl_id_free(Id);		isl_id_free(Id);

addReferencesFromStmt(Stmt, User, false /* CreateScalarRefs */);		addReferencesFromStmt(Stmt, User, false /* CreateScalarRefs */);

return isl_bool_true;		return isl_bool_true;
}		}

		/// A list of functions that are available in NVIDIA's libdevice.
		const std::set<std::string> CUDALibDeviceFunctions = {
		"exp", "expf", "expl", "cos", "cosf",
		"sqrt", "sqrtf", "copysign", "copysignf", "copysignl"};

		/// Return the corresponding CUDA libdevice function name for @p F.
		///
		/// Return "" if we are not compiling for CUDA.
		std::string getCUDALibDeviceFuntion(Function *F) {
		if (CUDALibDeviceFunctions.count(F->getName()))
		return std::string("__nv_") + std::string(F->getName());

		return "";
		}

/// Check if F is a function that we can code-generate in a GPU kernel.		/// Check if F is a function that we can code-generate in a GPU kernel.
static bool isValidFunctionInKernel(llvm::Function *F) {		static bool isValidFunctionInKernel(llvm::Function *F, bool AllowLibDevice) {
assert(F && "F is an invalid pointer");		assert(F && "F is an invalid pointer");
// We string compare against the name of the function to allow		// We string compare against the name of the function to allow
// all variants of the intrinsic "llvm.sqrt.*", "llvm.fabs", and		// all variants of the intrinsic "llvm.sqrt.*", "llvm.fabs", and
// "llvm.copysign".		// "llvm.copysign".
const StringRef Name = F->getName();		const StringRef Name = F->getName();

		if (AllowLibDevice && getCUDALibDeviceFuntion(F).length() > 0)
		return true;

return F->isIntrinsic() &&		return F->isIntrinsic() &&
(Name.startswith("llvm.sqrt") || Name.startswith("llvm.fabs") ||		(Name.startswith("llvm.sqrt") || Name.startswith("llvm.fabs") ||
Name.startswith("llvm.copysign"));		Name.startswith("llvm.copysign"));
}		}

/// Do not take `Function` as a subtree value.		/// Do not take `Function` as a subtree value.
///		///
/// We try to take the reference of all subtree values and pass them along		/// We try to take the reference of all subtree values and pass them along
/// to the kernel from the host. Taking an address of any function and		/// to the kernel from the host. Taking an address of any function and
/// trying to pass along is nonsensical. Only allow `Value`s that are not		/// trying to pass along is nonsensical. Only allow `Value`s that are not
/// `Function`s.		/// `Function`s.
static bool isValidSubtreeValue(llvm::Value *V) { return !isa<Function>(V); }		static bool isValidSubtreeValue(llvm::Value *V) { return !isa<Function>(V); }

/// Return `Function`s from `RawSubtreeValues`.		/// Return `Function`s from `RawSubtreeValues`.
static SetVector<Function *>		static SetVector<Function *>
getFunctionsFromRawSubtreeValues(SetVector<Value *> RawSubtreeValues) {		getFunctionsFromRawSubtreeValues(SetVector<Value *> RawSubtreeValues,
		bool AllowCUDALibDevice) {
SetVector<Function *> SubtreeFunctions;		SetVector<Function *> SubtreeFunctions;
for (Value *It : RawSubtreeValues) {		for (Value *It : RawSubtreeValues) {
Function *F = dyn_cast<Function>(It);		Function *F = dyn_cast<Function>(It);
if (F) {		if (F) {
assert(isValidFunctionInKernel(F) && "Code should have bailed out by "		assert(isValidFunctionInKernel(F, AllowCUDALibDevice) &&
		"Code should have bailed out by "
"this point if an invalid function "		"this point if an invalid function "
"were present in a kernel.");		"were present in a kernel.");
SubtreeFunctions.insert(F);		SubtreeFunctions.insert(F);
}		}
}		}
return SubtreeFunctions;		return SubtreeFunctions;
}		}

std::pair<SetVector<Value *>, SetVector<Function *>>		std::pair<SetVector<Value *>, SetVector<Function *>>
GPUNodeBuilder::getReferencesInKernel(ppcg_kernel *Kernel) {		GPUNodeBuilder::getReferencesInKernel(ppcg_kernel *Kernel) {
... 37 lines not shown ...	GPUNodeBuilder::getReferencesInKernel(ppcg_kernel *Kernel) {
// SubtreeValues. This is important, because we should not lose any		// SubtreeValues. This is important, because we should not lose any
// SubtreeValues in the process of constructing the		// SubtreeValues in the process of constructing the
// "ValidSubtree{Values, Functions} sets. Nor should the set		// "ValidSubtree{Values, Functions} sets. Nor should the set
// ValidSubtree{Values, Functions} have any common element.		// ValidSubtree{Values, Functions} have any common element.
auto ValidSubtreeValuesIt =		auto ValidSubtreeValuesIt =
make_filter_range(SubtreeValues, isValidSubtreeValue);		make_filter_range(SubtreeValues, isValidSubtreeValue);
SetVector<Value *> ValidSubtreeValues(ValidSubtreeValuesIt.begin(),		SetVector<Value *> ValidSubtreeValues(ValidSubtreeValuesIt.begin(),
ValidSubtreeValuesIt.end());		ValidSubtreeValuesIt.end());

		bool AllowCUDALibDevice = Arch == GPUArch::NVPTX64;

SetVector<Function *> ValidSubtreeFunctions(		SetVector<Function *> ValidSubtreeFunctions(
getFunctionsFromRawSubtreeValues(SubtreeValues));		getFunctionsFromRawSubtreeValues(SubtreeValues, AllowCUDALibDevice));

// @see IslNodeBuilder::getReferencesInSubtree		// @see IslNodeBuilder::getReferencesInSubtree
SetVector<Value *> ReplacedValues;		SetVector<Value *> ReplacedValues;
for (Value *V : ValidSubtreeValues) {		for (Value *V : ValidSubtreeValues) {
auto It = ValueMap.find(V);		auto It = ValueMap.find(V);
if (It == ValueMap.end())		if (It == ValueMap.end())
ReplacedValues.insert(V);		ReplacedValues.insert(V);
else		else
... 807 lines not shown ...	if (TargetM->addPassesToEmitFile(
return "";		return "";
}		}

PM.run(*GPUModule);		PM.run(*GPUModule);

return ASMStream.str();		return ASMStream.str();
}		}

		bool GPUNodeBuilder::requiresCUDALibDevice() {
		for (Function &F : GPUModule->functions()) {
		if (!F.isDeclaration())
		continue;

		std::string CUDALibDeviceFunc = getCUDALibDeviceFuntion(&F);
		if (CUDALibDeviceFunc.length() != 0) {
		F.setName(CUDALibDeviceFunc);
		return true;
		}
		}

		return false;
		}

		void GPUNodeBuilder::addCUDALibDevice() {
		if (Arch != GPUArch::NVPTX64)
		return;

		if (requiresCUDALibDevice()) {
		SMDiagnostic Error;

		errs() << CUDALibDevice << "\n";
		auto LibDeviceModule =
		parseIRFile(CUDALibDevice, Error, GPUModule->getContext());

		if (!LibDeviceModule) {
		BuildSuccessful = false;
		report_fatal_error("Could not find or load libdevice. Skipping GPU "
		"kernel generation. Please set -polly-acc-libdevice "
		"accordingly.\n");
		return;
		}

		Linker L(*GPUModule);

		// Set an nvptx64 target triple to avoid linker warnings. The original
		// triple of the libdevice files are nvptx-unknown-unknown.
		LibDeviceModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-cuda"));
		L.linkInModule(std::move(LibDeviceModule), Linker::LinkOnlyNeeded);
		}
		}

std::string GPUNodeBuilder::finalizeKernelFunction() {		std::string GPUNodeBuilder::finalizeKernelFunction() {

if (verifyModule(*GPUModule)) {		if (verifyModule(*GPUModule)) {
DEBUG(dbgs() << "verifyModule failed on module:\n";		DEBUG(dbgs() << "verifyModule failed on module:\n";
GPUModule->print(dbgs(), nullptr); dbgs() << "\n";);		GPUModule->print(dbgs(), nullptr); dbgs() << "\n";);
DEBUG(dbgs() << "verifyModule Error:\n";		DEBUG(dbgs() << "verifyModule Error:\n";
verifyModule(*GPUModule, &dbgs()););		verifyModule(*GPUModule, &dbgs()););

if (FailOnVerifyModuleFailure)		if (FailOnVerifyModuleFailure)
llvm_unreachable("VerifyModule failed.");		llvm_unreachable("VerifyModule failed.");

BuildSuccessful = false;		BuildSuccessful = false;
return "";		return "";
}		}

		addCUDALibDevice();

if (DumpKernelIR)		if (DumpKernelIR)
outs() << *GPUModule << "\n";		outs() << *GPUModule << "\n";

if (Arch != GPUArch::SPIR32 && Arch != GPUArch::SPIR64) {		if (Arch != GPUArch::SPIR32 && Arch != GPUArch::SPIR64) {
// Optimize module.		// Optimize module.
llvm::legacy::PassManager OptPasses;		llvm::legacy::PassManager OptPasses;
PassManagerBuilder PassBuilder;		PassManagerBuilder PassBuilder;
PassBuilder.OptLevel = 3;		PassBuilder.OptLevel = 3;
... 853 lines not shown ...	createSufficientComputeCheck(Scop &S, __isl_keep isl_ast_build *Build) {
return isl_ast_expr_ge(Iterations, MinComputeExpr);		return isl_ast_expr_ge(Iterations, MinComputeExpr);
}		}

/// Check if the basic block contains a function we cannot codegen for GPU		/// Check if the basic block contains a function we cannot codegen for GPU
/// kernels.		/// kernels.
///		///
/// If this basic block does something with a `Function` other than calling		/// If this basic block does something with a `Function` other than calling
/// a function that we support in a kernel, return true.		/// a function that we support in a kernel, return true.
bool containsInvalidKernelFunctionInBlock(const BasicBlock *BB) {		bool containsInvalidKernelFunctionInBlock(const BasicBlock *BB,
		bool AllowCUDALibDevice) {
for (const Instruction &Inst : *BB) {		for (const Instruction &Inst : *BB) {
const CallInst *Call = dyn_cast<CallInst>(&Inst);		const CallInst *Call = dyn_cast<CallInst>(&Inst);
if (Call && isValidFunctionInKernel(Call->getCalledFunction())) {		if (Call && isValidFunctionInKernel(Call->getCalledFunction(),
		AllowCUDALibDevice)) {
continue;		continue;
}		}

for (Value *SrcVal : Inst.operands()) {		for (Value *SrcVal : Inst.operands()) {
PointerType *p = dyn_cast<PointerType>(SrcVal->getType());		PointerType *p = dyn_cast<PointerType>(SrcVal->getType());
if (!p)		if (!p)
continue;		continue;
if (isa<FunctionType>(p->getElementType()))		if (isa<FunctionType>(p->getElementType()))
return true;		return true;
}		}
}		}
return false;		return false;
}		}

/// Return whether the Scop S uses functions in a way that we do not support.		/// Return whether the Scop S uses functions in a way that we do not support.
bool containsInvalidKernelFunction(const Scop &S) {		bool containsInvalidKernelFunction(const Scop &S, bool AllowCUDALibDevice) {
for (auto &Stmt : S) {		for (auto &Stmt : S) {
if (Stmt.isBlockStmt()) {		if (Stmt.isBlockStmt()) {
if (containsInvalidKernelFunctionInBlock(Stmt.getBasicBlock()))		if (containsInvalidKernelFunctionInBlock(Stmt.getBasicBlock(),
		AllowCUDALibDevice))
return true;		return true;
} else {		} else {
assert(Stmt.isRegionStmt() &&		assert(Stmt.isRegionStmt() &&
"Stmt was neither block nor region statement");		"Stmt was neither block nor region statement");
for (const BasicBlock *BB : Stmt.getRegion()->blocks())		for (const BasicBlock *BB : Stmt.getRegion()->blocks())
if (containsInvalidKernelFunctionInBlock(BB))		if (containsInvalidKernelFunctionInBlock(BB, AllowCUDALibDevice))
return true;		return true;
}		}
}		}
return false;		return false;
}		}

/// Generate code for a given GPU AST described by @p Root.		/// Generate code for a given GPU AST described by @p Root.
///		///
... 71 lines not shown ...	bool runOnScop(Scop &CurrentScop) override {
DL = &S->getRegion().getEntry()->getModule()->getDataLayout();		DL = &S->getRegion().getEntry()->getModule()->getDataLayout();
RI = &getAnalysis<RegionInfoPass>().getRegionInfo();		RI = &getAnalysis<RegionInfoPass>().getRegionInfo();

// We currently do not support functions other than intrinsics inside		// We currently do not support functions other than intrinsics inside
// kernels, as code generation will need to offload function calls to the		// kernels, as code generation will need to offload function calls to the
// kernel. This may lead to a kernel trying to call a function on the host.		// kernel. This may lead to a kernel trying to call a function on the host.
// This also allows us to prevent codegen from trying to take the		// This also allows us to prevent codegen from trying to take the
// address of an intrinsic function to send to the kernel.		// address of an intrinsic function to send to the kernel.
if (containsInvalidKernelFunction(CurrentScop)) {		if (containsInvalidKernelFunction(CurrentScop,
		Architecture == GPUArch::NVPTX64)) {
DEBUG(		DEBUG(
dbgs()		dbgs()
<< "Scop contains function which cannot be materialised in a GPU "		<< "Scop contains function which cannot be materialised in a GPU "
"kernel. Bailing out.\n";);		"kernel. Bailing out.\n";);
return false;		return false;
}		}

auto PPCGScop = createPPCGScop();		auto PPCGScop = createPPCGScop();
... 62 lines not shown ...
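The renaming and linking steps in `requiresCUDALibDevice` and `addCUDALibDevice` can be modelled without LLVM. As written in the diff above, `requiresCUDALibDevice` renames the first matching declaration and returns immediately, so a kernel that calls both `expf` and `cosf` would only have `expf` redirected; it also renames declarations as a side effect, although its name suggests a pure query. The toy model below uses a hypothetical `ToyModule` type (not LLVM's `Module` or `Linker` API). It separates a side-effect-free query from the renaming, renames every match, and then pulls in only the library definitions that are actually referenced, which is the intent of `Linker::LinkOnlyNeeded`. It ignores transitive dependencies between libdevice functions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a module: names of external declarations and of
// functions defined in the module. Hypothetical; not LLVM's Module API.
struct ToyModule {
  std::set<std::string> Declarations;
  std::set<std::string> Definitions;
};

// Side-effect-free query: which declarations have a libdevice replacement?
std::vector<std::string>
libDeviceCandidates(const ToyModule &M,
                    const std::set<std::string> &Supported) {
  std::vector<std::string> Result;
  for (const std::string &Name : M.Declarations)
    if (Supported.count(Name))
      Result.push_back(Name);
  return Result;
}

// Redirect every candidate to its "__nv_" name, then link in only the
// library definitions that are now referenced. Returns true if at least
// one definition was pulled in from the library.
bool linkLibDevice(ToyModule &M, const std::set<std::string> &Supported,
                   const std::set<std::string> &LibDeviceDefs) {
  bool LinkedAny = false;
  // The candidate list is a copy, so erasing from M.Declarations is safe.
  for (const std::string &Name : libDeviceCandidates(M, Supported)) {
    const std::string NVName = "__nv_" + Name;
    M.Declarations.erase(Name);
    if (LibDeviceDefs.count(NVName)) {
      M.Definitions.insert(NVName);
      LinkedAny = true;
    } else {
      M.Declarations.insert(NVName); // renamed but still unresolved
    }
  }
  return LinkedAny;
}
```

In this model `__nv_sqrtf` stays out of the module when the kernel never calls `sqrtf`, even though the library provides it.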

polly/trunk/test/GPGPU/Inputs/libdevice-functions-copied-into-kernel_libdevice.ll

				define float @__nv_expf(float %a) {
				ret float %a
				}

polly/trunk/test/GPGPU/libdevice-functions-copied-into-kernel.ll

				; RUN: opt %loadPolly -analyze -polly-scops < %s \
				; RUN: -polly-acc-libdevice=%S/Inputs/libdevice-functions-copied-into-kernel_libdevice.ll \
				; RUN: | FileCheck %s --check-prefix=SCOP
				; RUN: opt %loadPolly -analyze -polly-codegen-ppcg -polly-acc-dump-kernel-ir \
				; RUN: -polly-acc-libdevice=%S/Inputs/libdevice-functions-copied-into-kernel_libdevice.ll \
				; RUN: < %s | FileCheck %s --check-prefix=KERNEL-IR
				; RUN: opt %loadPolly -S -polly-codegen-ppcg < %s \
				; RUN: -polly-acc-libdevice=%S/Inputs/libdevice-functions-copied-into-kernel_libdevice.ll \
				; RUN: | FileCheck %s --check-prefix=HOST-IR

				; Test that we do recognise and codegen a kernel that has functions that can
				; be mapped to NVIDIA's libdevice

				; REQUIRES: pollyacc

				; Check that we model the kernel as a scop.
				; SCOP: Function: f
				; SCOP-NEXT: Region: %entry.split---%for.end

				; Check that the intrinsic call is present in the kernel IR.
				; KERNEL-IR: %p_expf = tail call float @__nv_expf(float %A.arr.i.val_p_scalar_)

				; Check that kernel launch is generated in host IR.
				; the declare would not be generated unless a call to a kernel exists.
				; HOST-IR: declare void @polly_launchKernel(i8*, i32, i32, i32, i32, i32, i8*)


				; void f(float *A, float *B, int N) {
				; for(int i = 0; i < N; i++) {
				; float tmp0 = A[i];
				; float tmp1 = expf(tmp1);
				; B[i] = tmp1;
				; }
				; }

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(float* %A, float* %B, i32 %N) {
				entry:
				br label %entry.split

				entry.split: ; preds = %entry
				%cmp1 = icmp sgt i32 %N, 0
				br i1 %cmp1, label %for.body.lr.ph, label %for.end

				for.body.lr.ph: ; preds = %entry.split
				br label %for.body

				for.body: ; preds = %for.body.lr.ph, %for.body
				%indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.body ]
				%A.arr.i = getelementptr inbounds float, float* %A, i64 %indvars.iv
				%A.arr.i.val = load float, float* %A.arr.i, align 4
				; Call to intrinsics that should be part of the kernel.
				%expf = tail call float @expf(float %A.arr.i.val)
				%B.arr.i = getelementptr inbounds float, float* %B, i64 %indvars.iv
				store float %expf, float* %B.arr.i, align 4

				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%wide.trip.count = zext i32 %N to i64
				%exitcond = icmp ne i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond, label %for.body, label %for.cond.for.end_crit_edge

				for.cond.for.end_crit_edge: ; preds = %for.body
				br label %for.end

				for.end: ; preds = %for.cond.for.end_crit_edge, %entry.split
				ret void
				}

				; Function Attrs: nounwind readnone
				declare float @expf(float) #0

				attributes #0 = { nounwind readnone }

polly/trunk/test/lit.site.cfg.in

	... 25 lines not shown ...
	try:			try:
	config.llvm_tools_dir = config.llvm_tools_dir % lit_config.params			config.llvm_tools_dir = config.llvm_tools_dir % lit_config.params
	config.llvm_libs_dir = config.llvm_libs_dir % lit_config.params			config.llvm_libs_dir = config.llvm_libs_dir % lit_config.params
	except KeyError:			except KeyError:
	e = sys.exc_info()[1]			e = sys.exc_info()[1]
	key, = e.args			key, = e.args
	lit_config.fatal("unable to find %r parameter, use '--param=%s=VALUE'" % (key,key))			lit_config.fatal("unable to find %r parameter, use '--param=%s=VALUE'" % (key,key))

				# excludes: A list of directories to exclude from the testsuite. The 'Inputs'
				# subdirectories contain auxiliary inputs for various tests in their parent
				# directories.
				config.excludes = ['Inputs']

	if config.link_polly_into_tools == '' or \			if config.link_polly_into_tools == '' or \
	config.link_polly_into_tools.lower() == '0' or \			config.link_polly_into_tools.lower() == '0' or \
	config.link_polly_into_tools.lower() == 'n' or \			config.link_polly_into_tools.lower() == 'n' or \
	config.link_polly_into_tools.lower() == 'no' or \			config.link_polly_into_tools.lower() == 'no' or \
	config.link_polly_into_tools.lower() == 'off' or \			config.link_polly_into_tools.lower() == 'off' or \
	config.link_polly_into_tools.lower() == 'false' or \			config.link_polly_into_tools.lower() == 'false' or \
	config.link_polly_into_tools.lower() == 'notfound' or \			config.link_polly_into_tools.lower() == 'notfound' or \
	config.link_polly_into_tools.lower() == 'link_polly_into_tools-notfound':			config.link_polly_into_tools.lower() == 'link_polly_into_tools-notfound':
	... 20 lines not shown ...