This is an archive of the discontinued LLVM Phabricator instance.


Differential D35185

[Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Closed, Public

Authored by PhilippSchaad on Jul 9 2017, 11:51 AM.


Details

Reviewers

bollu
grosser
Meinersbur
singam-sanjay

Commits

rG2f3073b5cb88: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for…
rPLO308751: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for…
rL308751: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for…

Summary

Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPUs, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).

Diff Detail

Build Status

Buildable 8468
Build 8468: arc lint + arc unit

Event Timeline

PhilippSchaad created this revision. Jul 9 2017, 11:51 AM

Herald added a reviewer: bollu. Jul 9 2017, 11:51 AM

Herald added subscribers: kbarton, Anastasia, mgorny, nemanjai.

Harbormaster completed remote builds in B8083: Diff 105789. Jul 9 2017, 11:51 AM

PhilippSchaad added reviewers: grosser, Meinersbur. Jul 9 2017, 11:52 AM

PhilippSchaad set the repository for this revision to rL LLVM.

PhilippSchaad added a project: Restricted Project.

PhilippSchaad edited the summary of this revision.

PhilippSchaad added a subscriber: pollydev.

Hi Philipp,

this looks indeed already surprisingly good. As you noted, there are still some hacks, but I have good hopes that we can work around most of them. I added some comments. Let me know what you think!

Best,
Tobias

lib/CodeGen/PPCGCodeGeneration.cpp
40	After having removed the hack below, this should hopefully not be needed any more.
1281	Why do you emit NVIDIA intrinsics for SPIR code? Can you not emit an opencl barrier? If you don't know what intrinsic corresponds to this, create a simple .cl file, compile it with clang to SPIR and see which intrinsic gets emitted.
1819	Instead of spreading the SPIR code all over the place, can we create one function addSPIRMetadata, that takes 'Args' as argument, emits all the metadata needed and also emits the metadata you add a little further below?
1919	Why do you emit NVIDIA intrinsics? As discussed above, is there no way to emit the OpenCL intrinsics instead?
2253	Interesting. So the optimizations break stuff. Would be great to understand what exactly we break!
2265	It seems all these regexp hacks are needed because we don't generate the right intrinsics. What is holding us back from generating them directly?
tools/GPURuntime/GPUJIT.c
155	Why do these declarations need to be optional? I think they should compile easily in any situation, no?
223	I am surprised. Intel's OpenCL uses such a different path. In fact, I would assume that libOpenCL.so as a generic OpenCL library would take care of loading beignet. Is this not the case?
288	Does this break or does this just return nullptr if the function is not available. If we can always try and just detect if it is not around, this would be great!
519	Why do we read the file from "kernel.ll" and not use the file passed to us in BinaryBuffer?

Set this to request changes, so that I get notified in case you update this revision.

This revision now requires changes to proceed. Jul 10 2017, 1:51 PM

PhilippSchaad added inline comments. Jul 10 2017, 4:06 PM

lib/CodeGen/PPCGCodeGeneration.cpp
40	That is correct, yes. This should only be needed as long as we have to manually change the string occurrences for barriers and global/local IDs.
1281	The generated intrinsics are the ones declared in the SPIR reference, so for example @_Z13get_global_idj(i32 0) . I cannot find any corresponding intrinsics in LLVM's TableGen/Intrinsics, did I overlook some? This is essentially the only real point where the hack is needed. If we can insert correct intrinsics, we can ditch the regex. Because fixing this is all the regex does.
1819	Will look into that.
1919	Ditto above.
2253	Looking into that. What is clear is that the optimized version skips a few 'dummy' computations, because the optimized kernels expect a smaller workgroup size in OpenCL. Not sure if and how we could account for that. It seems very tricky to me atm, have played around with it quite a bit.
2265	Ditto above.
tools/GPURuntime/GPUJIT.c
155	Oh yeah, my bad. Only the LLVMIntel one has to be conditional.
223	The funny thing is: it does, somewhat. If I only dlopen libOpenCL and then call, for example, clCreateProgramWithBinary, looking at that in gdb reveals that the beignet routines get loaded by libOpenCL. However, as soon as a function is not present in the standard OpenCL API, e.g. clCreateProgramWithLLVMIntel, it does not like that.
288	I'm gonna look into that, but iirc it didn't work. Might be wrong, will try it out again!
519	Well the problem is that clCreateProgramWithLLVMIntel wants to read from a file, and requests a filename/filepath as an argument there. BinaryBuffer however holds the complete binary. If you know a simpler solution, that would of course be great, because this is another hacky-part.
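For context on the `@_Z13get_global_idj` name in the reply above: SPIR builtins such as `get_global_id` are ordinary external functions whose names follow the Itanium C++ ABI mangling scheme, which is why they have no entry in LLVM's intrinsic tables. The C sketch below reproduces that spelling for the one simple case involved here (a free function, no namespace, a single `unsigned int` parameter, encoded as `j`). The helper `mangle_uint_builtin` is hypothetical and not part of Polly or the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: mangle a free function that takes a single
 * `unsigned int` parameter, following the Itanium C++ ABI:
 *   "_Z" <decimal length of name> <name> "j"   ('j' = unsigned int)
 * Namespaces, templates and other parameter types are not handled.
 * Returns the length of the mangled name, or -1 if `out` is too small. */
int mangle_uint_builtin(const char *name, char *out, size_t out_size) {
  assert(name != NULL && out != NULL);
  int n = snprintf(out, out_size, "_Z%zu%sj", strlen(name), name);
  if (n < 0 || (size_t)n >= out_size)
    return -1;
  return n;
}
```

For example, `get_global_id` (13 characters) becomes `_Z13get_global_idj` and `get_local_id` becomes `_Z12get_local_idj`. The sketch only explains the spelling; in LLVM such a callee is simply declared as a regular function with that name.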

PhilippSchaad updated this revision to Diff 105946. Jul 10 2017, 5:29 PM

PhilippSchaad edited edge metadata.

Adapted dynamic method loading for intel

grosser added inline comments. Jul 10 2017, 10:08 PM

CMakeLists.txt
178 ↗	(On Diff #105946)	Forgot to mention, but would be great if we could get away without a configure test here.
lib/CodeGen/PPCGCodeGeneration.cpp
1281	No, I think they may not exist. However, you can just create declarations for such functions using code as in "GPUNodeBuilder::createCallGetKernel(".
1819	I just see this is not only about the llvm::Type, but also carries an integer which is 0 for local/shared memory and 1 otherwise. So moving this into a function does not seem so easy. If you don't find an easy way, just leave it as it is.
2253	It's OK. Let's get the remainder of the patch updated, then we can look at this issue later on.
tools/GPURuntime/GPUJIT.c
519	I see. Maybe that's fine then. Just add a comment for now. I would prefer if we could use clCreateProgramWithBinary and just pass an LLVM bitcode file, but this likely only works if the LLVM version used by Beignet is very recent. Let's leave it like this for now. It's good to have something working!

grosser requested changes to this revision. Jul 10 2017, 10:15 PM

This revision now requires changes to proceed. Jul 10 2017, 10:15 PM

Inserting SPIR barriers with custom function.

CMakeLists.txt
178 ↗	(On Diff #105946)	Yes I agree. I don't know of a method for detecting a Beignet installation with CMake yet, but if that exists, that would of course be great.
tools/GPURuntime/GPUJIT.c
519	Totally agreed, that would be the better option. As you pointed out, it is currently not yet possible.

Added comment clarification for workaround.

Removed Regexp hack

PhilippSchaad marked 6 inline comments as done. Jul 11 2017, 4:58 AM

Nice. Can you potentially also add a test case?

CMakeLists.txt
178 ↗	(On Diff #105946)	I don't think we want to add this to cmake at all. At best, we just compile the beignet support in unconditionally, if this is possible.
lib/CodeGen/PPCGCodeGeneration.cpp
1986	If you store the group names in an array, you can just index this array with "i".
2153	Maybe call this insertCUDAKernelIntrinsics and insertSPIRKernelIntrinsics for consistency?
tools/GPURuntime/GPUJIT.c
223	OK. Can you then unconditionally try to load beignet/libcl.so into another variable, e.g. HandleOpenCLBeignet? I assume this handle should be nullptr if the dlopen fails, right? Can you then check if this handle is zero and use this to decide if you want to call clCreateProgramWithLLVMIntel? This should allow us to get rid of all the conditional compilation.
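The suggestion above, loading Beignet unconditionally into its own handle and deciding at runtime based on whether that handle is NULL, can be sketched in C as follows. To keep the sketch testable without an OpenCL installation, the loader is injected as a function pointer; in GPUJIT.c it would be a thin wrapper around `dlopen`. The type and helper names are hypothetical, not the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Loader signature. In GPUJIT.c this would wrap dlopen(); it must
 * return NULL when the library cannot be opened. */
typedef void *(*lib_loader_fn)(const char *path);

typedef struct {
  void *generic; /* generic libOpenCL handle, NULL if not found */
  void *beignet; /* Beignet handle, NULL if Beignet is not installed */
} opencl_handles;

/* Try to open both libraries unconditionally; a missing Beignet
 * installation is not treated as an error. */
opencl_handles load_opencl_handles(lib_loader_fn load,
                                   const char *generic_path,
                                   const char *beignet_path) {
  assert(load != NULL);
  opencl_handles h;
  h.generic = load(generic_path);
  h.beignet = load(beignet_path);
  return h;
}

/* Prefer the Beignet handle when it was loaded, otherwise fall back to
 * the generic library (the ternary proposed later in the review). */
void *select_opencl_handle(const opencl_handles *h) {
  assert(h != NULL);
  return h->beignet != NULL ? h->beignet : h->generic;
}
```

No preprocessor switch is involved: the same binary works with or without Beignet. Note that later in the review the unconditional preference for Beignet is questioned; the choice should ideally follow the selected `-polly-gpu-arch`.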

grosser requested changes to this revision. Jul 11 2017, 6:26 AM

This revision now requires changes to proceed. Jul 11 2017, 6:26 AM

grosser added inline comments. Jul 11 2017, 6:29 AM

tools/GPURuntime/GPUJIT.c
519	Another change. Could you possibly remove the conditional compilation here and just check if clCreateProgramWithLLVMIntelFcnPtr != nullptr?

Simplification
Removed compile-conditional CMake dependency

I am not yet too familiar with how to implement such a case and what exactly to check for, but if that is desired, I can look into it.

This should now address all concerns, minus the file-workaround.

lib/CodeGen/PPCGCodeGeneration.cpp
2153	The reason I kept this separated the way it is now, without specifying "CUDA" in the earlier version, is that SPIR should be the only example. When AMD support follows, or anything where there's a registered LLVM target with corresponding intrinsics (AMDGPU in that case), the original function could be used with just an additional switch case, changing the intrinsics IDs. Do you agree with that?

PhilippSchaad added inline comments. Jul 11 2017, 7:54 AM

lib/CodeGen/PPCGCodeGeneration.cpp
2153	And with 'example' I mean 'exception', my bad.

Great. Yes, a test case would be great. It is not very complicated. You already updated test cases in your last patch. I would suggest you copy one of the simple double nested loop kernels from the PTX test and just check that the right SPIR intrinsics are generated! That should be enough.

Otherwise, I think this is good to go. Very nice work!

lib/CodeGen/PPCGCodeGeneration.cpp
2153	Sure. Still, the function names could be more similar. What about insertKernelCallsSPIR?

This revision now requires changes to proceed. Jul 11 2017, 8:47 AM

Added testcase for SPIR
Fixed naming inconsistencies

LGTM from my side.

This revision is now accepted and ready to land. Jul 14 2017, 11:55 AM

@singam-sanjay can you have a final look and commit it if you are OK with this patch.

Sure. Will do in a while.

@grosser Thanks for adding me as a reviewer ! It's helped me acquaint myself with the SPIR V codegen and understand GPUJIT better.

@PhilippSchaad I've made suggestions to restructure code, a possibly unintentional debug print statement and added some questions about the SPIR. Do let me know if (and why) some of my suggestions are unnecessary.

General Question: I was following a discussion about integrating the SPIR-V backend as a part of LLVM, i.e. moving it from /tools to lib/Target. If that were to happen, would we be able to generate and feed SPIR IR in the same way as NVPTX ?

lib/CodeGen/PPCGCodeGeneration.cpp
1831	@PhilippSchaad I'm not yet acquainted with the SPIR annotations. Could you please explain why it matters to let the SPIR backend know whether a kernel argument is a read only ?
1833	Why are you setting these metadata to empty strings ? Is this a feature yet to be implemented ?
2179	nitpick: Do we need this ?
2205	nitpick: Do we need this ?
2255	Would it be better to move the SPIR specific code into createKernelASM, as well ? You might not need the switch cases to handle buggy control flow like, case GPUArch::SPIR64: case GPUArch::SPIR32: llvm_unreachable("Cannot generate ASM for SPIR architecture");
tools/GPURuntime/GPUJIT.c
148	This is new to me. Why are you providing runtime support for Intel to JIT SPIR code ? Is it to run with the integrated GPU on an Intel CPU? Guessing from https://github.com/intel/beignet.
249	`void *Handle = ( HandleOpenCLBeignet ? HandleOpenCLBeignet : HandleOpenCL );` maybe ? Let me know if this isn't recommended.
251	Is it that we prefer to use libraries provided by Intel's SDK and integrated GPU over other OpenCL drivers and devices? I'm assuming that this leads the runtime to use the integrated GPU even when a more powerful Radeon card is present. Please correct me if I'm wrong.
510	Is this file opened inside <llvm_build>/bin or $PWD? If it's $PWD, we should take precautions not to open a file that the user needs. You could instead open a file in /tmp, or %TEMP% on Windows, like "/tmp/R@ηdηдm3.ll". Anyway, you could implement this later.
516	Was this supposed to be a message letting the user know that the runtime's not using the "beignet" OpenCL implementation?
519	@grosser What is your rationale to avoid conditional compilation ?

Will address the points mentioned asap. As for your SPIR-V question: The goal, should that back-end be added, would be to add SPIR-V compilation as an additional GPU target option (In the future). This would allow kernel execution on any SPIR-V supporting target device. This SPIR solution is essentially a 'quickfix' to get it to work on Intel, as long as SPIR-V is not an option.

lib/CodeGen/PPCGCodeGeneration.cpp
1831	The issue here is the Intel Beignet driver. Beignet expects a very specific set of kernel annotations when parsing the Kernel IR. The metadata is simply required to be there, but for the kernels we generate here, it does not matter what is actually contained in there. This is entirely Beignet dependent, maybe Intel will adapt that in the future.
1833	Ditto above.
2179	We can change it to an if-clause if that is desired. The compiler just complains when we have a switch and do not check every possible value of `Arch`. An alternative would be a default clause handling that `llvm_unreachable`.
2205	Ditto above.
2255	Will do, good point. Originally moved it out because we thought we had to do a lot more hacking on the resulting IR.
tools/GPURuntime/GPUJIT.c
148	Your guess is exactly correct. This also corresponds to some of the points mentioned above.
249	Definitely the prettier solution. Will change.
251	This is actually a good point. We don't want to prefer the Intel provided SDK, the choice of selecting Intel's Beignet should be dependent on what -polly-gpu-arch value has been selected. AMD is in the pipeline.
510	I will add this later, but yes, it is $PWD. The solution of using /tmp and %TEMP% seems reasonable.
516	Oops, thanks for pointing this out, this was unintentionally left in and should never be reached! Used it for debugging. Removing it.
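Because `clCreateProgramWithLLVMIntel` takes a file path while `BinaryBuffer` holds the module in memory, the workaround has to write the buffer to disk first. Following the suggestion to avoid `$PWD`, a POSIX-only C sketch using `mkstemp` in `/tmp` could look like this. The helper name and the file-name template are hypothetical, and a Windows build would need a different temporary-directory API (the review mentions `%TEMP%`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `buf` to a new, uniquely
 * named file in /tmp instead of a fixed "kernel.ll" in the current
 * working directory. On success the file name is stored in `path` and 0
 * is returned; the caller must unlink() the file once the driver has
 * read it. Returns -1 on failure (no file is left behind). */
int write_kernel_to_tmpfile(const char *buf, size_t len, char *path,
                            size_t path_size) {
  static const char tmpl[] = "/tmp/polly_kernel_XXXXXX";
  assert(path != NULL && (buf != NULL || len == 0));
  if (path_size < sizeof tmpl)
    return -1;
  memcpy(path, tmpl, sizeof tmpl);

  int fd = mkstemp(path); /* creates the file exclusively, mode 0600 */
  if (fd < 0)
    return -1;

  size_t written = 0;
  while (written < len) {
    ssize_t n = write(fd, buf + written, len - written);
    if (n < 0) {
      close(fd);
      unlink(path);
      return -1;
    }
    written += (size_t)n;
  }

  if (close(fd) != 0) {
    unlink(path);
    return -1;
  }
  return 0;
}
```

Using `mkstemp` rather than building a name by hand avoids both clobbering a user's file and races between concurrent runs, since the file is created atomically with a unique name.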

singam-sanjay added inline comments. Jul 17 2017, 12:56 AM

lib/CodeGen/PPCGCodeGeneration.cpp
1831	Ohk. Thanks for the info !
1833	Alright. I guess beignet isn't worried about the names of kernel arguments in "kernel_arg_name".
2179	I was talking about the `break;` after `llvm_unreachable`. If `llvm_unreachable` would call `abort`, the `break` would be redundant.
2205	Ditto above ;-).
2255	Alright.
tools/GPURuntime/GPUJIT.c
148	Ohk.
249	`void *Handle = ( HandleOpenCLBeignet!=NULL ? HandleOpenCLBeignet : HandleOpenCL );` would be better.
251	Alright. Any ideas on how you'd differentiate between the integrated and dedicated GPU at command line ?
510	Alright.
516	👍
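On the redundant `break;` after `llvm_unreachable` (2179/2205): since the unreachable handler never returns, control cannot leave that case, so nothing follows the call. The C sketch below mirrors the structure of `insertKernelIntrinsics` from the final diff, with a C11 `_Noreturn` function standing in for `llvm_unreachable`. The enum mirrors `GPUArch` from the patch; `IntrinsicID`, `report_unreachable` and `select_kernel_intrinsics` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum GPUArch { NVPTX64, SPIR32, SPIR64 };

/* Hypothetical IDs standing in for LLVM's Intrinsic::ID values. */
enum IntrinsicID {
  NVVM_CTAID_X, NVVM_CTAID_Y, NVVM_TID_X, NVVM_TID_Y, NVVM_TID_Z
};

/* Stand-in for llvm_unreachable: report and abort; never returns. */
static _Noreturn void report_unreachable(const char *msg) {
  fprintf(stderr, "UNREACHABLE: %s\n", msg);
  abort();
}

void select_kernel_intrinsics(enum GPUArch arch, enum IntrinsicID bid[2],
                              enum IntrinsicID tid[3]) {
  assert(bid != NULL && tid != NULL);
  switch (arch) {
  case SPIR64:
  case SPIR32:
    report_unreachable("Cannot generate NVVM intrinsics for SPIR");
    /* No `break;` here: report_unreachable() does not return. */
  case NVPTX64:
    bid[0] = NVVM_CTAID_X;
    bid[1] = NVVM_CTAID_Y;
    tid[0] = NVVM_TID_X;
    tid[1] = NVVM_TID_Y;
    tid[2] = NVVM_TID_Z;
    break;
  }
}
```

Listing every enumerator explicitly (instead of a `default:` label) keeps the compiler's missing-case warning useful when a new architecture such as AMDGPU is added.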

Removed left over debug print, moved SPIR creation into createASM, fixed minor issues addressed in comments.

PhilippSchaad marked 24 inline comments as done. Jul 17 2017, 5:25 PM

PhilippSchaad added inline comments.

tools/GPURuntime/GPUJIT.c
251	The goal is to run on AMD GPUs with the AMDGPU backend, which would mean the differentiation would be done with -polly-gpu-arch=spir32/amdgcn32 for example. I am not yet sure how to transport that choice to here, but I will look into that as I am working on the AMD side of things. Thanks for pointing out this flaw.

Ping: do the latest changes address your concerns, @singam-sanjay? Can I land this?

@PhilippSchaad Please go ahead.

Rebase for commit

Harbormaster completed remote builds in B8468: Diff 107678. Jul 21 2017, 8:52 AM

Closed by commit rL308751: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for… (authored by phschaad). Jul 21 2017, 9:11 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
include/polly/CodeGen/PPCGCodeGeneration.h    2 lines
lib/CodeGen/PPCGCodeGeneration.cpp            169 lines
lib/Support/RegisterPasses.cpp                6 lines
test/GPGPU/spir-codegen.ll                    118 lines
tools/GPURuntime/GPUJIT.c                     101 lines

Diff 107678

include/polly/CodeGen/PPCGCodeGeneration.h

[... 10 unchanged lines hidden ...]
 // GPU mapping strategy.
 //
 //===----------------------------------------------------------------------===//

 #ifndef POLLY_PPCGCODEGENERATION_H
 #define POLLY_PPCGCODEGENERATION_H

 /// The GPU Architecture to target.
-enum GPUArch { NVPTX64 };
+enum GPUArch { NVPTX64, SPIR32, SPIR64 };

 /// The GPU Runtime implementation to use.
 enum GPURuntime { CUDA, OpenCL };

 #endif // POLLY_PPCGCODEGENERATION_H

lib/CodeGen/PPCGCodeGeneration.cpp

[... 31 unchanged lines hidden ...]
 #include "llvm/IR/LegacyPassManager.h"
 #include "llvm/IR/Verifier.h"
 #include "llvm/Support/TargetRegistry.h"
 #include "llvm/Support/TargetSelect.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/IPO/PassManagerBuilder.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"

 #include "isl/union_map.h"

 extern "C" {
 #include "ppcg/cuda.h"
 #include "ppcg/gpu.h"
 #include "ppcg/gpu_print.h"
 #include "ppcg/ppcg.h"
 #include "ppcg/schedule.h"
 }
[... 491 unchanged lines hidden (context: "private:") ...]
   Function *createKernelFunctionDecl(ppcg_kernel *Kernel,
                                      SetVector<Value *> &SubtreeValues);

   /// Insert intrinsic functions to obtain thread and block ids.
   ///
   /// @param The kernel to generate the intrinsic functions for.
   void insertKernelIntrinsics(ppcg_kernel *Kernel);

+  /// Insert function calls to retrieve the SPIR group/local ids.
+  ///
+  /// @param The kernel to generate the function calls for.
+  void insertKernelCallsSPIR(ppcg_kernel *Kernel);
+
   /// Setup the creation of functions referenced by the GPU kernel.
   ///
   /// 1. Create new function declarations in GPUModule which are the same as
   /// SubtreeFunctions.
   ///
   /// 2. Populate IslNodeBuilder::ValueMap with mappings from
   /// old functions (that come from the original module) to new functions
   /// (that are created within GPUModule). That way, we generate references
[... 693 unchanged lines hidden (context: "void GPUNodeBuilder::createScopStmt(isl_ast_expr *Expr,") ...]
   if (Stmt->isBlockStmt())
     BlockGen.copyStmt(*Stmt, LTS, Indexes);
   else
     RegionGen.copyStmt(*Stmt, LTS, Indexes);
 }

 void GPUNodeBuilder::createKernelSync() {
   Module *M = Builder.GetInsertBlock()->getParent()->getParent();
+  const char *SpirName = "__gen_ocl_barrier_global";

   Function *Sync;

   switch (Arch) {
+  case GPUArch::SPIR64:
+  case GPUArch::SPIR32:
+    Sync = M->getFunction(SpirName);
+
+    // If Sync is not available, declare it.
+    if (!Sync) {
+      GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
+      std::vector<Type *> Args;
+      FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), Args, false);
+      Sync = Function::Create(Ty, Linkage, SpirName, M);
+      Sync->setCallingConv(CallingConv::SPIR_FUNC);
+    }
+    break;
   case GPUArch::NVPTX64:
     Sync = Intrinsic::getDeclaration(M, Intrinsic::nvvm_barrier0);
     break;
   }

   Builder.CreateCall(Sync, {});
 }

 /// Collect llvm::Values referenced from @p Node
 ///
[... 392 unchanged lines hidden (context: "void GPUNodeBuilder::createKernel(__isl_take isl_ast_node *KernelStmt) {") ...]

   createKernelFunction(Kernel, SubtreeValues, SubtreeFunctions);
   setupKernelSubtreeFunctions(SubtreeFunctions);

   create(isl_ast_node_copy(Kernel->tree));

   finalizeKernelArguments(Kernel);
   Function *F = Builder.GetInsertBlock()->getParent();
-  addCUDAAnnotations(F->getParent(), BlockDimX, BlockDimY, BlockDimZ);
+  if (Arch == GPUArch::NVPTX64)
+    addCUDAAnnotations(F->getParent(), BlockDimX, BlockDimY, BlockDimZ);
   clearDominators(F);
   clearScalarEvolution(F);
   clearLoops(F);

   IDToValue = HostIDs;

   ValueMap = std::move(HostValueMap);
   ScalarMap = std::move(HostScalarMap);
[... 40 unchanged lines hidden (context: "if (!is64Bit) {") ...]
     Ret += "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:"
            "64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:"
            "64-v128:128:128-n16:32:64";
   }

   return Ret;
 }

+/// Compute the DataLayout string for a SPIR kernel.
+///
+/// @param is64Bit Are we looking for a 64 bit architecture?
+static std::string computeSPIRDataLayout(bool is64Bit) {
+  std::string Ret = "";
+
+  if (!is64Bit) {
+    Ret += "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:"
+           "64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:"
+           "32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:"
+           "256:256-v256:256:256-v512:512:512-v1024:1024:1024";
+  } else {
+    Ret += "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:"
+           "64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:"
+           "32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:"
+           "256:256-v256:256:256-v512:512:512-v1024:1024:1024";
+  }
+
+  return Ret;
+}
+
 Function *
 GPUNodeBuilder::createKernelFunctionDecl(ppcg_kernel *Kernel,
                                          SetVector<Value *> &SubtreeValues) {
   std::vector<Type *> Args;
   std::string Identifier = getKernelFuncName(Kernel->id);

+  std::vector<Metadata *> MemoryType;
+
   for (long i = 0; i < Prog->n_array; i++) {
     if (!ppcg_kernel_requires_array_argument(Kernel, i))
       continue;

     if (gpu_array_is_read_only_scalar(&Prog->array[i])) {
       isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space, isl_dim_set);
       const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(Id);
       Args.push_back(SAI->getElementType());
+      MemoryType.push_back(
+          ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
     } else {
       static const int UseGlobalMemory = 1;
       Args.push_back(Builder.getInt8PtrTy(UseGlobalMemory));
+      MemoryType.push_back(
+          ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 1)));
     }
   }

   int NumHostIters = isl_space_dim(Kernel->space, isl_dim_set);

-  for (long i = 0; i < NumHostIters; i++)
+  for (long i = 0; i < NumHostIters; i++) {
     Args.push_back(Builder.getInt64Ty());
+    MemoryType.push_back(
+        ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
+  }

   int NumVars = isl_space_dim(Kernel->space, isl_dim_param);

   for (long i = 0; i < NumVars; i++) {
     isl_id *Id = isl_space_get_dim_id(Kernel->space, isl_dim_param, i);
     Value *Val = IDToValue[Id];
     isl_id_free(Id);
     Args.push_back(Val->getType());
+    MemoryType.push_back(
+        ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
   }

-  for (auto *V : SubtreeValues)
+  for (auto *V : SubtreeValues) {
     Args.push_back(V->getType());
+    MemoryType.push_back(
+        ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
+  }

   auto *FT = FunctionType::get(Builder.getVoidTy(), Args, false);
   auto *FN = Function::Create(FT, Function::ExternalLinkage, Identifier,
                               GPUModule.get());

+  std::vector<Metadata *> EmptyStrings;
+
+  for (unsigned int i = 0; i < MemoryType.size(); i++) {
+    EmptyStrings.push_back(MDString::get(FN->getContext(), ""));
+  }
+
+  if (Arch == GPUArch::SPIR32 || Arch == GPUArch::SPIR64) {
+    FN->setMetadata("kernel_arg_addr_space",
+                    MDNode::get(FN->getContext(), MemoryType));
+    FN->setMetadata("kernel_arg_name",
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_access_qual",
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_type",
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_type_qual",
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_base_type",
		MDNode::get(FN->getContext(), EmptyStrings));
		}

switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
FN->setCallingConv(CallingConv::PTX_Kernel);		FN->setCallingConv(CallingConv::PTX_Kernel);
break;		break;
		case GPUArch::SPIR32:
		case GPUArch::SPIR64:
		FN->setCallingConv(CallingConv::SPIR_KERNEL);
		break;
}		}

auto Arg = FN->arg_begin();		auto Arg = FN->arg_begin();
for (long i = 0; i < Kernel->n_array; i++) {		for (long i = 0; i < Kernel->n_array; i++) {
if (!ppcg_kernel_requires_array_argument(Kernel, i))		if (!ppcg_kernel_requires_array_argument(Kernel, i))
continue;		continue;

Arg->setName(Kernel->array[i].array->name);		Arg->setName(Kernel->array[i].array->name);
[49 lines not shown]	GPUNodeBuilder::createKernelFunctionDecl(ppcg_kernel *Kernel,
return FN;		return FN;
}		}

void GPUNodeBuilder::insertKernelIntrinsics(ppcg_kernel *Kernel) {		void GPUNodeBuilder::insertKernelIntrinsics(ppcg_kernel *Kernel) {
Intrinsic::ID IntrinsicsBID[2];		Intrinsic::ID IntrinsicsBID[2];
Intrinsic::ID IntrinsicsTID[3];		Intrinsic::ID IntrinsicsTID[3];

switch (Arch) {		switch (Arch) {
		case GPUArch::SPIR64:
		case GPUArch::SPIR32:
		grosser: Why do you emit NVIDIA intrinsics? As discussed above, is there no way to emit the OpenCL intrinsics instead?
		PhilippSchaad (author): Dito above.
		llvm_unreachable("Cannot generate NVVM intrinsics for SPIR");
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
IntrinsicsBID[0] = Intrinsic::nvvm_read_ptx_sreg_ctaid_x;		IntrinsicsBID[0] = Intrinsic::nvvm_read_ptx_sreg_ctaid_x;
IntrinsicsBID[1] = Intrinsic::nvvm_read_ptx_sreg_ctaid_y;		IntrinsicsBID[1] = Intrinsic::nvvm_read_ptx_sreg_ctaid_y;

IntrinsicsTID[0] = Intrinsic::nvvm_read_ptx_sreg_tid_x;		IntrinsicsTID[0] = Intrinsic::nvvm_read_ptx_sreg_tid_x;
IntrinsicsTID[1] = Intrinsic::nvvm_read_ptx_sreg_tid_y;		IntrinsicsTID[1] = Intrinsic::nvvm_read_ptx_sreg_tid_y;
IntrinsicsTID[2] = Intrinsic::nvvm_read_ptx_sreg_tid_z;		IntrinsicsTID[2] = Intrinsic::nvvm_read_ptx_sreg_tid_z;
break;		break;
[15 lines not shown]	void GPUNodeBuilder::insertKernelIntrinsics(ppcg_kernel *Kernel) {
}		}

for (int i = 0; i < Kernel->n_block; ++i) {		for (int i = 0; i < Kernel->n_block; ++i) {
isl_id *Id = isl_id_list_get_id(Kernel->thread_ids, i);		isl_id *Id = isl_id_list_get_id(Kernel->thread_ids, i);
addId(Id, IntrinsicsTID[i]);		addId(Id, IntrinsicsTID[i]);
}		}
}		}

		void GPUNodeBuilder::insertKernelCallsSPIR(ppcg_kernel *Kernel) {
		const char *GroupName[3] = {"__gen_ocl_get_group_id0",
		"__gen_ocl_get_group_id1",
		"__gen_ocl_get_group_id2"};

		const char *LocalName[3] = {"__gen_ocl_get_local_id0",
		"__gen_ocl_get_local_id1",
		"__gen_ocl_get_local_id2"};

		auto createFunc = [this](const char *Name, __isl_take isl_id *Id) mutable {
		Module *M = Builder.GetInsertBlock()->getParent()->getParent();
		Function *FN = M->getFunction(Name);

		// If FN is not available, declare it.
		if (!FN) {
		GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
		std::vector<Type *> Args;
		FunctionType *Ty = FunctionType::get(Builder.getInt32Ty(), Args, false);
		FN = Function::Create(Ty, Linkage, Name, M);
		FN->setCallingConv(CallingConv::SPIR_FUNC);
		}

		Value *Val = Builder.CreateCall(FN, {});
		Val = Builder.CreateIntCast(Val, Builder.getInt64Ty(), false, Name);
		IDToValue[Id] = Val;
		KernelIDs.insert(std::unique_ptr<isl_id, IslIdDeleter>(Id));
		};

		for (int i = 0; i < Kernel->n_grid; ++i)
		createFunc(GroupName[i], isl_id_list_get_id(Kernel->block_ids, i));

		for (int i = 0; i < Kernel->n_block; ++i)
		createFunc(LocalName[i], isl_id_list_get_id(Kernel->thread_ids, i));
		}

		grosser: If you store the group names in an array, you can just index this array with "i".
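grosser's suggestion amounts to a lookup table indexed by the dimension, which is how `insertKernelCallsSPIR` above is written. The C sketch below mirrors that indexing and the unsigned `CreateIntCast` from the builtin's `i32` result to `i64` (visible as `zext i32 ... to i64` in `spir-codegen.ll`). It is an illustration only, not Polly code; `groupIDBuiltin`, `localIDBuiltin` and `widenID` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Beignet builtin names, one per dimension, as in insertKernelCallsSPIR. */
static const char *const GroupName[3] = {"__gen_ocl_get_group_id0",
                                         "__gen_ocl_get_group_id1",
                                         "__gen_ocl_get_group_id2"};
static const char *const LocalName[3] = {"__gen_ocl_get_local_id0",
                                         "__gen_ocl_get_local_id1",
                                         "__gen_ocl_get_local_id2"};

/* Builtin queried for grid (work-group) dimension Dim. */
static const char *groupIDBuiltin(int Dim) {
  assert(Dim >= 0 && Dim < 3 && "OpenCL has at most three dimensions");
  return GroupName[Dim];
}

/* Builtin queried for thread (work-item) dimension Dim. */
static const char *localIDBuiltin(int Dim) {
  assert(Dim >= 0 && Dim < 3 && "OpenCL has at most three dimensions");
  return LocalName[Dim];
}

/* The builtins return i32; an unsigned int cast to i64 is a zero
   extension, which in C means widening through uint32_t. */
static int64_t widenID(int32_t Raw) { return (int64_t)(uint32_t)Raw; }
```

For the 2-D kernel in the test, `n_grid` and `n_block` are both 2, so only the `...0` and `...1` builtins are called, matching the four calls in the CHECK lines.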
void GPUNodeBuilder::prepareKernelArguments(ppcg_kernel *Kernel, Function *FN) {		void GPUNodeBuilder::prepareKernelArguments(ppcg_kernel *Kernel, Function *FN) {
auto Arg = FN->arg_begin();		auto Arg = FN->arg_begin();
for (long i = 0; i < Kernel->n_array; i++) {		for (long i = 0; i < Kernel->n_array; i++) {
if (!ppcg_kernel_requires_array_argument(Kernel, i))		if (!ppcg_kernel_requires_array_argument(Kernel, i))
continue;		continue;

isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space, isl_dim_set);		isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space, isl_dim_set);
const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(isl_id_copy(Id));		const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(isl_id_copy(Id));
[122 lines not shown]	void GPUNodeBuilder::createKernelFunction(
switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
if (Runtime == GPURuntime::CUDA)		if (Runtime == GPURuntime::CUDA)
GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-cuda"));		GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-cuda"));
else if (Runtime == GPURuntime::OpenCL)		else if (Runtime == GPURuntime::OpenCL)
GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-nvcl"));		GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-nvcl"));
GPUModule->setDataLayout(computeNVPTXDataLayout(true /* is64Bit */));		GPUModule->setDataLayout(computeNVPTXDataLayout(true /* is64Bit */));
break;		break;
		case GPUArch::SPIR32:
		GPUModule->setTargetTriple(Triple::normalize("spir-unknown-unknown"));
		GPUModule->setDataLayout(computeSPIRDataLayout(false /* is64Bit */));
		break;
		case GPUArch::SPIR64:
		GPUModule->setTargetTriple(Triple::normalize("spir64-unknown-unknown"));
		GPUModule->setDataLayout(computeSPIRDataLayout(true /* is64Bit */));
		break;
}		}

Function *FN = createKernelFunctionDecl(Kernel, SubtreeValues);		Function *FN = createKernelFunctionDecl(Kernel, SubtreeValues);

BasicBlock *PrevBlock = Builder.GetInsertBlock();		BasicBlock *PrevBlock = Builder.GetInsertBlock();
auto EntryBlock = BasicBlock::Create(Builder.getContext(), "entry", FN);		auto EntryBlock = BasicBlock::Create(Builder.getContext(), "entry", FN);

DT.addNewBlock(EntryBlock, PrevBlock);		DT.addNewBlock(EntryBlock, PrevBlock);

Builder.SetInsertPoint(EntryBlock);		Builder.SetInsertPoint(EntryBlock);
Builder.CreateRetVoid();		Builder.CreateRetVoid();
Builder.SetInsertPoint(EntryBlock, EntryBlock->begin());		Builder.SetInsertPoint(EntryBlock, EntryBlock->begin());

ScopDetection::markFunctionAsInvalid(FN);		ScopDetection::markFunctionAsInvalid(FN);

prepareKernelArguments(Kernel, FN);		prepareKernelArguments(Kernel, FN);
createKernelVariables(Kernel, FN);		createKernelVariables(Kernel, FN);

		switch (Arch) {
		case GPUArch::NVPTX64:
insertKernelIntrinsics(Kernel);		insertKernelIntrinsics(Kernel);
		grosser: Maybe call this insertCUDAKernelIntrinsics and insertSPIRKernelIntrinsics for consistency?
		PhilippSchaad (author): The reason I kept this separated the way it is now, without specifying "CUDA" in the earlier version, is that SPIR should be the only example. When AMD support follows, or anything where there's a registered LLVM target with corresponding intrinsics (AMDGPU in that case), the original function could be used with just an additional switch case, changing the intrinsics IDs. Do you agree with that?
		PhilippSchaad (author): And with 'example' I mean 'exception', my bad.
		grosser: Sure. Still, the function names could be more similar. What about insertKernelCallsSPIR?
		break;
		case GPUArch::SPIR32:
		case GPUArch::SPIR64:
		insertKernelCallsSPIR(Kernel);
		break;
		}
}		}
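The triple and data-layout switch at the top of `createKernelFunction` is a pure function of the architecture and, for NVPTX64, the runtime. A C sketch of that mapping, assuming simplified stand-ins for the `GPUArch`/`GPURuntime` enums (all names here are hypothetical): SPIR32 passes `false` to `computeSPIRDataLayout` (32-bit pointers, matching `e-p:32:32:32` in the test), while SPIR64 and NVPTX64 use 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for GPUArch and GPURuntime. */
typedef enum { ARCH_NVPTX64, ARCH_SPIR32, ARCH_SPIR64 } GPUArchKind;
typedef enum { RUNTIME_CUDA, RUNTIME_OPENCL } GPURuntimeKind;

/* Target triple the kernel module is tagged with. */
static const char *kernelTriple(GPUArchKind Arch, GPURuntimeKind Runtime) {
  switch (Arch) {
  case ARCH_NVPTX64:
    return Runtime == RUNTIME_CUDA ? "nvptx64-nvidia-cuda"
                                   : "nvptx64-nvidia-nvcl";
  case ARCH_SPIR32:
    return "spir-unknown-unknown"; /* runtime does not matter for SPIR */
  case ARCH_SPIR64:
    return "spir64-unknown-unknown";
  }
  assert(0 && "unknown GPU architecture");
  return NULL;
}

/* Pointer width implied by the is64Bit argument of the data layout. */
static int pointerBits(GPUArchKind Arch) {
  return Arch == ARCH_SPIR32 ? 32 : 64;
}
```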

std::string GPUNodeBuilder::createKernelASM() {		std::string GPUNodeBuilder::createKernelASM() {
llvm::Triple GPUTriple;		llvm::Triple GPUTriple;

switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
switch (Runtime) {		switch (Runtime) {
case GPURuntime::CUDA:		case GPURuntime::CUDA:
GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-cuda"));		GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-cuda"));
break;		break;
case GPURuntime::OpenCL:		case GPURuntime::OpenCL:
GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-nvcl"));		GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-nvcl"));
break;		break;
}		}
break;		break;
		case GPUArch::SPIR64:
		case GPUArch::SPIR32:
		std::string SPIRAssembly;
		raw_string_ostream IROstream(SPIRAssembly);
		singam-sanjay: nitpick: Do we need this?
		PhilippSchaad (author): We can change it to an if-clause if that is desired. The compiler just complains when we have a switch and do not check every possible value of `Arch`. An alternative would be a default clause handling that `llvm_unreachable`.
		singam-sanjay: I was talking about the `break;` after `llvm_unreachable`. If `llvm_unreachable` would call `abort`, the `break` would be redundant.
		IROstream << *GPUModule;
		IROstream.flush();
		return SPIRAssembly;
}		}

std::string ErrMsg;		std::string ErrMsg;
auto GPUTarget = TargetRegistry::lookupTarget(GPUTriple.getTriple(), ErrMsg);		auto GPUTarget = TargetRegistry::lookupTarget(GPUTriple.getTriple(), ErrMsg);

if (!GPUTarget) {		if (!GPUTarget) {
errs() << ErrMsg << "\n";		errs() << ErrMsg << "\n";
return "";		return "";
}		}

TargetOptions Options;		TargetOptions Options;
Options.UnsafeFPMath = FastMath;		Options.UnsafeFPMath = FastMath;

std::string subtarget;		std::string subtarget;

switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
subtarget = CudaVersion;		subtarget = CudaVersion;
break;		break;
		case GPUArch::SPIR32:
		case GPUArch::SPIR64:
		llvm_unreachable("No subtarget for SPIR architecture");
}		}
		singam-sanjay: nitpick: Do we need this?
		PhilippSchaad (author): Dito above.
		singam-sanjay: Ditto above ;-).
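On the `break` question: `llvm_unreachable` does not return (it aborts in assertion-enabled builds and is treated as unreachable otherwise), so a `break` after it is dead code. The C sketch below shows the same pattern with a C11 `_Noreturn` stand-in; `fatalUnreachable` and `subtargetFor` are hypothetical names, and only the SPIR message is taken from the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { ARCH_NVPTX64, ARCH_SPIR32, ARCH_SPIR64 } GPUArchKind;

/* Stand-in for llvm_unreachable: report and abort. _Noreturn tells the
   compiler control never comes back, so no `break` is needed after it. */
static _Noreturn void fatalUnreachable(const char *Msg) {
  fprintf(stderr, "UNREACHABLE: %s\n", Msg);
  abort();
}

/* Subtarget string for createTargetMachine; only NVPTX64 has one. */
static const char *subtargetFor(GPUArchKind Arch, const char *CudaVersion) {
  assert(CudaVersion != NULL);
  switch (Arch) {
  case ARCH_NVPTX64:
    return CudaVersion;
  case ARCH_SPIR32:
  case ARCH_SPIR64:
    fatalUnreachable("No subtarget for SPIR architecture"); /* no break */
  }
  fatalUnreachable("unknown GPU architecture");
}
```

`subtargetFor(ARCH_NVPTX64, "sm_30")` simply returns its second argument; `"sm_30"` is only an example value for `CudaVersion`.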

std::unique_ptr<TargetMachine> TargetM(GPUTarget->createTargetMachine(		std::unique_ptr<TargetMachine> TargetM(GPUTarget->createTargetMachine(
GPUTriple.getTriple(), subtarget, "", Options, Optional<Reloc::Model>()));		GPUTriple.getTriple(), subtarget, "", Options, Optional<Reloc::Model>()));

SmallString<0> ASMString;		SmallString<0> ASMString;
raw_svector_ostream ASMStream(ASMString);		raw_svector_ostream ASMStream(ASMString);
llvm::legacy::PassManager PM;		llvm::legacy::PassManager PM;

[23 lines not shown]	if (verifyModule(*GPUModule)) {

BuildSuccessful = false;		BuildSuccessful = false;
return "";		return "";
}		}

if (DumpKernelIR)		if (DumpKernelIR)
outs() << *GPUModule << "\n";		outs() << *GPUModule << "\n";

		if (Arch != GPUArch::SPIR32 && Arch != GPUArch::SPIR64) {
// Optimize module.		// Optimize module.
llvm::legacy::PassManager OptPasses;		llvm::legacy::PassManager OptPasses;
PassManagerBuilder PassBuilder;		PassManagerBuilder PassBuilder;
PassBuilder.OptLevel = 3;		PassBuilder.OptLevel = 3;
PassBuilder.SizeLevel = 0;		PassBuilder.SizeLevel = 0;
PassBuilder.populateModulePassManager(OptPasses);		PassBuilder.populateModulePassManager(OptPasses);
OptPasses.run(*GPUModule);		OptPasses.run(*GPUModule);
		}
		grosser: Interesting. So the optimizations break stuff. It would be great to understand what exactly we break!
		PhilippSchaad (author): Looking into that. What is clear is that the optimized version skips a few 'dummy' computations, because the optimized kernels expect a smaller workgroup size in OpenCL. I am not sure if and how we could account for that. It seems very tricky to me at the moment; I have played around with it quite a bit.
		grosser: It's OK. Let's get the remainder of the patch updated, then we can look at this issue later on.

std::string Assembly = createKernelASM();		std::string Assembly = createKernelASM();
		singam-sanjay: Would it be better to move the SPIR-specific code into createKernelASM as well? You might not need the switch cases to handle buggy control flow like `case GPUArch::SPIR64: case GPUArch::SPIR32: llvm_unreachable("Cannot generate ASM for SPIR architecture");`.
		PhilippSchaad (author): Will do, good point. I originally moved it out because we thought we had to do a lot more hacking on the resulting IR.
		singam-sanjay: Alright.

if (DumpKernelASM)		if (DumpKernelASM)
outs() << Assembly << "\n";		outs() << Assembly << "\n";

GPUModule.release();		GPUModule.release();
KernelIDs.clear();		KernelIDs.clear();

return Assembly;		return Assembly;
}		}

		grosser: It seems all these regexp hacks are needed because we don't generate the right intrinsics. What is holding us back from generating them directly?
		PhilippSchaad (author): Dito above.
namespace {		namespace {
class PPCGCodeGeneration : public ScopPass {		class PPCGCodeGeneration : public ScopPass {
public:		public:
static char ID;		static char ID;

GPURuntime Runtime = GPURuntime::CUDA;		GPURuntime Runtime = GPURuntime::CUDA;

GPUArch Architecture = GPUArch::NVPTX64;		GPUArch Architecture = GPUArch::NVPTX64;
[1,005 lines not shown]

lib/Support/RegisterPasses.cpp

[111 lines not shown]	cl::values(clEnumValN(GPURuntime::CUDA, "libcudart",
"use the CUDA Runtime API"),		"use the CUDA Runtime API"),
clEnumValN(GPURuntime::OpenCL, "libopencl",		clEnumValN(GPURuntime::OpenCL, "libopencl",
"use the OpenCL Runtime API")),		"use the OpenCL Runtime API")),
cl::init(GPURuntime::CUDA), cl::ZeroOrMore, cl::cat(PollyCategory));		cl::init(GPURuntime::CUDA), cl::ZeroOrMore, cl::cat(PollyCategory));

static cl::opt<GPUArch>		static cl::opt<GPUArch>
GPUArchChoice("polly-gpu-arch", cl::desc("The GPU Architecture to target"),		GPUArchChoice("polly-gpu-arch", cl::desc("The GPU Architecture to target"),
cl::values(clEnumValN(GPUArch::NVPTX64, "nvptx64",		cl::values(clEnumValN(GPUArch::NVPTX64, "nvptx64",
"target NVIDIA 64-bit architecture")),		"target NVIDIA 64-bit architecture"),
		clEnumValN(GPUArch::SPIR32, "spir32",
		"target SPIR 32-bit architecture"),
		clEnumValN(GPUArch::SPIR64, "spir64",
		"target SPIR 64-bit architecture")),
cl::init(GPUArch::NVPTX64), cl::ZeroOrMore,		cl::init(GPUArch::NVPTX64), cl::ZeroOrMore,
cl::cat(PollyCategory));		cl::cat(PollyCategory));
#endif		#endif
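The new `clEnumValN` entries make `spir32` and `spir64` valid values of `-polly-gpu-arch`, with `nvptx64` remaining the default. The string-to-enum mapping they register can be pictured as the lookup below. This is a sketch of the behaviour, not how `cl::opt` is implemented, and `parseGPUArch`/`ARCH_INVALID` are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef enum { ARCH_NVPTX64, ARCH_SPIR32, ARCH_SPIR64, ARCH_INVALID } GPUArchKind;

/* Maps a -polly-gpu-arch value to an architecture, like the clEnumValN
   table does; unrecognized names yield ARCH_INVALID. */
static GPUArchKind parseGPUArch(const char *Value) {
  static const struct {
    const char *Name;
    GPUArchKind Arch;
  } Table[] = {{"nvptx64", ARCH_NVPTX64},
               {"spir32", ARCH_SPIR32},
               {"spir64", ARCH_SPIR64}};
  assert(Value != NULL);
  for (size_t i = 0; i < sizeof(Table) / sizeof(Table[0]); ++i)
    if (strcmp(Value, Table[i].Name) == 0)
      return Table[i].Arch;
  return ARCH_INVALID;
}
```

On the command line, `test/GPGPU/spir-codegen.ll` selects the 32-bit variant with `opt -O3 -polly -polly-target=gpu -polly-gpu-arch=spir32 ...`.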

VectorizerChoice polly::PollyVectorizerChoice;		VectorizerChoice polly::PollyVectorizerChoice;
static cl::opt<polly::VectorizerChoice, true> Vectorizer(		static cl::opt<polly::VectorizerChoice, true> Vectorizer(
"polly-vectorizer", cl::desc("Select the vectorization strategy"),		"polly-vectorizer", cl::desc("Select the vectorization strategy"),
cl::values(		cl::values(
[340 lines not shown]

test/GPGPU/spir-codegen.ll

This file was added.

				; RUN: opt -O3 -polly -polly-target=gpu \
				; RUN: -polly-gpu-arch=spir32 \
				; RUN: -polly-acc-dump-kernel-ir -polly-process-unprofitable -disable-output < %s | \
				; RUN: FileCheck %s

				; REQUIRES: pollyacc

				; CHECK: target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"
				; CHECK-NEXT: target triple = "spir-unknown-unknown"

				; CHECK-LABEL: define spir_kernel void @FUNC_double_parallel_loop_SCOP_0_KERNEL_0(i8 addrspace(1)* %MemRef0) #0 !kernel_arg_addr_space !0 !kernel_arg_name !1 !kernel_arg_access_qual !1 !kernel_arg_type !1 !kernel_arg_type_qual !1 !kernel_arg_base_type !1 {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: %0 = call i32 @__gen_ocl_get_group_id0()
				; CHECK-NEXT: %__gen_ocl_get_group_id0 = zext i32 %0 to i64
				; CHECK-NEXT: %1 = call i32 @__gen_ocl_get_group_id1()
				; CHECK-NEXT: %__gen_ocl_get_group_id1 = zext i32 %1 to i64
				; CHECK-NEXT: %2 = call i32 @__gen_ocl_get_local_id0()
				; CHECK-NEXT: %__gen_ocl_get_local_id0 = zext i32 %2 to i64
				; CHECK-NEXT: %3 = call i32 @__gen_ocl_get_local_id1()
				; CHECK-NEXT: %__gen_ocl_get_local_id1 = zext i32 %3 to i64
				; CHECK-NEXT: br label %polly.loop_preheader

				; CHECK-LABEL: polly.loop_exit: ; preds = %polly.stmt.bb5
				; CHECK-NEXT: ret void

				; CHECK-LABEL: polly.loop_header: ; preds = %polly.stmt.bb5, %polly.loop_preheader
				; CHECK-NEXT: %polly.indvar = phi i64 [ 0, %polly.loop_preheader ], [ %polly.indvar_next, %polly.stmt.bb5 ]
				; CHECK-NEXT: %4 = mul nsw i64 32, %__gen_ocl_get_group_id0
				; CHECK-NEXT: %5 = add nsw i64 %4, %__gen_ocl_get_local_id0
				; CHECK-NEXT: %6 = mul nsw i64 32, %__gen_ocl_get_group_id1
				; CHECK-NEXT: %7 = add nsw i64 %6, %__gen_ocl_get_local_id1
				; CHECK-NEXT: %8 = mul nsw i64 16, %polly.indvar
				; CHECK-NEXT: %9 = add nsw i64 %7, %8
				; CHECK-NEXT: br label %polly.stmt.bb5

				; CHECK-LABEL: polly.stmt.bb5: ; preds = %polly.loop_header
				; CHECK-NEXT: %10 = mul i64 %5, %9
				; CHECK-NEXT: %p_tmp6 = sitofp i64 %10 to float
				; CHECK-NEXT: %polly.access.cast.MemRef0 = bitcast i8 addrspace(1)* %MemRef0 to float addrspace(1)*
				; CHECK-NEXT: %11 = mul nsw i64 32, %__gen_ocl_get_group_id0
				; CHECK-NEXT: %12 = add nsw i64 %11, %__gen_ocl_get_local_id0
				; CHECK-NEXT: %polly.access.mul.MemRef0 = mul nsw i64 %12, 1024
				; CHECK-NEXT: %13 = mul nsw i64 32, %__gen_ocl_get_group_id1
				; CHECK-NEXT: %14 = add nsw i64 %13, %__gen_ocl_get_local_id1
				; CHECK-NEXT: %15 = mul nsw i64 16, %polly.indvar
				; CHECK-NEXT: %16 = add nsw i64 %14, %15
				; CHECK-NEXT: %polly.access.add.MemRef0 = add nsw i64 %polly.access.mul.MemRef0, %16
				; CHECK-NEXT: %polly.access.MemRef0 = getelementptr float, float addrspace(1)* %polly.access.cast.MemRef0, i64 %polly.access.add.MemRef0
				; CHECK-NEXT: %tmp8_p_scalar_ = load float, float addrspace(1)* %polly.access.MemRef0, align 4
				; CHECK-NEXT: %p_tmp9 = fadd float %tmp8_p_scalar_, %p_tmp6
				; CHECK-NEXT: %polly.access.cast.MemRef01 = bitcast i8 addrspace(1)* %MemRef0 to float addrspace(1)*
				; CHECK-NEXT: %17 = mul nsw i64 32, %__gen_ocl_get_group_id0
				; CHECK-NEXT: %18 = add nsw i64 %17, %__gen_ocl_get_local_id0
				; CHECK-NEXT: %polly.access.mul.MemRef02 = mul nsw i64 %18, 1024
				; CHECK-NEXT: %19 = mul nsw i64 32, %__gen_ocl_get_group_id1
				; CHECK-NEXT: %20 = add nsw i64 %19, %__gen_ocl_get_local_id1
				; CHECK-NEXT: %21 = mul nsw i64 16, %polly.indvar
				; CHECK-NEXT: %22 = add nsw i64 %20, %21
				; CHECK-NEXT: %polly.access.add.MemRef03 = add nsw i64 %polly.access.mul.MemRef02, %22
				; CHECK-NEXT: %polly.access.MemRef04 = getelementptr float, float addrspace(1)* %polly.access.cast.MemRef01, i64 %polly.access.add.MemRef03
				; CHECK-NEXT: store float %p_tmp9, float addrspace(1)* %polly.access.MemRef04, align 4
				; CHECK-NEXT: %polly.indvar_next = add nsw i64 %polly.indvar, 1
				; CHECK-NEXT: %polly.loop_cond = icmp sle i64 %polly.indvar_next, 1
				; CHECK-NEXT: br i1 %polly.loop_cond, label %polly.loop_header, label %polly.loop_exit

				; CHECK-LABEL: polly.loop_preheader: ; preds = %entry
				; CHECK-NEXT: br label %polly.loop_header

				; CHECK: attributes #0 = { "polly.skip.fn" }

				; void double_parallel_loop(float A[][1024]) {
				; for (long i = 0; i < 1024; i++)
				; for (long j = 0; j < 1024; j++)
				; A[i][j] += i * j;
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @double_parallel_loop([1024 x float]* %A) {
				bb:
				br label %bb2

				bb2: ; preds = %bb13, %bb
				%i.0 = phi i64 [ 0, %bb ], [ %tmp14, %bb13 ]
				%exitcond1 = icmp ne i64 %i.0, 1024
				br i1 %exitcond1, label %bb3, label %bb15

				bb3: ; preds = %bb2
				br label %bb4

				bb4: ; preds = %bb10, %bb3
				%j.0 = phi i64 [ 0, %bb3 ], [ %tmp11, %bb10 ]
				%exitcond = icmp ne i64 %j.0, 1024
				br i1 %exitcond, label %bb5, label %bb12

				bb5: ; preds = %bb4
				%tmp = mul nuw nsw i64 %i.0, %j.0
				%tmp6 = sitofp i64 %tmp to float
				%tmp7 = getelementptr inbounds [1024 x float], [1024 x float]* %A, i64 %i.0, i64 %j.0
				%tmp8 = load float, float* %tmp7, align 4
				%tmp9 = fadd float %tmp8, %tmp6
				store float %tmp9, float* %tmp7, align 4
				br label %bb10

				bb10: ; preds = %bb5
				%tmp11 = add nuw nsw i64 %j.0, 1
				br label %bb4

				bb12: ; preds = %bb4
				br label %bb13

				bb13: ; preds = %bb12
				%tmp14 = add nuw nsw i64 %i.0, 1
				br label %bb2

				bb15: ; preds = %bb2
				ret void
				}
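The CHECK lines encode how each work-item finds its element: `i = 32 * group_id0 + local_id0`, `j = 32 * group_id1 + local_id1 + 16 * indvar` with `indvar` in {0, 1}, and the linear offset `i * 1024 + j`. The sequential C simulation below checks that this mapping touches every element of `A` exactly once. The work-group size of 32 x 16 is an assumption inferred from the stride 16 and the two loop iterations; the test does not state it.

```c
#include <assert.h>
#include <stdint.h>

#define N 1024  /* array extent in double_parallel_loop */
#define TILE 32 /* the "mul nsw i64 32, ..." factor in the CHECK lines */
#define BX 32   /* assumed work-group size in x (range of local_id0) */
#define BY 16   /* assumed work-group size in y (range of local_id1) */

static float A[N][N];
static unsigned char Hits[N][N];

/* One work-item, following the CHECK lines:
   i = 32*group0 + local0, j = 32*group1 + local1 + 16*indvar. */
static void kernelBody(int64_t G0, int64_t G1, int64_t L0, int64_t L1) {
  for (int64_t IndVar = 0; IndVar <= 1; ++IndVar) { /* icmp sle ..., 1 */
    int64_t I = TILE * G0 + L0;
    int64_t J = TILE * G1 + L1 + BY * IndVar;
    assert(I < N && J < N);
    A[I][J] += (float)(I * J); /* element at linear offset I * 1024 + J */
    Hits[I][J]++;
  }
}

/* Runs every work-item of the (N/TILE) x (N/TILE) grid sequentially. */
static void runGrid(void) {
  for (int64_t G0 = 0; G0 < N / TILE; ++G0)
    for (int64_t G1 = 0; G1 < N / TILE; ++G1)
      for (int64_t L0 = 0; L0 < BX; ++L0)
        for (int64_t L1 = 0; L1 < BY; ++L1)
          kernelBody(G0, G1, L0, L1);
}

/* Returns 1 if every element was updated exactly once. */
static int everyElementOnce(void) {
  for (int I = 0; I < N; ++I)
    for (int J = 0; J < N; ++J)
      if (Hits[I][J] != 1)
        return 0;
  return 1;
}
```

After `runGrid()`, `everyElementOnce()` returns 1 and `A[i][j] == i * j`, which is what `double_parallel_loop` computes on a zero-initialized array.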

tools/GPURuntime/GPUJIT.c

[17 lines not shown]
#include <cuda_runtime.h>		#include <cuda_runtime.h>
#endif /* HAS_LIBCUDART */		#endif /* HAS_LIBCUDART */

#ifdef HAS_LIBOPENCL		#ifdef HAS_LIBOPENCL
#ifdef __APPLE__		#ifdef __APPLE__
#include <OpenCL/opencl.h>		#include <OpenCL/opencl.h>
#else		#else
#include <CL/cl.h>		#include <CL/cl.h>
#endif		#endif /* __APPLE__ */
#endif /* HAS_LIBOPENCL */		#endif /* HAS_LIBOPENCL */

#include <dlfcn.h>		#include <dlfcn.h>
#include <stdarg.h>		#include <stdarg.h>
#include <stdio.h>		#include <stdio.h>
#include <string.h>		#include <string.h>
		#include <unistd.h>

static int DebugMode;		static int DebugMode;
static int CacheMode;		static int CacheMode;

static PollyGPURuntime Runtime = RUNTIME_NONE;		static PollyGPURuntime Runtime = RUNTIME_NONE;

static void debug_print(const char *format, ...) {		static void debug_print(const char *format, ...) {
if (!DebugMode)		if (!DebugMode)
[43 lines not shown]
};		};

struct OpenCLDevicePtrT {		struct OpenCLDevicePtrT {
cl_mem MemObj;		cl_mem MemObj;
};		};

/* Dynamic library handles for the OpenCL runtime library. */		/* Dynamic library handles for the OpenCL runtime library. */
static void *HandleOpenCL;		static void *HandleOpenCL;
		static void *HandleOpenCLBeignet;

/* Type-defines of function pointer to OpenCL Runtime API. */		/* Type-defines of function pointer to OpenCL Runtime API. */
typedef cl_int clGetPlatformIDsFcnTy(cl_uint NumEntries,		typedef cl_int clGetPlatformIDsFcnTy(cl_uint NumEntries,
cl_platform_id *Platforms,		cl_platform_id *Platforms,
cl_uint *NumPlatforms);		cl_uint *NumPlatforms);
static clGetPlatformIDsFcnTy *clGetPlatformIDsFcnPtr;		static clGetPlatformIDsFcnTy *clGetPlatformIDsFcnPtr;

typedef cl_int clGetDeviceIDsFcnTy(cl_platform_id Platform,		typedef cl_int clGetDeviceIDsFcnTy(cl_platform_id Platform,
[34 lines not shown]

typedef cl_int		typedef cl_int
clEnqueueWriteBufferFcnTy(cl_command_queue CommandQueue, cl_mem Buffer,		clEnqueueWriteBufferFcnTy(cl_command_queue CommandQueue, cl_mem Buffer,
cl_bool BlockingWrite, size_t Offset, size_t Size,		cl_bool BlockingWrite, size_t Offset, size_t Size,
const void *Ptr, cl_uint NumEventsInWaitList,		const void *Ptr, cl_uint NumEventsInWaitList,
const cl_event *EventWaitList, cl_event *Event);		const cl_event *EventWaitList, cl_event *Event);
static clEnqueueWriteBufferFcnTy *clEnqueueWriteBufferFcnPtr;		static clEnqueueWriteBufferFcnTy *clEnqueueWriteBufferFcnPtr;

		typedef cl_program
		clCreateProgramWithLLVMIntelFcnTy(cl_context Context, cl_uint NumDevices,
		const cl_device_id *DeviceList,
		const char *Filename, cl_int *ErrcodeRet);
		static clCreateProgramWithLLVMIntelFcnTy *clCreateProgramWithLLVMIntelFcnPtr;
		singam-sanjay: This is new to me. Why are you providing runtime support for Intel to JIT SPIR code? Is it to run on the integrated GPU of an Intel CPU? Guessing from https://github.com/intel/beignet.
		PhilippSchaad (author): Your guess is exactly correct. This also corresponds to some of the points mentioned above.
		singam-sanjay: Ohk.
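The typedef shows that Beignet's `clCreateProgramWithLLVMIntel` takes a file name (`Filename`) rather than an in-memory buffer, so the textual IR returned by `createKernelASM` for SPIR has to be written to disk before it can be loaded. A minimal POSIX sketch of that step, assuming `mkstemp` is an acceptable way to obtain a private file; `spillKernelIR` and the `/tmp/polly_kernel_XXXXXX` template are hypothetical, and the actual call site in GPUJIT.c is not part of the lines shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes IR to a fresh temporary file; returns 0 on success, -1 on error.
   PathBuf must hold a mkstemp template ending in "XXXXXX"; on success it
   holds the generated name, usable as a filename-based API argument. */
static int spillKernelIR(const char *IR, char *PathBuf) {
  assert(IR != NULL && PathBuf != NULL);
  int FD = mkstemp(PathBuf); /* creates and opens the file, mode 0600 */
  if (FD < 0)
    return -1;
  size_t Left = strlen(IR);
  while (Left > 0) { /* write() may write less than requested */
    ssize_t Written = write(FD, IR, Left);
    if (Written < 0) {
      close(FD);
      unlink(PathBuf);
      return -1;
    }
    IR += Written;
    Left -= (size_t)Written;
  }
  if (close(FD) != 0) {
    unlink(PathBuf);
    return -1;
  }
  return 0;
}
```

The resulting path would then be passed as the `Filename` argument, and the file removed once the driver no longer needs it.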

typedef cl_program clCreateProgramWithBinaryFcnTy(		typedef cl_program clCreateProgramWithBinaryFcnTy(
cl_context Context, cl_uint NumDevices, const cl_device_id *DeviceList,		cl_context Context, cl_uint NumDevices, const cl_device_id *DeviceList,
const size_t *Lengths, const unsigned char **Binaries, cl_int *BinaryStatus,		const size_t *Lengths, const unsigned char **Binaries, cl_int *BinaryStatus,
cl_int *ErrcodeRet);		cl_int *ErrcodeRet);
static clCreateProgramWithBinaryFcnTy *clCreateProgramWithBinaryFcnPtr;		static clCreateProgramWithBinaryFcnTy *clCreateProgramWithBinaryFcnPtr;

		grosser: Why do these declarations need to be optional? I think they should compile easily in any situation, no?
		PhilippSchaad (author): Oh yeah, my bad. Only the LLVMIntel one has to be conditional.
typedef cl_int clBuildProgramFcnTy(		typedef cl_int clBuildProgramFcnTy(
cl_program Program, cl_uint NumDevices, const cl_device_id *DeviceList,		cl_program Program, cl_uint NumDevices, const cl_device_id *DeviceList,
const char *Options,		const char *Options,
void(CL_CALLBACK *pfn_notify)(cl_program Program, void *UserData),		void(CL_CALLBACK *pfn_notify)(cl_program Program, void *UserData),
void *UserData);		void *UserData);
static clBuildProgramFcnTy *clBuildProgramFcnPtr;		static clBuildProgramFcnTy *clBuildProgramFcnPtr;

typedef cl_kernel clCreateKernelFcnTy(cl_program Program,		typedef cl_kernel clCreateKernelFcnTy(cl_program Program,
[49 lines not shown]	static void *getAPIHandleCL(void *Handle, const char *FuncName) {
if ((Err = dlerror()) != 0) {		if ((Err = dlerror()) != 0) {
fprintf(stderr, "Load OpenCL Runtime API failed: %s. \n", Err);		fprintf(stderr, "Load OpenCL Runtime API failed: %s. \n", Err);
return 0;		return 0;
}		}
return FuncPtr;		return FuncPtr;
}		}

static int initialDeviceAPILibrariesCL() {		static int initialDeviceAPILibrariesCL() {
		HandleOpenCLBeignet = dlopen("/usr/local/lib/beignet/libcl.so", RTLD_LAZY);
HandleOpenCL = dlopen("libOpenCL.so", RTLD_LAZY);		HandleOpenCL = dlopen("libOpenCL.so", RTLD_LAZY);
if (!HandleOpenCL) {		if (!HandleOpenCL) {
		grosser: I am surprised that Intel's OpenCL uses such a different path. In fact, I would assume that libOpenCL.so, as a generic OpenCL library, would take care of loading Beignet. Is this not the case?
		PhilippSchaad (author): The funny thing is: it does, somewhat. If I only dlopen libOpenCL and then call, for example, clCreateProgramWithBinary, looking at that in gdb reveals that the Beignet routines get loaded by libOpenCL. However, as soon as a function is not part of the standard OpenCL API, e.g. clCreateProgramWithLLVMIntel, it does not like that.
		grosser: OK. Can you then unconditionally try to load beignet/libcl.so into another variable, e.g. HandleOpenCLBeignet? I assume this handle should be nullptr if the dlopen fails, right? Can you then check if this handle is zero and use this to decide if you want to call clCreateProgramWithLLVMIntel? This should allow us to get rid of all the conditional compilation.
fprintf(stderr, "Cannot open library: %s. \n", dlerror());		fprintf(stderr, "Cannot open library: %s. \n", dlerror());
return 0;		return 0;
}		}
return 1;		return 1;
}		}
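grosser's suggestion removes the conditional compilation: always attempt to `dlopen` Beignet into its own handle, treat a NULL handle as "not installed", and decide at run time. The sketch below captures just those decisions in plain C, assuming the handles were already obtained with `dlopen(..., RTLD_LAZY)` as in `initialDeviceAPILibrariesCL`; `CLHandles`, `handlesUsable`, `apiHandle` and `canUseLLVMIntel` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Library handles as returned by dlopen(..., RTLD_LAZY); NULL = not loaded. */
typedef struct {
  void *Generic; /* libOpenCL.so: required */
  void *Beignet; /* Beignet's libcl.so: optional */
} CLHandles;

/* Only the generic library is mandatory; a missing Beignet is not an error. */
static int handlesUsable(const CLHandles *H) {
  assert(H != NULL);
  return H->Generic != NULL;
}

/* Handle from which the standard OpenCL entry points are resolved:
   Beignet when it loaded, the generic library otherwise. */
static void *apiHandle(const CLHandles *H) {
  assert(handlesUsable(H));
  return H->Beignet != NULL ? H->Beignet : H->Generic;
}

/* The Intel-only clCreateProgramWithLLVMIntel can be resolved only when
   the Beignet handle exists; this run-time check replaces an #ifdef. */
static int canUseLLVMIntel(const CLHandles *H) {
  assert(H != NULL);
  return H->Beignet != NULL;
}
```

As the FIXME and review thread in `initialDeviceAPIsCL` point out, preferring Beignet whenever it is installed can pick the integrated GPU over a faster discrete card; a fuller version would base the choice in `apiHandle` on the `-polly-gpu-arch` value.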

/* Get function pointer to OpenCL Runtime API.		/* Get function pointer to OpenCL Runtime API.
*		*
[9 lines not shown]
* http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html		* http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html
*/		*/
#pragma GCC diagnostic push		#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"		#pragma GCC diagnostic ignored "-Wpedantic"
static int initialDeviceAPIsCL() {		static int initialDeviceAPIsCL() {
if (initialDeviceAPILibrariesCL() == 0)		if (initialDeviceAPILibrariesCL() == 0)
return 0;		return 0;

		// FIXME: We are now always selecting the Intel Beignet driver if it is
		singam-sanjay: `void *Handle = (HandleOpenCLBeignet ? HandleOpenCLBeignet : HandleOpenCL);` maybe? Let me know if this isn't recommended.
		PhilippSchaad (author): Definitely the prettier solution. Will change.
		singam-sanjay: `void *Handle = (HandleOpenCLBeignet != NULL ? HandleOpenCLBeignet : HandleOpenCL);` would be better.
		// available on the system, instead of a possible NVIDIA or AMD OpenCL
		// API. This selection should occur based on the target architecture
		singam-sanjay: Is it that we prefer to use the libraries provided by Intel's SDK and the integrated GPU over other OpenCL drivers and devices? I'm assuming that this leads the runtime to use the integrated GPU even when a more powerful Radeon card is present. Please correct me if I'm wrong.
		PhilippSchaad (author): This is actually a good point. We don't want to prefer the Intel-provided SDK; the choice of selecting Intel's Beignet should depend on which -polly-gpu-arch value has been selected. AMD is in the pipeline.
		singam-sanjay: Alright. Any ideas on how you'd differentiate between the integrated and dedicated GPU on the command line?
		PhilippSchaad (author): The goal is to run on AMD GPUs with the AMDGPU backend, which would mean the differentiation would be done with -polly-gpu-arch=spir32/amdgcn32, for example. I am not yet sure how to transport that choice to here, but I will look into that as I am working on the AMD side of things. Thanks for pointing out this flaw.
		// chosen when compiling.
		void *Handle =
		(HandleOpenCLBeignet != NULL ? HandleOpenCLBeignet : HandleOpenCL);
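The FIXME and the review thread agree that the library choice should follow the target architecture rather than preferring Beignet whenever it is installed. A minimal sketch of that idea follows, assuming the architecture string (for example the -polly-gpu-arch value) can somehow be made available to the runtime, which the thread leaves open. `selectOpenCLHandle` is a hypothetical helper, not part of GPUJIT.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: choose the OpenCL library from the target
 * architecture instead of "Beignet whenever it is installed".
 * SPIR targets get the Beignet handle (NULL if Beignet was not loaded,
 * so the caller can report an error); every other target gets the
 * generic OpenCL library handle. */
static void *selectOpenCLHandle(const char *Arch, void *HandleBeignet,
                                void *HandleGeneric) {
  assert(Arch != NULL);
  int IsSPIR = strcmp(Arch, "spir32") == 0 || strcmp(Arch, "spir64") == 0;
  return IsSPIR ? HandleBeignet : HandleGeneric;
}
```

The sketch deliberately does not fall back to the generic library for SPIR, because this patch obtains `clCreateProgramWithLLVMIntel` only from the Beignet handle.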

clGetPlatformIDsFcnPtr =		clGetPlatformIDsFcnPtr =
(clGetPlatformIDsFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetPlatformIDs");		(clGetPlatformIDsFcnTy *)getAPIHandleCL(Handle, "clGetPlatformIDs");

clGetDeviceIDsFcnPtr =		clGetDeviceIDsFcnPtr =
(clGetDeviceIDsFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetDeviceIDs");		(clGetDeviceIDsFcnTy *)getAPIHandleCL(Handle, "clGetDeviceIDs");

clGetDeviceInfoFcnPtr =		clGetDeviceInfoFcnPtr =
(clGetDeviceInfoFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetDeviceInfo");		(clGetDeviceInfoFcnTy *)getAPIHandleCL(Handle, "clGetDeviceInfo");

clGetKernelInfoFcnPtr =		clGetKernelInfoFcnPtr =
(clGetKernelInfoFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetKernelInfo");		(clGetKernelInfoFcnTy *)getAPIHandleCL(Handle, "clGetKernelInfo");

clCreateContextFcnPtr =		clCreateContextFcnPtr =
(clCreateContextFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateContext");		(clCreateContextFcnTy *)getAPIHandleCL(Handle, "clCreateContext");

clCreateCommandQueueFcnPtr = (clCreateCommandQueueFcnTy *)getAPIHandleCL(		clCreateCommandQueueFcnPtr = (clCreateCommandQueueFcnTy *)getAPIHandleCL(
HandleOpenCL, "clCreateCommandQueue");		Handle, "clCreateCommandQueue");

clCreateBufferFcnPtr =		clCreateBufferFcnPtr =
(clCreateBufferFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateBuffer");		(clCreateBufferFcnTy *)getAPIHandleCL(Handle, "clCreateBuffer");

clEnqueueWriteBufferFcnPtr = (clEnqueueWriteBufferFcnTy *)getAPIHandleCL(		clEnqueueWriteBufferFcnPtr = (clEnqueueWriteBufferFcnTy *)getAPIHandleCL(
HandleOpenCL, "clEnqueueWriteBuffer");		Handle, "clEnqueueWriteBuffer");

		if (HandleOpenCLBeignet)
		clCreateProgramWithLLVMIntelFcnPtr =
		(clCreateProgramWithLLVMIntelFcnTy *)getAPIHandleCL(
		Handle, "clCreateProgramWithLLVMIntel");

clCreateProgramWithBinaryFcnPtr =		clCreateProgramWithBinaryFcnPtr =
(clCreateProgramWithBinaryFcnTy *)getAPIHandleCL(		(clCreateProgramWithBinaryFcnTy *)getAPIHandleCL(
HandleOpenCL, "clCreateProgramWithBinary");		Handle, "clCreateProgramWithBinary");

		grosser [Done]: Does this break, or does it just return nullptr if the function is not available? If we can always try and just detect whether it is there, that would be great!
		PhilippSchaad (author) [Done]: I'm gonna look into that, but IIRC it didn't work. Might be wrong, will try it out again!
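On grosser's question: `dlsym()` itself does not abort when a symbol is missing. It returns NULL and records an error that `dlerror()` reports. Whether `getAPIHandleCL` additionally prints a diagnostic is not visible in this excerpt. Below is a sketch of a silent lookup for optional entry points, assuming POSIX `dlfcn.h` (older glibc needs `-ldl` at link time). `lookupOptionalSymbol` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look up an entry point that may legitimately be
 * missing (e.g. an Intel-only extension such as
 * clCreateProgramWithLLVMIntel). dlerror() is cleared first and checked
 * afterwards, so a stale error from an earlier call is not misreported.
 * Nothing is printed: the caller decides whether NULL is an error. */
static void *lookupOptionalSymbol(void *Handle, const char *Name) {
  assert(Handle != NULL && Name != NULL);
  (void)dlerror(); /* clear any stale error */
  void *Sym = dlsym(Handle, Name);
  if (dlerror() != NULL)
    return NULL; /* symbol not present in this library */
  return Sym;
}
```

With a helper like this, the `if (HandleOpenCLBeignet)` guard around the `clCreateProgramWithLLVMIntel` lookup could become an unconditional lookup followed by a NULL check.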
clBuildProgramFcnPtr =		clBuildProgramFcnPtr =
(clBuildProgramFcnTy *)getAPIHandleCL(HandleOpenCL, "clBuildProgram");		(clBuildProgramFcnTy *)getAPIHandleCL(Handle, "clBuildProgram");

clCreateKernelFcnPtr =		clCreateKernelFcnPtr =
(clCreateKernelFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateKernel");		(clCreateKernelFcnTy *)getAPIHandleCL(Handle, "clCreateKernel");

clSetKernelArgFcnPtr =		clSetKernelArgFcnPtr =
(clSetKernelArgFcnTy *)getAPIHandleCL(HandleOpenCL, "clSetKernelArg");		(clSetKernelArgFcnTy *)getAPIHandleCL(Handle, "clSetKernelArg");

clEnqueueNDRangeKernelFcnPtr = (clEnqueueNDRangeKernelFcnTy *)getAPIHandleCL(		clEnqueueNDRangeKernelFcnPtr = (clEnqueueNDRangeKernelFcnTy *)getAPIHandleCL(
HandleOpenCL, "clEnqueueNDRangeKernel");		Handle, "clEnqueueNDRangeKernel");

clEnqueueReadBufferFcnPtr = (clEnqueueReadBufferFcnTy *)getAPIHandleCL(		clEnqueueReadBufferFcnPtr =
HandleOpenCL, "clEnqueueReadBuffer");		(clEnqueueReadBufferFcnTy *)getAPIHandleCL(Handle, "clEnqueueReadBuffer");

clFlushFcnPtr = (clFlushFcnTy *)getAPIHandleCL(HandleOpenCL, "clFlush");		clFlushFcnPtr = (clFlushFcnTy *)getAPIHandleCL(Handle, "clFlush");

clFinishFcnPtr = (clFinishFcnTy *)getAPIHandleCL(HandleOpenCL, "clFinish");		clFinishFcnPtr = (clFinishFcnTy *)getAPIHandleCL(Handle, "clFinish");

clReleaseKernelFcnPtr =		clReleaseKernelFcnPtr =
(clReleaseKernelFcnTy *)getAPIHandleCL(HandleOpenCL, "clReleaseKernel");		(clReleaseKernelFcnTy *)getAPIHandleCL(Handle, "clReleaseKernel");

clReleaseProgramFcnPtr =		clReleaseProgramFcnPtr =
(clReleaseProgramFcnTy *)getAPIHandleCL(HandleOpenCL, "clReleaseProgram");		(clReleaseProgramFcnTy *)getAPIHandleCL(Handle, "clReleaseProgram");

clReleaseMemObjectFcnPtr = (clReleaseMemObjectFcnTy *)getAPIHandleCL(		clReleaseMemObjectFcnPtr =
HandleOpenCL, "clReleaseMemObject");		(clReleaseMemObjectFcnTy *)getAPIHandleCL(Handle, "clReleaseMemObject");

clReleaseCommandQueueFcnPtr = (clReleaseCommandQueueFcnTy *)getAPIHandleCL(		clReleaseCommandQueueFcnPtr = (clReleaseCommandQueueFcnTy *)getAPIHandleCL(
HandleOpenCL, "clReleaseCommandQueue");		Handle, "clReleaseCommandQueue");

clReleaseContextFcnPtr =		clReleaseContextFcnPtr =
(clReleaseContextFcnTy *)getAPIHandleCL(HandleOpenCL, "clReleaseContext");		(clReleaseContextFcnTy *)getAPIHandleCL(Handle, "clReleaseContext");

return 1;		return 1;
}		}
#pragma GCC diagnostic pop		#pragma GCC diagnostic pop

/* Context and Device. */		/* Context and Device. */
static PollyGPUContext *GlobalContext = NULL;		static PollyGPUContext *GlobalContext = NULL;
static cl_device_id GlobalDeviceID = NULL;		static cl_device_id GlobalDeviceID = NULL;
[… 167 unchanged lines not shown …]	static PollyGPUFunction *getKernelCL(const char *BinaryBuffer,
}		}

if (!GlobalDeviceID) {		if (!GlobalDeviceID) {
fprintf(stderr, "GPGPU-code generation not initialized correctly.\n");		fprintf(stderr, "GPGPU-code generation not initialized correctly.\n");
exit(-1);		exit(-1);
}		}

cl_int Ret;		cl_int Ret;

		if (HandleOpenCLBeignet) {
		// TODO: This is a workaround, since clCreateProgramWithLLVMIntel only
		// accepts a filename to a valid llvm-ir file as an argument, instead
		// of accepting the BinaryBuffer directly.
		FILE *fp = fopen("kernel.ll", "wb");
		singam-sanjay [Not Done]: Is this file opened inside <llvm_build>/bin or $PWD? If it's $PWD, we should take precautions not to overwrite a file that the user needs. You could instead open a file in /tmp, or %TEMP% on Windows, like "/tmp/R@ηdηдm3.ll". Anyway, you could implement this later.
		PhilippSchaad (author) [Not Done]: Yes, it is $PWD. I will add this later; using /tmp and %TEMP% seems reasonable.
		singam-sanjay [Not Done]: Alright.
		if (fp != NULL) {
		fputs(BinaryBuffer, fp);
		fclose(fp);
		}
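The thread about writing kernel.ll into $PWD suggests using a temporary directory instead. Below is a POSIX-only sketch of that approach. It assumes the consumer does not require a `.ll` suffix on the filename (`mkstemp()` cannot add one; the non-POSIX `mkstemps()` can). `writeKernelToTempFile` is hypothetical, and a Windows version would need `GetTempPath()` or `%TEMP%` instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write Buffer to a new, uniquely named file under
 * $TMPDIR (falling back to /tmp) and store its path in PathOut.
 * Returns 0 on success and -1 on failure (leaving no file behind). */
static int writeKernelToTempFile(const char *Buffer, char *PathOut,
                                 size_t PathOutSize) {
  assert(Buffer != NULL && PathOut != NULL);
  const char *Dir = getenv("TMPDIR");
  if (Dir == NULL || Dir[0] == '\0')
    Dir = "/tmp";
  int Len = snprintf(PathOut, PathOutSize, "%s/polly-kernel-XXXXXX", Dir);
  if (Len < 0 || (size_t)Len >= PathOutSize)
    return -1; /* path would be truncated */

  int Fd = mkstemp(PathOut); /* creates the file exclusively, mode 0600 */
  if (Fd == -1)
    return -1;
  FILE *Fp = fdopen(Fd, "wb");
  if (Fp == NULL) {
    close(Fd);
    unlink(PathOut);
    return -1;
  }
  size_t Size = strlen(Buffer);
  int Ok = fwrite(Buffer, 1, Size, Fp) == Size;
  if (fclose(Fp) != 0)
    Ok = 0;
  if (!Ok) {
    unlink(PathOut);
    return -1;
  }
  return 0;
}
```

The caller would pass the resulting path to `clCreateProgramWithLLVMIntel` in place of the fixed "kernel.ll" and `unlink()` it afterwards, as the existing code already does.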

		((OpenCLKernel *)Function->Kernel)->Program =
		singam-sanjay [Done]: Was this supposed to be a message letting the user know that the runtime's not using the "beignet" OpenCL implementation?
		PhilippSchaad (author) [Done]: Oops, thanks for pointing this out. It was left in unintentionally and should never be reached; I used it for debugging. Removing it.
		singam-sanjay [Done]: 👍
		clCreateProgramWithLLVMIntelFcnPtr(
		((OpenCLContext *)GlobalContext->Context)->Context, 1,
		&GlobalDeviceID, "kernel.ll", &Ret);
		grosser [Done]: Why do we read the file from "kernel.ll" and not use the file passed to us in BinaryBuffer?
		PhilippSchaad (author) [Done]: Well, the problem is that clCreateProgramWithLLVMIntel wants to read from a file and takes a filename/filepath as its argument. BinaryBuffer, however, holds the complete binary. If you know a simpler solution, that would of course be great, because this is another hacky part.
		grosser [Done]: I see. Maybe that's fine then. Just add a comment for now. I would prefer if we could use clCreateProgramWithBinary and just pass an LLVM bitcode file, but this likely only works if Beignet's LLVM version is very recent. Let's leave it like this for now. It's good to have something working!
		PhilippSchaad (author) [Done]: Totally agreed, that would be the better option. As you pointed out, it is not yet possible.
		grosser [Done]: Another change: could you remove the conditional compilation here and just check if clCreateProgramWithLLVMIntelFcnPtr != nullptr?
		singam-sanjay [Not Done]: @grosser What is your rationale for avoiding conditional compilation?
		checkOpenCLError(Ret, "Failed to create program from llvm.\n");
		unlink("kernel.ll");
		} else {
size_t BinarySize = strlen(BinaryBuffer);		size_t BinarySize = strlen(BinaryBuffer);
((OpenCLKernel *)Function->Kernel)->Program = clCreateProgramWithBinaryFcnPtr(		((OpenCLKernel *)Function->Kernel)->Program =
((OpenCLContext *)GlobalContext->Context)->Context, 1, &GlobalDeviceID,		clCreateProgramWithBinaryFcnPtr(
(const size_t *)&BinarySize, (const unsigned char **)&BinaryBuffer, NULL,		((OpenCLContext *)GlobalContext->Context)->Context, 1,
&Ret);		&GlobalDeviceID, (const size_t *)&BinarySize,
		(const unsigned char **)&BinaryBuffer, NULL, &Ret);
checkOpenCLError(Ret, "Failed to create program from binary.\n");		checkOpenCLError(Ret, "Failed to create program from binary.\n");
		}
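grosser's suggestion to check `clCreateProgramWithLLVMIntelFcnPtr != nullptr` instead of relying on conditional compilation amounts to choosing the program-creation path at run time from whether the optional function pointer was resolved. Below is a minimal sketch with hypothetical stand-in types. `ProgramLoaders`, `createProgram` and the `fake*` functions are illustrative only; the real runtime would call through `clCreateProgramWithLLVMIntelFcnPtr` and `clCreateProgramWithBinaryFcnPtr`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two program-creation entry points. */
typedef int CreateFromFileFn(const char *Path);
typedef int CreateFromBinaryFn(const unsigned char *Binary, size_t Size);

typedef struct {
  CreateFromFileFn *CreateFromFile;     /* NULL if the extension is absent */
  CreateFromBinaryFn *CreateFromBinary; /* expected to be present */
} ProgramLoaders;

/* Use the file-based extension when its entry point was found and a file
 * was written; otherwise fall back to the binary path. No #ifdef needed. */
static int createProgram(const ProgramLoaders *L, const char *Buffer,
                         const char *TempPath) {
  assert(L != NULL && Buffer != NULL);
  if (L->CreateFromFile != NULL && TempPath != NULL)
    return L->CreateFromFile(TempPath);
  assert(L->CreateFromBinary != NULL);
  return L->CreateFromBinary((const unsigned char *)Buffer, strlen(Buffer));
}

/* Illustrative stand-ins only: distinct results show which path ran. */
static int fakeFromFile(const char *Path) { return Path[0] != '\0' ? 1 : -1; }
static int fakeFromBinary(const unsigned char *Binary, size_t Size) {
  (void)Binary;
  return 100 + (int)Size;
}
```

One build can then serve machines with and without Beignet, at the cost of a NULL check per kernel load.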

Ret = clBuildProgramFcnPtr(((OpenCLKernel *)Function->Kernel)->Program, 1,		Ret = clBuildProgramFcnPtr(((OpenCLKernel *)Function->Kernel)->Program, 1,
&GlobalDeviceID, NULL, NULL, NULL);		&GlobalDeviceID, NULL, NULL, NULL);
checkOpenCLError(Ret, "Failed to build program.\n");		checkOpenCLError(Ret, "Failed to build program.\n");

((OpenCLKernel *)Function->Kernel)->Kernel = clCreateKernelFcnPtr(		((OpenCLKernel *)Function->Kernel)->Kernel = clCreateKernelFcnPtr(
((OpenCLKernel *)Function->Kernel)->Program, KernelName, &Ret);		((OpenCLKernel *)Function->Kernel)->Program, KernelName, &Ret);
checkOpenCLError(Ret, "Failed to create kernel.\n");		checkOpenCLError(Ret, "Failed to create kernel.\n");
[… remaining 1,154 lines not shown …]