This is an archive of the discontinued LLVM Phabricator instance.

Differential D35185

[Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Closed, Public

Authored by PhilippSchaad on Jul 9 2017, 11:51 AM.

Details

Reviewers

bollu
grosser
Meinersbur
singam-sanjay

Commits

rG2f3073b5cb88: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for…
rPLO308751: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for…
rL308751: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for…

Summary

Added SPIR code generation to the PPCG code generator. It can be invoked with
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32-bit and 64-bit code, respectively.
In addition, runtime support has been added to execute the generated SPIR code on Intel
GPUs on systems equipped with Intel's open-source driver Beignet (development
version). This requires the CMake flag 'USE_INTEL_OCL' to be turned on and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
It has been tested (and works) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).
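As a rough illustration of how the new flag values relate to the `GPUArch` enum extended in `PPCGCodeGeneration.h`, the sketch below maps `-polly-gpu-arch` strings to enum values and records what each target implies. The helpers `parseGPUArch`, `pointerBits` and `requiresOpenCLRuntime` are hypothetical; Polly presumably registers the flag through LLVM's command-line library in `RegisterPasses.cpp`, so this only mirrors the intended mapping, not the real option handling.

```cpp
#include <cstring>

// Mirrors the enum in include/polly/CodeGen/PPCGCodeGeneration.h after this patch.
enum GPUArch { NVPTX64, SPIR32, SPIR64 };

// Hypothetical helper: translate a -polly-gpu-arch value into a GPUArch.
// Returns false for values that this patch does not define.
static bool parseGPUArch(const char *Value, GPUArch &Out) {
  struct Entry {
    const char *Name;
    GPUArch Arch;
  };
  static const Entry Table[] = {
      {"nvptx64", NVPTX64}, {"spir32", SPIR32}, {"spir64", SPIR64}};
  for (const Entry &E : Table) {
    if (std::strcmp(Value, E.Name) == 0) {
      Out = E.Arch;
      return true;
    }
  }
  return false;
}

// Pointer width of the generated code: spir32 is the only 32-bit target.
static unsigned pointerBits(GPUArch Arch) { return Arch == SPIR32 ? 32 : 64; }

// SPIR kernels can only be run through the OpenCL runtime
// (-polly-gpu-runtime=libopencl); NVPTX kernels may use either runtime.
static bool requiresOpenCLRuntime(GPUArch Arch) {
  return Arch == SPIR32 || Arch == SPIR64;
}
```

A table-driven lookup keeps adding a future target (such as the AMD support mentioned later in the review) to a one-line change.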

Diff Detail

Build Status

Buildable 8126
Build 8126: arc lint + arc unit

Event Timeline

PhilippSchaad created this revision. Jul 9 2017, 11:51 AM

Herald added a reviewer: bollu. Jul 9 2017, 11:51 AM

Herald added subscribers: kbarton, Anastasia, mgorny, nemanjai.

Harbormaster completed remote builds in B8083: Diff 105789. Jul 9 2017, 11:51 AM

PhilippSchaad added reviewers: grosser, Meinersbur. Jul 9 2017, 11:52 AM

PhilippSchaad set the repository for this revision to rL LLVM.

PhilippSchaad added a project: Restricted Project.

PhilippSchaad edited the summary of this revision.

PhilippSchaad added a subscriber: pollydev.

Hi Philipp,

this looks indeed already surprisingly good. As you noted, there are still some hacks, but I have good hopes that we can work around most of them. I added some comments. Let me know what you think!

Best,
Tobias

lib/CodeGen/PPCGCodeGeneration.cpp
40	After having removed the hack below, this should hopefully not be needed any more.
1259	Why do you emit NVIDIA intrinsics for SPIR code? Can you not emit an OpenCL barrier? If you don't know which intrinsic corresponds to this, create a simple .cl file, compile it with clang to SPIR and see which intrinsic gets emitted.
1782	Instead of spreading the SPIR code all over the place, can we create one function addSPIRMetadata that takes 'Args' as argument, emits all the metadata needed and also emits the metadata you add a little further below?
1882	Why do you emit NVIDIA intrinsics? As discussed above, is there no way to emit the OpenCL intrinsics instead?
2181	It seems all these regexp hacks are needed because we don't generate the right intrinsics. What is holding us back from generating them directly?
2204	Interesting. So the optimizations break things. It would be great to understand what exactly we break!
tools/GPURuntime/GPUJIT.c
158	Why do these declarations need to be optional? I think they should compile easily in any situation, no?
228	I am surprised that Intel's OpenCL uses such a different path. In fact, I would assume that libOpenCL.so, as a generic OpenCL library, would take care of loading Beignet. Is this not the case?
286	Does this break, or does this just return nullptr if the function is not available? If we can always try and just detect whether it is around, this would be great!
517	Why do we read the file from "kernel.ll" and not use the file passed to us in BinaryBuffer?

Set this to request changes so that I get notified in case you update this revision.

This revision now requires changes to proceed. Jul 10 2017, 1:51 PM

PhilippSchaad added inline comments. Jul 10 2017, 4:06 PM

lib/CodeGen/PPCGCodeGeneration.cpp
40	That is correct, yes. This should only be needed as long as we have to manually change the string occurrences for barriers and global/local IDs.
1259	The generated intrinsics are the ones declared in the SPIR reference, for example @_Z13get_global_idj(i32 0). I cannot find any corresponding intrinsics in LLVM's TableGen/Intrinsics; did I overlook some? This is essentially the only real point where the hack is needed. If we can insert the correct intrinsics, we can ditch the regex, because fixing this is all the regex does.
1782	Will look into that.
1882	Ditto above.
2181	Ditto above.
2204	Looking into that. What is clear is that the optimized version skips a few 'dummy' computations, because the optimized kernels expect a smaller workgroup size in OpenCL. I am not sure if and how we could account for that. It seems very tricky to me at the moment; I have played around with it quite a bit.
tools/GPURuntime/GPUJIT.c
158	Oh yeah, my bad. Only the LLVMIntel one has to be conditional.
228	The funny thing is: it does, somewhat. If I only dlopen libOpenCL and then call, for example, clCreateProgramWithBinary, looking at that in gdb reveals that the Beignet routines get loaded by libOpenCL. However, as soon as a function is not present in the standard OpenCL API, e.g. clCreateProgramWithLLVMIntel, it does not like that.
286	I'm going to look into that, but IIRC it didn't work. I might be wrong; I will try it out again!
517	Well, the problem is that clCreateProgramWithLLVMIntel wants to read from a file and requests a filename/filepath as an argument. BinaryBuffer, however, holds the complete binary. If you know a simpler solution, that would of course be great, because this is another hacky part.
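The name `@_Z13get_global_idj` quoted above is the Itanium C++ ABI mangling that SPIR uses for the OpenCL built-in `get_global_id(uint)`: the prefix `_Z`, the identifier's length, the identifier, and `j` for a single `unsigned int` parameter. It is an ordinary external function rather than an LLVM intrinsic, which is why there is no matching entry in LLVM's intrinsic tables. The helper below, `mangleUIntBuiltin`, is a hypothetical sketch that covers only this one-`uint`-parameter case.

```cpp
#include <string>

// Hypothetical helper: mangle an OpenCL built-in that takes exactly one
// `unsigned int` argument (get_global_id, get_local_id, get_group_id, ...),
// following the Itanium C++ ABI scheme used by SPIR.
static std::string mangleUIntBuiltin(const std::string &Name) {
  // "_Z" prefix, then <length><identifier>, then "j" for unsigned int.
  return "_Z" + std::to_string(Name.size()) + Name + "j";
}
```

A declaration under such a name can then be created with `Function::Create` and the `SPIR_FUNC` calling convention, the same way `createKernelSync` in this diff declares its barrier function.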

PhilippSchaad updated this revision to Diff 105946. Jul 10 2017, 5:29 PM

PhilippSchaad edited edge metadata.

Adapted dynamic method loading for Intel

grosser added inline comments. Jul 10 2017, 10:08 PM

CMakeLists.txt
178	Forgot to mention, but would be great if we could get away without a configure test here.
lib/CodeGen/PPCGCodeGeneration.cpp
1259	No, I think they may not exist. However, you can just create declarations for such functions using code as in "GPUNodeBuilder::createCallGetKernel(".
1782	I just see this is not only about the llvm::Type, but also carries an integer which is 0 for local/shared memory and 1 otherwise. So moving this into a function does not seem so easy. If you don't find an easy way, just leave it as it is.
2204	It's OK. Let's get the remainder of the patch updated, then we can look at this issue later on.
tools/GPURuntime/GPUJIT.c
517	I see. Maybe that's fine then. Just add a comment for now. I would prefer if we could use clCreateProgramWithBinary and just pass an LLVM bitcode file, but this likely only works if the LLVM version used by Beignet is very recent. Let's leave it like this for now. It's good to have something working!

grosser requested changes to this revision. Jul 10 2017, 10:15 PM

This revision now requires changes to proceed. Jul 10 2017, 10:15 PM

Inserting SPIR barriers with custom function.

CMakeLists.txt
178	Yes I agree. I don't know of a method for detecting a Beignet installation with CMake yet, but if that exists, that would of course be great.
tools/GPURuntime/GPUJIT.c
517	Totally agreed, that would be the better option. As you pointed out, it is currently not yet possible.

Added comment clarification for workaround.

Removed Regexp hack

PhilippSchaad marked 6 inline comments as done. Jul 11 2017, 4:58 AM

Nice. Can you potentially also add a test case?

CMakeLists.txt
178	I don't think we want to add this to CMake at all. At best, we just compile the Beignet support in unconditionally, if this is possible.
lib/CodeGen/PPCGCodeGeneration.cpp
1948	If you store the group names in an array, you can just index this array with "i".
2080	Maybe call this insertCUDAKernelIntrinsics and insertSPIRKernelIntrinsics for consistency?
tools/GPURuntime/GPUJIT.c
228	OK. Can you then unconditionally try to load beignet/libcl.so into another variable, e.g. HandleOpenCLBeignet? I assume this handle should be nullptr if the dlopen fails, right? Can you then check if this handle is zero and use this to decide if you want to call clCreateProgramWithLLVMIntel? This should allow us to get rid of all the conditional compilation.

grosser requested changes to this revision. Jul 11 2017, 6:26 AM

This revision now requires changes to proceed. Jul 11 2017, 6:26 AM

grosser added inline comments. Jul 11 2017, 6:29 AM

tools/GPURuntime/GPUJIT.c
517	Another change. Could you possibly remove the conditional compilation here and just check if clCreateProgramWithLLVMIntelFcnPtr != nullptr?
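The runtime-detection scheme suggested above (try to load Beignet unconditionally, keep its handle `nullptr` when `dlopen` fails, and branch on whether `clCreateProgramWithLLVMIntel` can be resolved) can be sketched without any `#ifdef`. To keep the example self-contained it does not call `dlopen`/`dlsym` itself; the symbol lookup is passed in as a function with `dlsym`'s signature instead. `RuntimeHandles`, `selectHandle`, `ProgramPath` and `chooseProgramPath` are hypothetical names, and treating every non-Beignet case as the `clCreateProgramWithBinary` path is a simplifying assumption.

```cpp
// Same shape as POSIX dlsym(void *handle, const char *symbol).
using SymbolLookup = void *(*)(void *Handle, const char *Name);

struct RuntimeHandles {
  void *HandleOpenCL;        // generic libOpenCL.so handle, nullptr if dlopen failed
  void *HandleOpenCLBeignet; // Beignet's libcl.so handle, nullptr if dlopen failed
};

// Prefer the Beignet handle when it was found, else the generic library.
static void *selectHandle(const RuntimeHandles &H) {
  return H.HandleOpenCLBeignet != nullptr ? H.HandleOpenCLBeignet
                                          : H.HandleOpenCL;
}

enum class ProgramPath { Unavailable, LLVMIntel, Binary };

// Decide at run time, instead of via -DHAS_INTEL_OCL, how to create the program.
static ProgramPath chooseProgramPath(const RuntimeHandles &H,
                                     SymbolLookup Lookup) {
  if (selectHandle(H) == nullptr)
    return ProgramPath::Unavailable;
  // clCreateProgramWithLLVMIntel is a Beignet extension, not part of the
  // standard OpenCL API, so probe for it rather than assuming it exists.
  if (H.HandleOpenCLBeignet != nullptr &&
      Lookup(H.HandleOpenCLBeignet, "clCreateProgramWithLLVMIntel") != nullptr)
    return ProgramPath::LLVMIntel;
  return ProgramPath::Binary;
}
```

In GPUJIT.c, the lookup would be `dlsym` and the result would be stored in a function pointer such as `clCreateProgramWithLLVMIntelFcnPtr`; comparing that pointer against null is the whole decision.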

Simplification
Removed compile-conditional CMake dependency

I am not yet too familiar with how to implement such a test case and what exactly to check for, but if that is desired, I can look into it.

This should now address all concerns, minus the file-workaround.

lib/CodeGen/PPCGCodeGeneration.cpp
2080	The reason I kept this separated the way it is now, without specifying "CUDA" in the earlier version, is that SPIR should be the only example. When AMD support follows, or anything where there's a registered LLVM target with corresponding intrinsics (AMDGPU in that case), the original function could be used with just an additional switch case, changing the intrinsics IDs. Do you agree with that?

PhilippSchaad added inline comments. Jul 11 2017, 7:54 AM

lib/CodeGen/PPCGCodeGeneration.cpp
2080	And with 'example' I mean 'exception', my bad.

Great. Yes, a test case would be great. It is not very complicated. You already updated test cases in your last patch. I would suggest you copy one of the simple double nested loop kernels from the PTX test and just check that the right SPIR intrinsics are generated! That should be enough.

Otherwise, I think this is good to go. Very nice work!

lib/CodeGen/PPCGCodeGeneration.cpp
2080	Sure. Still, the function names could be more similar. What about insertKernelCallsSPIR?

This revision now requires changes to proceed. Jul 11 2017, 8:47 AM

Added testcase for SPIR
Fixed naming inconsistencies

LGTM from my side.

This revision is now accepted and ready to land. Jul 14 2017, 11:55 AM

@singam-sanjay, can you have a final look and commit it if you are OK with this patch?

Sure. Will do in a while.

@grosser Thanks for adding me as a reviewer! It's helped me acquaint myself with the SPIR V codegen and understand GPUJIT better.

@PhilippSchaad I've made suggestions to restructure the code, pointed out a possibly unintentional debug print statement, and added some questions about SPIR. Do let me know if (and why) some of my suggestions are unnecessary.

General question: I was following a discussion about integrating the SPIR-V backend into LLVM, i.e. moving it from /tools to lib/Target. If that were to happen, would we be able to generate and feed SPIR IR in the same way as NVPTX?

lib/CodeGen/PPCGCodeGeneration.cpp
1794	@PhilippSchaad I'm not yet acquainted with the SPIR annotations. Could you please explain why it matters to let the SPIR backend know whether a kernel argument is read-only?
1796	Why are you setting these metadata to empty strings? Is this a feature yet to be implemented?
2097	nitpick: Do we need this?
2120	nitpick: Do we need this?
2208–2216	Would it be better to move the SPIR-specific code into createKernelASM as well? You might not need the switch cases to handle buggy control flow like case GPUArch::SPIR64: case GPUArch::SPIR32: llvm_unreachable("Cannot generate ASM for SPIR architecture");
tools/GPURuntime/GPUJIT.c
151	This is new to me. Why are you providing runtime support for Intel to JIT SPIR code? Is it to run on the integrated GPU of an Intel CPU? Guessing from https://github.com/intel/beignet.
255	`void *Handle = ( HandleOpenCLBeignet ? HandleOpenCLBeignet : HandleOpenCL );` maybe? Let me know if this isn't recommended.
257	Is it that we prefer to use libraries provided by Intel's SDK and integrated GPU over other OpenCL drivers and devices? I'm assuming that this leads the runtime to use the integrated GPU even when a more powerful Radeon card is present. Please correct me if I'm wrong.
517	@grosser What is your rationale for avoiding conditional compilation?
531	Is this file opened inside <llvm_build>/bin or $PWD? If it's $PWD, we should take precautions not to open a file that the user needs. You could instead open a file in /tmp, or %TEMP% on Windows, like "/tmp/R@ηdηдm3.ll". Anyway, you could implement this later.
537	Was this supposed to be a message letting the user know that the runtime's not using the "beignet" OpenCL implementation?

Will address the points mentioned ASAP. As for your SPIR-V question: the goal, should that back end be added, would be to add SPIR-V compilation as an additional GPU target option in the future. This would allow kernel execution on any device that supports SPIR-V. This SPIR solution is essentially a quick fix to get it to work on Intel as long as SPIR-V is not an option.

lib/CodeGen/PPCGCodeGeneration.cpp
1794	The issue here is the Intel Beignet driver. Beignet expects a very specific set of kernel annotations when parsing the Kernel IR. The metadata is simply required to be there, but for the kernels we generate here, it does not matter what is actually contained in there. This is entirely Beignet dependent, maybe Intel will adapt that in the future.
1796	Ditto above.
2097	We can change it to an if-clause if that is desired. The compiler just complains when we have a switch and do not check every possible value of `Arch`. An alternative would be a default clause that calls `llvm_unreachable`.
2120	Ditto above.
2208–2216	Will do, good point. I originally moved it out because we thought we had to do a lot more hacking on the resulting IR.
tools/GPURuntime/GPUJIT.c
151	Your guess is exactly correct. This also corresponds to some of the points mentioned above.
255	Definitely the prettier solution. Will change.
257	This is actually a good point. We don't want to prefer the Intel-provided SDK; the choice of selecting Intel's Beignet should depend on which -polly-gpu-arch value has been selected. AMD is in the pipeline.
531	I will add this later, but yes, it is $PWD. Using /tmp and %TEMP% seems reasonable.
537	Oops, thanks for pointing this out, this was unintentionally left in and should never be reached! Used it for debugging. Removing it.
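Because `clCreateProgramWithLLVMIntel` takes a file path rather than a buffer, the contents of `BinaryBuffer` have to be written to disk first. A minimal POSIX sketch of the `/tmp` approach discussed above uses `mkstemp` to get a unique, exclusively created file instead of a fixed `kernel.ll` in the working directory. The function name `writeKernelToTempFile`, the `TMPDIR` lookup and the `polly-kernel-` prefix are assumptions, the Windows `%TEMP%` case is not covered, and if Beignet insists on a `.ll` suffix, `mkstemps` (a glibc/BSD extension) would be needed instead.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical helper: write the kernel IR to a uniquely named file under
// $TMPDIR (or /tmp) and return its path, or "" on failure. The caller should
// unlink() the file once the OpenCL program has been created.
static std::string writeKernelToTempFile(const std::string &IR) {
  const char *Dir = std::getenv("TMPDIR");
  std::string Path =
      std::string(Dir && *Dir ? Dir : "/tmp") + "/polly-kernel-XXXXXX";

  // mkstemp replaces the XXXXXX suffix in place and creates the file with
  // O_EXCL, so concurrent runs cannot overwrite each other's kernels.
  int FD = mkstemp(&Path[0]);
  if (FD < 0)
    return "";

  std::FILE *F = fdopen(FD, "w");
  if (!F) {
    close(FD);
    unlink(Path.c_str());
    return "";
  }

  bool Ok = std::fwrite(IR.data(), 1, IR.size(), F) == IR.size();
  Ok = (std::fclose(F) == 0) && Ok; // fclose also closes FD
  if (!Ok) {
    unlink(Path.c_str());
    return "";
  }
  return Path;
}
```

All of these calls are plain C/POSIX, so the same logic carries over directly to GPUJIT.c.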

singam-sanjay added inline comments. Jul 17 2017, 12:56 AM

lib/CodeGen/PPCGCodeGeneration.cpp
1794	Ohk. Thanks for the info !
1796	Alright. I guess beignet isn't worried about the names of kernel arguments in "kernel_arg_name".
2097	I was talking about the `break;` after `llvm_unreachable`. If `llvm_unreachable` would call `abort`, the `break` would be redundant.
2120	Ditto above ;-).
2208–2216	Alright.
tools/GPURuntime/GPUJIT.c
151	Ohk.
255	`void *Handle = ( HandleOpenCLBeignet!=NULL ? HandleOpenCLBeignet : HandleOpenCL );` would be better.
257	Alright. Any ideas on how you'd differentiate between the integrated and dedicated GPU on the command line?
531	Alright.
537	👍
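On the `break;` question: `llvm_unreachable` never returns (in builds with assertions it prints the message and aborts), so a `break` after it is dead code. The sketch below shows the same switch shape outside LLVM, using a hypothetical stand-in `unreachableArch` declared `[[noreturn]]`; the function `ptxTriple` and its triple string are illustrative assumptions, not code from the patch.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

enum GPUArch { NVPTX64, SPIR32, SPIR64 };

// Hypothetical stand-in for llvm_unreachable: report the message and abort.
[[noreturn]] static void unreachableArch(const char *Msg) {
  std::fprintf(stderr, "UNREACHABLE: %s\n", Msg);
  std::abort();
}

// Returns the target triple for PTX code generation. The SPIR cases mirror
// the llvm_unreachable cases quoted in the review: because unreachableArch
// never returns, no `break;` is needed after it.
static std::string ptxTriple(GPUArch Arch) {
  switch (Arch) {
  case NVPTX64:
    return "nvptx64-nvidia-cuda";
  case SPIR32:
  case SPIR64:
    unreachableArch("Cannot generate ASM for SPIR architecture");
  }
  // Covers out-of-range enum values and keeps -Wreturn-type quiet.
  unreachableArch("invalid GPUArch value");
}
```

Listing every enumerator explicitly, rather than using a `default:` label, keeps the compiler's missing-case warning useful when a new target such as AMDGPU is added to `GPUArch`.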

Removed left over debug print, moved SPIR creation into createASM, fixed minor issues addressed in comments.

PhilippSchaad marked 24 inline comments as done. Jul 17 2017, 5:25 PM

PhilippSchaad added inline comments.

tools/GPURuntime/GPUJIT.c
257	The goal is to run on AMD GPUs with the AMDGPU backend, which would mean the differentiation would be done with -polly-gpu-arch=spir32/amdgcn32 for example. I am not yet sure how to transport that choice to here, but I will look into that as I am working on the AMD side of things. Thanks for pointing out this flaw.

Ping: do the latest changes address your concerns, @singam-sanjay? Can I land this?

@PhilippSchaad Please go ahead.

Rebase for commit

Harbormaster completed remote builds in B8468: Diff 107678. Jul 21 2017, 8:52 AM

Closed by commit rL308751: [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for… (authored by phschaad). Jul 21 2017, 9:11 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
CMakeLists.txt	5 lines
include/polly/CodeGen/PPCGCodeGeneration.h	2 lines
lib/CodeGen/PPCGCodeGeneration.cpp	171 lines
lib/Support/RegisterPasses.cpp	6 lines
tools/GPURuntime/GPUJIT.c	40 lines

Diff 106000

CMakeLists.txt

[166 lines not shown]	if (CUDA_FOUND)
add_definitions(-DHAS_LIBCUDART)		add_definitions(-DHAS_LIBCUDART)
INCLUDE_DIRECTORIES( ${CUDA_INCLUDE_DIRS} )		INCLUDE_DIRECTORIES( ${CUDA_INCLUDE_DIRS} )
endif(CUDA_FOUND)		endif(CUDA_FOUND)
if (OpenCL_FOUND)		if (OpenCL_FOUND)
add_definitions(-DHAS_LIBOPENCL)		add_definitions(-DHAS_LIBOPENCL)
INCLUDE_DIRECTORIES( ${OpenCL_INCLUDE_DIR} )		INCLUDE_DIRECTORIES( ${OpenCL_INCLUDE_DIR} )
endif(OpenCL_FOUND)		endif(OpenCL_FOUND)

		option(USE_INTEL_OCL "Uses the Intel Beignet driver for GPGPU code" OFF)
		if (USE_INTEL_OCL)
		add_definitions(-DHAS_INTEL_OCL)
		endif(USE_INTEL_OCL)

option(POLLY_BUNDLED_ISL "Use the bundled version of libisl included in Polly" ON)		option(POLLY_BUNDLED_ISL "Use the bundled version of libisl included in Polly" ON)
if (NOT POLLY_BUNDLED_ISL)		if (NOT POLLY_BUNDLED_ISL)
find_package(ISL MODULE REQUIRED)		find_package(ISL MODULE REQUIRED)
message(STATUS "Using external libisl ${ISL_VERSION} in: ${ISL_PREFIX}")		message(STATUS "Using external libisl ${ISL_VERSION} in: ${ISL_PREFIX}")
set(ISL_TARGET ISL)		set(ISL_TARGET ISL)
else()		else()
set(ISL_INCLUDE_DIRS		set(ISL_INCLUDE_DIRS
${CMAKE_CURRENT_BINARY_DIR}/lib/External/isl/include		${CMAKE_CURRENT_BINARY_DIR}/lib/External/isl/include
[94 lines not shown]

include/polly/CodeGen/PPCGCodeGeneration.h

	[10 lines not shown]
	// GPU mapping strategy.			// GPU mapping strategy.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef POLLY_PPCGCODEGENERATION_H			#ifndef POLLY_PPCGCODEGENERATION_H
	#define POLLY_PPCGCODEGENERATION_H			#define POLLY_PPCGCODEGENERATION_H

	/// The GPU Architecture to target.			/// The GPU Architecture to target.
	enum GPUArch { NVPTX64 };			enum GPUArch { NVPTX64, SPIR32, SPIR64 };

	/// The GPU Runtime implementation to use.			/// The GPU Runtime implementation to use.
	enum GPURuntime { CUDA, OpenCL };			enum GPURuntime { CUDA, OpenCL };

	#endif // POLLY_PPCGCODEGENERATION_H			#endif // POLLY_PPCGCODEGENERATION_H

lib/CodeGen/PPCGCodeGeneration.cpp

[31 lines not shown]
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"		#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"

		#include <regex>

#include "isl/union_map.h"		#include "isl/union_map.h"

extern "C" {		extern "C" {
#include "ppcg/cuda.h"		#include "ppcg/cuda.h"
#include "ppcg/gpu.h"		#include "ppcg/gpu.h"
#include "ppcg/gpu_print.h"		#include "ppcg/gpu_print.h"
#include "ppcg/ppcg.h"		#include "ppcg/ppcg.h"
#include "ppcg/schedule.h"		#include "ppcg/schedule.h"
[517 lines not shown]	private:
/// Create an in-kernel synchronization call.		/// Create an in-kernel synchronization call.
void createKernelSync();		void createKernelSync();

/// Create a PTX assembly string for the current GPU kernel.		/// Create a PTX assembly string for the current GPU kernel.
///		///
/// @returns A string containing the corresponding PTX assembly code.		/// @returns A string containing the corresponding PTX assembly code.
std::string createKernelASM();		std::string createKernelASM();

		/// Create a SPIR string for the current GPU kernel.
		///
		/// @returns A string containing the corresponding SPIR code.
		std::string createKernelSPIR(std::string IR);

/// Remove references from the dominator tree to the kernel function @p F.		/// Remove references from the dominator tree to the kernel function @p F.
///		///
/// @param F The function to remove references to.		/// @param F The function to remove references to.
void clearDominators(Function *F);		void clearDominators(Function *F);

/// Remove references from scalar evolution to the kernel function @p F.		/// Remove references from scalar evolution to the kernel function @p F.
///		///
/// @param F The function to remove references to.		/// @param F The function to remove references to.
[644 lines not shown]	void GPUNodeBuilder::createScopStmt(isl_ast_expr *Expr,
if (Stmt->isBlockStmt())		if (Stmt->isBlockStmt())
BlockGen.copyStmt(*Stmt, LTS, Indexes);		BlockGen.copyStmt(*Stmt, LTS, Indexes);
else		else
RegionGen.copyStmt(*Stmt, LTS, Indexes);		RegionGen.copyStmt(*Stmt, LTS, Indexes);
}		}

void GPUNodeBuilder::createKernelSync() {		void GPUNodeBuilder::createKernelSync() {
Module *M = Builder.GetInsertBlock()->getParent()->getParent();		Module *M = Builder.GetInsertBlock()->getParent()->getParent();
		const char *SpirName = "__gen_ocl_barrier_global";

Function *Sync;		Function *Sync;

switch (Arch) {		switch (Arch) {
		case GPUArch::SPIR64:
		case GPUArch::SPIR32:
		Sync = M->getFunction(SpirName);

		// If Sync is not available, declare it.
		if (!Sync) {
		GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
		std::vector<Type *> Args;
		FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), Args, false);
		Sync = Function::Create(Ty, Linkage, SpirName, M);
		Sync->setCallingConv(CallingConv::SPIR_FUNC);
		}
		break;
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
Sync = Intrinsic::getDeclaration(M, Intrinsic::nvvm_barrier0);		Sync = Intrinsic::getDeclaration(M, Intrinsic::nvvm_barrier0);
break;		break;
}		}

Builder.CreateCall(Sync, {});		Builder.CreateCall(Sync, {});
}		}

/// Collect llvm::Values referenced from @p Node		/// Collect llvm::Values referenced from @p Node
///		///
[377 lines not shown]	void GPUNodeBuilder::createKernel(__isl_take isl_ast_node *KernelStmt) {

createKernelFunction(Kernel, SubtreeValues, SubtreeFunctions);		createKernelFunction(Kernel, SubtreeValues, SubtreeFunctions);
setupKernelSubtreeFunctions(SubtreeFunctions);		setupKernelSubtreeFunctions(SubtreeFunctions);

create(isl_ast_node_copy(Kernel->tree));		create(isl_ast_node_copy(Kernel->tree));

finalizeKernelArguments(Kernel);		finalizeKernelArguments(Kernel);
Function *F = Builder.GetInsertBlock()->getParent();		Function *F = Builder.GetInsertBlock()->getParent();
		if (Arch == GPUArch::NVPTX64)
addCUDAAnnotations(F->getParent(), BlockDimX, BlockDimY, BlockDimZ);		addCUDAAnnotations(F->getParent(), BlockDimX, BlockDimY, BlockDimZ);
clearDominators(F);		clearDominators(F);
clearScalarEvolution(F);		clearScalarEvolution(F);
clearLoops(F);		clearLoops(F);

IDToValue = HostIDs;		IDToValue = HostIDs;

ValueMap = std::move(HostValueMap);		ValueMap = std::move(HostValueMap);
ScalarMap = std::move(HostScalarMap);		ScalarMap = std::move(HostScalarMap);
[40 lines not shown]	if (!is64Bit) {
Ret += "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:"		Ret += "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:"
"64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:"		"64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:"
"64-v128:128:128-n16:32:64";		"64-v128:128:128-n16:32:64";
}		}

return Ret;		return Ret;
}		}

		/// Compute the DataLayout string for a SPIR kernel.
		///
		/// @param is64Bit Are we looking for a 64 bit architecture?
		static std::string computeSPIRDataLayout(bool is64Bit) {
		std::string Ret = "";

		if (!is64Bit) {
		Ret += "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:"
		"64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:"
		"32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:"
		"256:256-v256:256:256-v512:512:512-v1024:1024:1024";
		} else {
		Ret += "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:"
		"64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:"
		"32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:"
		"256:256-v256:256:256-v512:512:512-v1024:1024:1024";
		}

		return Ret;
		}
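The two strings returned by `computeSPIRDataLayout` above differ only in the pointer specification (`p:32:32:32` versus `p:64:64:64`); every integer, float and vector alignment after it is identical. A sketch of an equivalent formulation that makes this explicit, under the hypothetical name `computeSPIRDataLayoutCompact`, is shown below; it is intended to produce exactly the same two strings.

```cpp
#include <string>

// Equivalent to computeSPIRDataLayout: only the pointer size depends on is64Bit.
static std::string computeSPIRDataLayoutCompact(bool is64Bit) {
  const std::string Pointer = is64Bit ? "e-p:64:64:64" : "e-p:32:32:32";
  // Integer, float and vector alignments shared by spir and spir64.
  const std::string Rest =
      "-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64"
      "-f32:32:32-f64:64:64"
      "-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64"
      "-v96:128:128-v128:128:128-v192:256:256-v256:256:256"
      "-v512:512:512-v1024:1024:1024";
  return Pointer + Rest;
}
```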

Function *		Function *
GPUNodeBuilder::createKernelFunctionDecl(ppcg_kernel *Kernel,		GPUNodeBuilder::createKernelFunctionDecl(ppcg_kernel *Kernel,
SetVector<Value *> &SubtreeValues) {		SetVector<Value *> &SubtreeValues) {
std::vector<Type *> Args;		std::vector<Type *> Args;
std::string Identifier = getKernelFuncName(Kernel->id);		std::string Identifier = getKernelFuncName(Kernel->id);

		std::vector<Metadata *> MemoryType;

for (long i = 0; i < Prog->n_array; i++) {		for (long i = 0; i < Prog->n_array; i++) {
if (!ppcg_kernel_requires_array_argument(Kernel, i))		if (!ppcg_kernel_requires_array_argument(Kernel, i))
continue;		continue;

if (gpu_array_is_read_only_scalar(&Prog->array[i])) {		if (gpu_array_is_read_only_scalar(&Prog->array[i])) {
isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space, isl_dim_set);		isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space, isl_dim_set);
const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(Id);		const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(Id);
Args.push_back(SAI->getElementType());		Args.push_back(SAI->getElementType());
		MemoryType.push_back(
		ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
} else {		} else {
static const int UseGlobalMemory = 1;		static const int UseGlobalMemory = 1;
Args.push_back(Builder.getInt8PtrTy(UseGlobalMemory));		Args.push_back(Builder.getInt8PtrTy(UseGlobalMemory));
		MemoryType.push_back(
		ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 1)));
}		}
}		}

int NumHostIters = isl_space_dim(Kernel->space, isl_dim_set);		int NumHostIters = isl_space_dim(Kernel->space, isl_dim_set);

for (long i = 0; i < NumHostIters; i++)		for (long i = 0; i < NumHostIters; i++) {
Args.push_back(Builder.getInt64Ty());		Args.push_back(Builder.getInt64Ty());
		MemoryType.push_back(
		ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
		}

int NumVars = isl_space_dim(Kernel->space, isl_dim_param);		int NumVars = isl_space_dim(Kernel->space, isl_dim_param);

for (long i = 0; i < NumVars; i++) {		for (long i = 0; i < NumVars; i++) {
isl_id *Id = isl_space_get_dim_id(Kernel->space, isl_dim_param, i);		isl_id *Id = isl_space_get_dim_id(Kernel->space, isl_dim_param, i);
Value *Val = IDToValue[Id];		Value *Val = IDToValue[Id];
isl_id_free(Id);		isl_id_free(Id);
Args.push_back(Val->getType());		Args.push_back(Val->getType());
		MemoryType.push_back(
		ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
}		}

for (auto *V : SubtreeValues)		for (auto *V : SubtreeValues) {
Args.push_back(V->getType());		Args.push_back(V->getType());
		MemoryType.push_back(
		ConstantAsMetadata::get(ConstantInt::get(Builder.getInt32Ty(), 0)));
		}

auto *FT = FunctionType::get(Builder.getVoidTy(), Args, false);		auto *FT = FunctionType::get(Builder.getVoidTy(), Args, false);
auto *FN = Function::Create(FT, Function::ExternalLinkage, Identifier,		auto *FN = Function::Create(FT, Function::ExternalLinkage, Identifier,
GPUModule.get());		GPUModule.get());

		std::vector<Metadata *> EmptyStrings;

		for (unsigned int i = 0; i < MemoryType.size(); i++) {
		EmptyStrings.push_back(MDString::get(FN->getContext(), ""));
		}

		if (Arch == GPUArch::SPIR32 || Arch == GPUArch::SPIR64) {
		FN->setMetadata("kernel_arg_addr_space",
		MDNode::get(FN->getContext(), MemoryType));
		FN->setMetadata("kernel_arg_name",
		singam-sanjay: Why are you setting these metadata to empty strings? Is this a feature yet to be implemented?
		PhilippSchaad: Ditto above.
		singam-sanjay: Alright. I guess Beignet isn't worried about the names of kernel arguments in "kernel_arg_name".
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_access_qual",
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_type",
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_type_qual",
		MDNode::get(FN->getContext(), EmptyStrings));
		FN->setMetadata("kernel_arg_base_type",
		MDNode::get(FN->getContext(), EmptyStrings));
		}

switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
FN->setCallingConv(CallingConv::PTX_Kernel);		FN->setCallingConv(CallingConv::PTX_Kernel);
break;		break;
		case GPUArch::SPIR32:
		case GPUArch::SPIR64:
		FN->setCallingConv(CallingConv::SPIR_KERNEL);
		break;
}		}

auto Arg = FN->arg_begin();		auto Arg = FN->arg_begin();
for (long i = 0; i < Kernel->n_array; i++) {		for (long i = 0; i < Kernel->n_array; i++) {
if (!ppcg_kernel_requires_array_argument(Kernel, i))		if (!ppcg_kernel_requires_array_argument(Kernel, i))
continue;		continue;

Arg->setName(Kernel->array[i].array->name);		Arg->setName(Kernel->array[i].array->name);
… 49 unchanged lines …	GPUNodeBuilder::createKernelFunctionDecl(ppcg_kernel *Kernel,
return FN;		return FN;
}		}

void GPUNodeBuilder::insertKernelIntrinsics(ppcg_kernel *Kernel) {		void GPUNodeBuilder::insertKernelIntrinsics(ppcg_kernel *Kernel) {
Intrinsic::ID IntrinsicsBID[2];		Intrinsic::ID IntrinsicsBID[2];
Intrinsic::ID IntrinsicsTID[3];		Intrinsic::ID IntrinsicsTID[3];

switch (Arch) {		switch (Arch) {
		case GPUArch::SPIR64:
		case GPUArch::SPIR32:
		grosser: Why do you emit NVIDIA intrinsics? As discussed above, is there no way to emit the OpenCL intrinsics instead?
		PhilippSchaad: Ditto above.
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
IntrinsicsBID[0] = Intrinsic::nvvm_read_ptx_sreg_ctaid_x;		IntrinsicsBID[0] = Intrinsic::nvvm_read_ptx_sreg_ctaid_x;
IntrinsicsBID[1] = Intrinsic::nvvm_read_ptx_sreg_ctaid_y;		IntrinsicsBID[1] = Intrinsic::nvvm_read_ptx_sreg_ctaid_y;

IntrinsicsTID[0] = Intrinsic::nvvm_read_ptx_sreg_tid_x;		IntrinsicsTID[0] = Intrinsic::nvvm_read_ptx_sreg_tid_x;
IntrinsicsTID[1] = Intrinsic::nvvm_read_ptx_sreg_tid_y;		IntrinsicsTID[1] = Intrinsic::nvvm_read_ptx_sreg_tid_y;
IntrinsicsTID[2] = Intrinsic::nvvm_read_ptx_sreg_tid_z;		IntrinsicsTID[2] = Intrinsic::nvvm_read_ptx_sreg_tid_z;
break;		break;
… 49 unchanged lines …	for (long i = 0; i < Kernel->n_array; i++) {
Arg++;		Arg++;
}		}
}		}

void GPUNodeBuilder::finalizeKernelArguments(ppcg_kernel *Kernel) {		void GPUNodeBuilder::finalizeKernelArguments(ppcg_kernel *Kernel) {
auto *FN = Builder.GetInsertBlock()->getParent();		auto *FN = Builder.GetInsertBlock()->getParent();
auto Arg = FN->arg_begin();		auto Arg = FN->arg_begin();

bool StoredScalar = false;		bool StoredScalar = false;
		grosser: If you store the group names in an array, you can just index this array with "i".
for (long i = 0; i < Kernel->n_array; i++) {		for (long i = 0; i < Kernel->n_array; i++) {
if (!ppcg_kernel_requires_array_argument(Kernel, i))		if (!ppcg_kernel_requires_array_argument(Kernel, i))
continue;		continue;

isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space, isl_dim_set);		isl_id *Id = isl_space_get_tuple_id(Prog->array[i].space, isl_dim_set);
const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(isl_id_copy(Id));		const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(isl_id_copy(Id));
isl_id_free(Id);		isl_id_free(Id);

… 87 unchanged lines …	void GPUNodeBuilder::createKernelFunction(
switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
if (Runtime == GPURuntime::CUDA)		if (Runtime == GPURuntime::CUDA)
GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-cuda"));		GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-cuda"));
else if (Runtime == GPURuntime::OpenCL)		else if (Runtime == GPURuntime::OpenCL)
GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-nvcl"));		GPUModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-nvcl"));
GPUModule->setDataLayout(computeNVPTXDataLayout(true /* is64Bit */));		GPUModule->setDataLayout(computeNVPTXDataLayout(true /* is64Bit */));
break;		break;
		case GPUArch::SPIR32:
		GPUModule->setTargetTriple(Triple::normalize("spir-unknown-unknown"));
		GPUModule->setDataLayout(computeSPIRDataLayout(false /* is64Bit */));
		break;
		case GPUArch::SPIR64:
		GPUModule->setTargetTriple(Triple::normalize("spir64-unknown-unknown"));
		GPUModule->setDataLayout(computeSPIRDataLayout(true /* is64Bit */));
		break;
}		}

Function *FN = createKernelFunctionDecl(Kernel, SubtreeValues);		Function *FN = createKernelFunctionDecl(Kernel, SubtreeValues);

BasicBlock *PrevBlock = Builder.GetInsertBlock();		BasicBlock *PrevBlock = Builder.GetInsertBlock();
auto EntryBlock = BasicBlock::Create(Builder.getContext(), "entry", FN);		auto EntryBlock = BasicBlock::Create(Builder.getContext(), "entry", FN);

DT.addNewBlock(EntryBlock, PrevBlock);		DT.addNewBlock(EntryBlock, PrevBlock);

Builder.SetInsertPoint(EntryBlock);		Builder.SetInsertPoint(EntryBlock);
Builder.CreateRetVoid();		Builder.CreateRetVoid();
Builder.SetInsertPoint(EntryBlock, EntryBlock->begin());		Builder.SetInsertPoint(EntryBlock, EntryBlock->begin());

ScopDetection::markFunctionAsInvalid(FN);		ScopDetection::markFunctionAsInvalid(FN);

prepareKernelArguments(Kernel, FN);		prepareKernelArguments(Kernel, FN);
createKernelVariables(Kernel, FN);		createKernelVariables(Kernel, FN);
insertKernelIntrinsics(Kernel);		insertKernelIntrinsics(Kernel);
}		}

std::string GPUNodeBuilder::createKernelASM() {		std::string GPUNodeBuilder::createKernelASM() {
		grosser: Maybe call this insertCUDAKernelIntrinsics and insertSPIRKernelIntrinsics for consistency?
		PhilippSchaad: The reason I kept this separated the way it is now, without specifying "CUDA" in the earlier version, is that SPIR should be the only example. When AMD support follows, or any other target with a registered LLVM backend and corresponding intrinsics (AMDGPU in that case), the original function could be reused with just an additional switch case that changes the intrinsic IDs. Do you agree with that?
		PhilippSchaad: And by 'example' I mean 'exception', my bad.
		grosser: Sure. Still, the function names could be more similar. What about insertKernelCallsSPIR?
llvm::Triple GPUTriple;		llvm::Triple GPUTriple;

switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
switch (Runtime) {		switch (Runtime) {
case GPURuntime::CUDA:		case GPURuntime::CUDA:
GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-cuda"));		GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-cuda"));
break;		break;
case GPURuntime::OpenCL:		case GPURuntime::OpenCL:
GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-nvcl"));		GPUTriple = llvm::Triple(Triple::normalize("nvptx64-nvidia-nvcl"));
break;		break;
}		}
break;		break;
		case GPUArch::SPIR64:
		case GPUArch::SPIR32:
		llvm_unreachable("Cannot generate ASM for SPIR architecture");
		break;
		singam-sanjay: Nitpick: do we need this?
		PhilippSchaad: We can change it to an if-clause if that is desired. The compiler just complains when we have a switch that does not handle every possible value of `Arch`. An alternative would be a default clause handling that `llvm_unreachable`.
		singam-sanjay: I was talking about the `break;` after `llvm_unreachable`. If `llvm_unreachable` calls `abort`, the `break` is redundant.
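On the `break;` question: `llvm_unreachable` never returns to its caller, so a `break` after it is dead code, while handling every enumerator still keeps the compiler's missing-case warning quiet. The sketch below uses a hypothetical `[[noreturn]]` stand-in named `unreachable` (not LLVM's macro) and a simplified version of the triple selection in createKernelASM that assumes the CUDA runtime.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for llvm_unreachable. [[noreturn]] tells the compiler
// that control never comes back from the call.
[[noreturn]] void unreachable(const char *Msg) {
  std::fprintf(stderr, "UNREACHABLE: %s\n", Msg);
  std::abort();
}

enum class GPUArch { NVPTX64, SPIR32, SPIR64 };

// Simplified from createKernelASM: only the CUDA triple for NVPTX64.
std::string asmTripleFor(GPUArch Arch) {
  switch (Arch) {
  case GPUArch::NVPTX64:
    return "nvptx64-nvidia-cuda";
  case GPUArch::SPIR32:
  case GPUArch::SPIR64:
    unreachable("Cannot generate ASM for SPIR architecture");
    // No `break` needed: the call above never returns.
  }
  // Only reached for an out-of-range enum value.
  unreachable("Unknown GPUArch");
}
```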
}		}

std::string ErrMsg;		std::string ErrMsg;
auto GPUTarget = TargetRegistry::lookupTarget(GPUTriple.getTriple(), ErrMsg);		auto GPUTarget = TargetRegistry::lookupTarget(GPUTriple.getTriple(), ErrMsg);

if (!GPUTarget) {		if (!GPUTarget) {
errs() << ErrMsg << "\n";		errs() << ErrMsg << "\n";
return "";		return "";
}		}

TargetOptions Options;		TargetOptions Options;
Options.UnsafeFPMath = FastMath;		Options.UnsafeFPMath = FastMath;

std::string subtarget;		std::string subtarget;

switch (Arch) {		switch (Arch) {
case GPUArch::NVPTX64:		case GPUArch::NVPTX64:
subtarget = CudaVersion;		subtarget = CudaVersion;
break;		break;
		case GPUArch::SPIR32:
		case GPUArch::SPIR64:
		llvm_unreachable("No subtarget for SPIR architecture");
		break;
		singam-sanjay: Nitpick: do we need this?
		PhilippSchaad: Ditto above.
		singam-sanjay: Ditto above ;-).
}		}

std::unique_ptr<TargetMachine> TargetM(GPUTarget->createTargetMachine(		std::unique_ptr<TargetMachine> TargetM(GPUTarget->createTargetMachine(
GPUTriple.getTriple(), subtarget, "", Options, Optional<Reloc::Model>()));		GPUTriple.getTriple(), subtarget, "", Options, Optional<Reloc::Model>()));

SmallString<0> ASMString;		SmallString<0> ASMString;
raw_svector_ostream ASMStream(ASMString);		raw_svector_ostream ASMStream(ASMString);
llvm::legacy::PassManager PM;		llvm::legacy::PassManager PM;

PM.add(createTargetTransformInfoWrapperPass(TargetM->getTargetIRAnalysis()));		PM.add(createTargetTransformInfoWrapperPass(TargetM->getTargetIRAnalysis()));

if (TargetM->addPassesToEmitFile(		if (TargetM->addPassesToEmitFile(
PM, ASMStream, TargetMachine::CGFT_AssemblyFile, true /* verify */)) {		PM, ASMStream, TargetMachine::CGFT_AssemblyFile, true /* verify */)) {
errs() << "The target does not support generation of this file type!\n";		errs() << "The target does not support generation of this file type!\n";
return "";		return "";
}		}

PM.run(*GPUModule);		PM.run(*GPUModule);

return ASMStream.str();		return ASMStream.str();
}		}

		std::string StringReplace(std::string const &in, std::string const &replace,
		std::string const &with) {
		return std::regex_replace(in, std::regex(replace), with);
		}

		std::string GPUNodeBuilder::createKernelSPIR(std::string IR) {
		IR = StringReplace(IR, "declare i32 @llvm.nvvm.read.ptx.sreg.tid.x\\(\\)",
		"declare spir_func i32 @__gen_ocl_get_local_id0()");
		IR = StringReplace(IR, "declare i32 @llvm.nvvm.read.ptx.sreg.tid.y\\(\\)",
		"declare spir_func i32 @__gen_ocl_get_local_id1()");
		IR = StringReplace(IR, "declare i32 @llvm.nvvm.read.ptx.sreg.tid.z\\(\\)",
		"declare spir_func i32 @__gen_ocl_get_local_id2()");

		IR = StringReplace(IR, "declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x\\(\\)",
		"declare spir_func i32 @__gen_ocl_get_group_id0()");
		IR = StringReplace(IR, "declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y\\(\\)",
		"declare spir_func i32 @__gen_ocl_get_group_id1()");
		IR = StringReplace(IR, "declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.z\\(\\)",
		"declare spir_func i32 @__gen_ocl_get_group_id2()");

		IR = StringReplace(IR, "call i32 @llvm.nvvm.read.ptx.sreg.tid.x\\(\\)",
		"call spir_func i32 @__gen_ocl_get_local_id0()");
		IR = StringReplace(IR, "call i32 @llvm.nvvm.read.ptx.sreg.tid.y\\(\\)",
		"call spir_func i32 @__gen_ocl_get_local_id1()");
		IR = StringReplace(IR, "call i32 @llvm.nvvm.read.ptx.sreg.tid.z\\(\\)",
		"call spir_func i32 @__gen_ocl_get_local_id2()");

		IR = StringReplace(IR, "call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x\\(\\)",
		"call spir_func i32 @__gen_ocl_get_group_id0()");
		IR = StringReplace(IR, "call i32 @llvm.nvvm.read.ptx.sreg.ctaid.y\\(\\)",
		"call spir_func i32 @__gen_ocl_get_group_id1()");
		IR = StringReplace(IR, "call i32 @llvm.nvvm.read.ptx.sreg.ctaid.z\\(\\)",
		"call spir_func i32 @__gen_ocl_get_group_id2()");

		return IR;
		}
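For comparison, the twelve StringReplace calls in createKernelSPIR can be driven from one table, which makes the intrinsic-to-builtin mapping easier to audit. This only illustrates the textual approach under discussion; it is not a substitute for emitting the Beignet calls directly, which is what grosser asks for below. The function name rewriteNVPTXToBeignet is hypothetical. Unlike the patterns in the diff, this version also escapes '.' (which otherwise matches any character) and uses one capture group for the `declare`/`call` keyword.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Table-driven variant of createKernelSPIR. Intrinsic names and Beignet
// builtins are taken from the diff; the table layout is illustrative.
std::string rewriteNVPTXToBeignet(std::string IR) {
  // Suffixes are pre-escaped because '.' is a regex metacharacter.
  static const std::vector<std::pair<std::string, std::string>> Table = {
      {"tid\\.x", "__gen_ocl_get_local_id0"},
      {"tid\\.y", "__gen_ocl_get_local_id1"},
      {"tid\\.z", "__gen_ocl_get_local_id2"},
      {"ctaid\\.x", "__gen_ocl_get_group_id0"},
      {"ctaid\\.y", "__gen_ocl_get_group_id1"},
      {"ctaid\\.z", "__gen_ocl_get_group_id2"}};
  for (const auto &Entry : Table) {
    // One pattern covers both "declare i32 @..." and "call i32 @...";
    // "\\(\\)" matches a literal "()", and $1 re-emits the keyword.
    std::regex Pattern("(declare|call) i32 @llvm\\.nvvm\\.read\\.ptx\\.sreg\\." +
                       Entry.first + "\\(\\)");
    IR = std::regex_replace(IR, Pattern,
                            "$1 spir_func i32 @" + Entry.second + "()");
  }
  return IR;
}
```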

std::string GPUNodeBuilder::finalizeKernelFunction() {		std::string GPUNodeBuilder::finalizeKernelFunction() {

		grosser: It seems all these regexp hacks are needed because we don't generate the right intrinsics. What is holding us back from generating them directly?
		PhilippSchaad: Ditto above.
if (verifyModule(*GPUModule)) {		if (verifyModule(*GPUModule)) {
DEBUG(dbgs() << "verifyModule failed on module:\n";		DEBUG(dbgs() << "verifyModule failed on module:\n";
GPUModule->print(dbgs(), nullptr); dbgs() << "\n";);		GPUModule->print(dbgs(), nullptr); dbgs() << "\n";);

if (FailOnVerifyModuleFailure)		if (FailOnVerifyModuleFailure)
llvm_unreachable("VerifyModule failed.");		llvm_unreachable("VerifyModule failed.");

BuildSuccessful = false;		BuildSuccessful = false;
return "";		return "";
}		}

if (DumpKernelIR)		if (DumpKernelIR)
outs() << *GPUModule << "\n";		outs() << *GPUModule << "\n";

		if (Arch != GPUArch::SPIR32 && Arch != GPUArch::SPIR64) {
// Optimize module.		// Optimize module.
llvm::legacy::PassManager OptPasses;		llvm::legacy::PassManager OptPasses;
PassManagerBuilder PassBuilder;		PassManagerBuilder PassBuilder;
PassBuilder.OptLevel = 3;		PassBuilder.OptLevel = 3;
PassBuilder.SizeLevel = 0;		PassBuilder.SizeLevel = 0;
PassBuilder.populateModulePassManager(OptPasses);		PassBuilder.populateModulePassManager(OptPasses);
OptPasses.run(*GPUModule);		OptPasses.run(*GPUModule);
		}
		grosser: Interesting. So the optimizations break something. It would be great to understand what exactly we break!
		PhilippSchaad: Looking into that. What is clear is that the optimized version skips a few 'dummy' computations, because the optimized kernels expect a smaller workgroup size in OpenCL. I'm not sure if and how we could account for that. It seems very tricky to me at the moment; I have played around with it quite a bit.
		grosser: It's OK. Let's get the remainder of the patch updated, then we can look at this issue later on.

		std::string Assembly;

std::string Assembly = createKernelASM();		if (Arch == GPUArch::SPIR32 || Arch == GPUArch::SPIR64) {
		std::string IR;
		raw_string_ostream IROstream(IR);
		IROstream << *GPUModule;
		IROstream.flush();
		Assembly = createKernelSPIR(IR);
		} else {
		Assembly = createKernelASM();
		}
		singam-sanjay: Would it be better to move the SPIR-specific code into createKernelASM as well? Then you might not need switch cases that handle buggy control flow, like `case GPUArch::SPIR64: case GPUArch::SPIR32: llvm_unreachable("Cannot generate ASM for SPIR architecture");`.
		PhilippSchaad: Will do, good point. I originally moved it out because we thought we had to do a lot more hacking on the resulting IR.
		singam-sanjay: Alright.

if (DumpKernelASM)		if (DumpKernelASM)
outs() << Assembly << "\n";		outs() << Assembly << "\n";

GPUModule.release();		GPUModule.release();
KernelIDs.clear();		KernelIDs.clear();

return Assembly;		return Assembly;
… 972 unchanged lines …

lib/Support/RegisterPasses.cpp

… 111 unchanged lines …	cl::values(clEnumValN(GPURuntime::CUDA, "libcudart",
"use the CUDA Runtime API"),		"use the CUDA Runtime API"),
clEnumValN(GPURuntime::OpenCL, "libopencl",		clEnumValN(GPURuntime::OpenCL, "libopencl",
"use the OpenCL Runtime API")),		"use the OpenCL Runtime API")),
cl::init(GPURuntime::CUDA), cl::ZeroOrMore, cl::cat(PollyCategory));		cl::init(GPURuntime::CUDA), cl::ZeroOrMore, cl::cat(PollyCategory));

static cl::opt<GPUArch>		static cl::opt<GPUArch>
GPUArchChoice("polly-gpu-arch", cl::desc("The GPU Architecture to target"),		GPUArchChoice("polly-gpu-arch", cl::desc("The GPU Architecture to target"),
cl::values(clEnumValN(GPUArch::NVPTX64, "nvptx64",		cl::values(clEnumValN(GPUArch::NVPTX64, "nvptx64",
"target NVIDIA 64-bit architecture")),		"target NVIDIA 64-bit architecture"),
		clEnumValN(GPUArch::SPIR32, "spir32",
		"target SPIR 32-bit architecture"),
		clEnumValN(GPUArch::SPIR64, "spir64",
		"target SPIR 64-bit architecture")),
cl::init(GPUArch::NVPTX64), cl::ZeroOrMore,		cl::init(GPUArch::NVPTX64), cl::ZeroOrMore,
cl::cat(PollyCategory));		cl::cat(PollyCategory));
#endif		#endif
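The two new -polly-gpu-arch values feed the target setup in createKernelFunction. As a quick reference, the mapping from flag value to target triple and pointer width can be summarized by the sketch below. GPUTargetInfo and targetInfoFor are hypothetical names, the string flag stands in for the GPUArch enum, and the OpenCLRuntime parameter stands in for -polly-gpu-runtime=libopencl, which only changes the NVPTX triple.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical summary of what createKernelFunction derives from the
// -polly-gpu-arch value: the target triple and whether the data layout is
// computed for 64-bit pointers.
struct GPUTargetInfo {
  std::string Triple;
  bool Is64Bit;
};

GPUTargetInfo targetInfoFor(const std::string &ArchFlag, bool OpenCLRuntime) {
  if (ArchFlag == "nvptx64")
    return {OpenCLRuntime ? "nvptx64-nvidia-nvcl" : "nvptx64-nvidia-cuda", true};
  if (ArchFlag == "spir32")
    return {"spir-unknown-unknown", false};
  if (ArchFlag == "spir64")
    return {"spir64-unknown-unknown", true};
  throw std::invalid_argument("unknown -polly-gpu-arch value: " + ArchFlag);
}
```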

VectorizerChoice polly::PollyVectorizerChoice;		VectorizerChoice polly::PollyVectorizerChoice;
static cl::opt<polly::VectorizerChoice, true> Vectorizer(		static cl::opt<polly::VectorizerChoice, true> Vectorizer(
"polly-vectorizer", cl::desc("Select the vectorization strategy"),		"polly-vectorizer", cl::desc("Select the vectorization strategy"),
cl::values(		cl::values(
… 340 unchanged lines …

tools/GPURuntime/GPUJIT.c

… 16 unchanged lines …
#include <cuda.h>		#include <cuda.h>
#include <cuda_runtime.h>		#include <cuda_runtime.h>
#endif /* HAS_LIBCUDART */		#endif /* HAS_LIBCUDART */

#ifdef HAS_LIBOPENCL		#ifdef HAS_LIBOPENCL
#ifdef __APPLE__		#ifdef __APPLE__
#include <OpenCL/opencl.h>		#include <OpenCL/opencl.h>
#else		#else
		#ifdef HAS_INTEL_OCL
		#include <CL/cl_intel.h>
		#else
#include <CL/cl.h>		#include <CL/cl.h>
#endif		#endif /* HAS_INTEL_OCL */
		#endif /* __APPLE__ */
#endif /* HAS_LIBOPENCL */		#endif /* HAS_LIBOPENCL */

#include <dlfcn.h>		#include <dlfcn.h>
#include <stdarg.h>		#include <stdarg.h>
#include <stdio.h>		#include <stdio.h>
#include <string.h>		#include <string.h>
		#include <unistd.h>

static int DebugMode;		static int DebugMode;
static int CacheMode;		static int CacheMode;

static PollyGPURuntime Runtime = RUNTIME_NONE;		static PollyGPURuntime Runtime = RUNTIME_NONE;

static void debug_print(const char *format, ...) {		static void debug_print(const char *format, ...) {
if (!DebugMode)		if (!DebugMode)
… 93 unchanged lines …

typedef cl_int		typedef cl_int
clEnqueueWriteBufferFcnTy(cl_command_queue CommandQueue, cl_mem Buffer,		clEnqueueWriteBufferFcnTy(cl_command_queue CommandQueue, cl_mem Buffer,
cl_bool BlockingWrite, size_t Offset, size_t Size,		cl_bool BlockingWrite, size_t Offset, size_t Size,
const void *Ptr, cl_uint NumEventsInWaitList,		const void *Ptr, cl_uint NumEventsInWaitList,
const cl_event *EventWaitList, cl_event *Event);		const cl_event *EventWaitList, cl_event *Event);
static clEnqueueWriteBufferFcnTy *clEnqueueWriteBufferFcnPtr;		static clEnqueueWriteBufferFcnTy *clEnqueueWriteBufferFcnPtr;

		typedef cl_program
		clCreateProgramWithLLVMIntelFcnTy(cl_context Context, cl_uint NumDevices,
		const cl_device_id *DeviceList,
		const char *Filename, cl_int *ErrcodeRet);
		static clCreateProgramWithLLVMIntelFcnTy *clCreateProgramWithLLVMIntelFcnPtr;
		singam-sanjay: This is new to me. Why are you providing runtime support for Intel to JIT SPIR code? Is it to run on the integrated GPU of an Intel CPU? Guessing from https://github.com/intel/beignet.
		PhilippSchaad: Your guess is exactly correct. This also corresponds to some of the points mentioned above.
		singam-sanjay: OK.

typedef cl_program clCreateProgramWithBinaryFcnTy(		typedef cl_program clCreateProgramWithBinaryFcnTy(
cl_context Context, cl_uint NumDevices, const cl_device_id *DeviceList,		cl_context Context, cl_uint NumDevices, const cl_device_id *DeviceList,
const size_t *Lengths, const unsigned char **Binaries, cl_int *BinaryStatus,		const size_t *Lengths, const unsigned char **Binaries, cl_int *BinaryStatus,
cl_int *ErrcodeRet);		cl_int *ErrcodeRet);
static clCreateProgramWithBinaryFcnTy *clCreateProgramWithBinaryFcnPtr;		static clCreateProgramWithBinaryFcnTy *clCreateProgramWithBinaryFcnPtr;

		grosser: Why do these declarations need to be optional? I think they should compile easily in any situation, no?
		PhilippSchaad: Oh yeah, my bad. Only the LLVMIntel one has to be conditional.
typedef cl_int clBuildProgramFcnTy(		typedef cl_int clBuildProgramFcnTy(
cl_program Program, cl_uint NumDevices, const cl_device_id *DeviceList,		cl_program Program, cl_uint NumDevices, const cl_device_id *DeviceList,
const char *Options,		const char *Options,
void(CL_CALLBACK *pfn_notify)(cl_program Program, void *UserData),		void(CL_CALLBACK *pfn_notify)(cl_program Program, void *UserData),
void *UserData);		void *UserData);
static clBuildProgramFcnTy *clBuildProgramFcnPtr;		static clBuildProgramFcnTy *clBuildProgramFcnPtr;

typedef cl_kernel clCreateKernelFcnTy(cl_program Program,		typedef cl_kernel clCreateKernelFcnTy(cl_program Program,
… 49 unchanged lines …	static void *getAPIHandleCL(void *Handle, const char *FuncName) {
if ((Err = dlerror()) != 0) {		if ((Err = dlerror()) != 0) {
fprintf(stderr, "Load OpenCL Runtime API failed: %s. \n", Err);		fprintf(stderr, "Load OpenCL Runtime API failed: %s. \n", Err);
return 0;		return 0;
}		}
return FuncPtr;		return FuncPtr;
}		}

static int initialDeviceAPILibrariesCL() {		static int initialDeviceAPILibrariesCL() {
		#ifdef HAS_INTEL_OCL
		HandleOpenCL = dlopen("/usr/local/lib/beignet/libcl.so", RTLD_LAZY);
		#else
HandleOpenCL = dlopen("libOpenCL.so", RTLD_LAZY);		HandleOpenCL = dlopen("libOpenCL.so", RTLD_LAZY);
		#endif /* HAS_INTEL_OCL */
		grosser: I am surprised that Intel's OpenCL uses such a different path. In fact, I would assume that libOpenCL.so, as a generic OpenCL library, would take care of loading Beignet. Is this not the case?
		PhilippSchaad: The funny thing is: it does, somewhat. If I only dlopen libOpenCL and then call, for example, clCreateProgramWithBinary, looking at that in gdb reveals that the Beignet routines get loaded by libOpenCL. However, as soon as a function is not part of the standard OpenCL API, e.g. clCreateProgramWithLLVMIntel, this no longer works.
		grosser: OK. Can you then unconditionally try to load beignet/libcl.so into another variable, e.g. HandleOpenCLBeignet? I assume this handle should be nullptr if the dlopen fails, right? Can you then check if this handle is zero and use this to decide if you want to call clCreateProgramWithLLVMIntel? This should allow us to get rid of all the conditional compilation.
if (!HandleOpenCL) {		if (!HandleOpenCL) {
fprintf(stderr, "Cannot open library: %s. \n", dlerror());		fprintf(stderr, "Cannot open library: %s. \n", dlerror());
return 0;		return 0;
}		}
return 1;		return 1;
}		}
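A minimal sketch of the approach grosser suggests above: open Beignet's library unconditionally into a separate handle and let a null handle mean "not available", instead of selecting the library with #ifdef HAS_INTEL_OCL. The handle selection follows singam-sanjay's ternary from this review. OpenCLHandles, openOpenCLLibraries and selectHandle are hypothetical names, and the library paths are parameters, assuming the caller passes "libOpenCL.so" and the Beignet path from the diff.

```cpp
#include <dlfcn.h>

// Hypothetical pair of handles: one for the generic ICD loader, one for
// Beignet, which stays nullptr when Beignet is not installed.
struct OpenCLHandles {
  void *Generic = nullptr;
  void *Beignet = nullptr;
};

OpenCLHandles openOpenCLLibraries(const char *GenericPath,
                                  const char *BeignetPath) {
  OpenCLHandles H;
  // dlopen returns nullptr on failure, so a missing Beignet is not an error.
  H.Generic = dlopen(GenericPath, RTLD_LAZY);
  H.Beignet = dlopen(BeignetPath, RTLD_LAZY);
  return H;
}

// Use the Beignet handle when it was opened, otherwise the generic one.
void *selectHandle(const OpenCLHandles &H) {
  return H.Beignet != nullptr ? H.Beignet : H.Generic;
}
```

As the review points out further down, always preferring Beignet would pick the integrated GPU even when a dedicated card is present, so in practice the selection should depend on the -polly-gpu-arch choice.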

/* Get function pointer to OpenCL Runtime API.		/* Get function pointer to OpenCL Runtime API.
… 10 unchanged lines …
* http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html		* http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html
*/		*/
#pragma GCC diagnostic push		#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"		#pragma GCC diagnostic ignored "-Wpedantic"
static int initialDeviceAPIsCL() {		static int initialDeviceAPIsCL() {
if (initialDeviceAPILibrariesCL() == 0)		if (initialDeviceAPILibrariesCL() == 0)
return 0;		return 0;

clGetPlatformIDsFcnPtr =		clGetPlatformIDsFcnPtr =
		singam-sanjay: `void *Handle = ( HandleOpenCLBeignet ? HandleOpenCLBeignet : HandleOpenCL );` maybe? Let me know if this isn't recommended.
		PhilippSchaad: Definitely the prettier solution. Will change.
		singam-sanjay: `void *Handle = ( HandleOpenCLBeignet != NULL ? HandleOpenCLBeignet : HandleOpenCL );` would be better.
(clGetPlatformIDsFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetPlatformIDs");		(clGetPlatformIDsFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetPlatformIDs");

		singam-sanjay: Do we prefer the libraries provided by Intel's SDK and the integrated GPU over other OpenCL drivers and devices? I'm assuming that this leads the runtime to use the integrated GPU even when a more powerful Radeon card is present. Please correct me if I'm wrong.
		PhilippSchaad: This is actually a good point. We don't want to prefer the Intel-provided SDK; the choice of Intel's Beignet should depend on which -polly-gpu-arch value has been selected. AMD is in the pipeline.
		singam-sanjay: Alright. Any ideas on how you'd differentiate between the integrated and the dedicated GPU on the command line?
		PhilippSchaad: The goal is to run on AMD GPUs with the AMDGPU backend, which would mean the differentiation would be done with -polly-gpu-arch=spir32/amdgcn32, for example. I am not yet sure how to transport that choice to here, but I will look into that as I am working on the AMD side of things. Thanks for pointing out this flaw.
clGetDeviceIDsFcnPtr =		clGetDeviceIDsFcnPtr =
(clGetDeviceIDsFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetDeviceIDs");		(clGetDeviceIDsFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetDeviceIDs");

clGetDeviceInfoFcnPtr =		clGetDeviceInfoFcnPtr =
(clGetDeviceInfoFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetDeviceInfo");		(clGetDeviceInfoFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetDeviceInfo");

clGetKernelInfoFcnPtr =		clGetKernelInfoFcnPtr =
(clGetKernelInfoFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetKernelInfo");		(clGetKernelInfoFcnTy *)getAPIHandleCL(HandleOpenCL, "clGetKernelInfo");

clCreateContextFcnPtr =		clCreateContextFcnPtr =
(clCreateContextFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateContext");		(clCreateContextFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateContext");

clCreateCommandQueueFcnPtr = (clCreateCommandQueueFcnTy *)getAPIHandleCL(		clCreateCommandQueueFcnPtr = (clCreateCommandQueueFcnTy *)getAPIHandleCL(
HandleOpenCL, "clCreateCommandQueue");		HandleOpenCL, "clCreateCommandQueue");

clCreateBufferFcnPtr =		clCreateBufferFcnPtr =
(clCreateBufferFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateBuffer");		(clCreateBufferFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateBuffer");

clEnqueueWriteBufferFcnPtr = (clEnqueueWriteBufferFcnTy *)getAPIHandleCL(		clEnqueueWriteBufferFcnPtr = (clEnqueueWriteBufferFcnTy *)getAPIHandleCL(
HandleOpenCL, "clEnqueueWriteBuffer");		HandleOpenCL, "clEnqueueWriteBuffer");

		clCreateProgramWithLLVMIntelFcnPtr =
		(clCreateProgramWithLLVMIntelFcnTy *)getAPIHandleCL(
		HandleOpenCL, "clCreateProgramWithLLVMIntel");

clCreateProgramWithBinaryFcnPtr =		clCreateProgramWithBinaryFcnPtr =
(clCreateProgramWithBinaryFcnTy *)getAPIHandleCL(		(clCreateProgramWithBinaryFcnTy *)getAPIHandleCL(
HandleOpenCL, "clCreateProgramWithBinary");		HandleOpenCL, "clCreateProgramWithBinary");

		grosser: Does this break, or does it just return nullptr if the function is not available? If we can always try and just detect if it is not around, this would be great!
		PhilippSchaad: I'm going to look into that, but IIRC it didn't work. Might be wrong, will try it out again!
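On grosser's question: per POSIX, dlsym returns a null pointer when the symbol cannot be found and records an error that dlerror() reports; it does not crash. getAPIHandleCL in the diff prints that error and returns 0, which is noisy for a symbol that is expected to be missing on systems without Beignet. A hedged sketch of a quiet lookup, where lookupOptionalSymbol is a hypothetical name:

```cpp
#include <dlfcn.h>

// Hypothetical helper: look up a symbol that may legitimately be missing,
// e.g. the Beignet-only clCreateProgramWithLLVMIntel. The pending dlerror()
// state is cleared first and inspected afterwards, so a failed lookup is
// detected even if a symbol's address could legitimately be null.
void *lookupOptionalSymbol(void *Handle, const char *Name) {
  if (Handle == nullptr)
    return nullptr;
  dlerror(); // discard any earlier, unrelated error
  void *Sym = dlsym(Handle, Name);
  if (dlerror() != nullptr)
    return nullptr; // not found: the caller falls back to the standard path
  return Sym;
}
```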
clBuildProgramFcnPtr =		clBuildProgramFcnPtr =
(clBuildProgramFcnTy *)getAPIHandleCL(HandleOpenCL, "clBuildProgram");		(clBuildProgramFcnTy *)getAPIHandleCL(HandleOpenCL, "clBuildProgram");

clCreateKernelFcnPtr =		clCreateKernelFcnPtr =
(clCreateKernelFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateKernel");		(clCreateKernelFcnTy *)getAPIHandleCL(HandleOpenCL, "clCreateKernel");

clSetKernelArgFcnPtr =		clSetKernelArgFcnPtr =
(clSetKernelArgFcnTy *)getAPIHandleCL(HandleOpenCL, "clSetKernelArg");		(clSetKernelArgFcnTy *)getAPIHandleCL(HandleOpenCL, "clSetKernelArg");
… 200 unchanged lines …	static PollyGPUFunction getKernelCL(const char *BinaryBuffer,
  }

  if (!GlobalDeviceID) {
    fprintf(stderr, "GPGPU-code generation not initialized correctly.\n");
    exit(-1);
  }

  cl_int Ret;

#ifdef HAS_INTEL_OCL
  // TODO: This is a workaround, since clCreateProgramWithLLVMIntel only
  // accepts a filename to a valid llvm-ir file as an argument, instead
  // of accepting the BinaryBuffer directly.
  FILE *fp = fopen("kernel.ll", "wb");
  if (fp != NULL) {
    fputs(BinaryBuffer, fp);
    fclose(fp);
  }

  ((OpenCLKernel *)Function->Kernel)->Program =
      clCreateProgramWithLLVMIntelFcnPtr(
          ((OpenCLContext *)GlobalContext->Context)->Context, 1,
          &GlobalDeviceID, "kernel.ll", &Ret);
grosser: Why do we read the file from "kernel.ll" and not use the file passed to us in BinaryBuffer?
PhilippSchaad (author): Well, the problem is that clCreateProgramWithLLVMIntel wants to read from a file and requests a filename/filepath as an argument. BinaryBuffer, however, holds the complete binary. If you know a simpler solution, that would of course be great, because this is another hacky part.
grosser: I see. Maybe that's fine then. Just add a comment for now. I would prefer if we could use clCreateProgramWithBinary and just pass an LLVM bitcode file, but this likely only works if the LLVM version of Beignet is very modern. Let's leave it like this for now. It's good to have something working!
PhilippSchaad (author): Totally agreed, that would be the better option. As you pointed out, it is currently not yet possible.
grosser: Another change: could you possibly remove the conditional compilation here and just check if clCreateProgramWithLLVMIntelFcnPtr != nullptr?
singam-sanjay: @grosser What is your rationale for avoiding conditional compilation?
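grosser's suggestion amounts to choosing the program-creation path at run time, based on whether the optional entry point was resolved. One plausible motivation (an inference, not stated in the thread) is that a single build could then run both with and without Beignet. The sketch below uses hypothetical, simplified function-pointer types and stand-in functions in place of the real OpenCL signatures.

```c
#include <stdio.h>

/* Hypothetical, simplified signatures; the real OpenCL entry points also
 * take a context, a device list and an error-code pointer. */
typedef int (*CreateFromLLVMFileFn)(const char *Path);
typedef int (*CreateFromBinaryFn)(const char *Buffer);

/* Set when the symbols are resolved at load time; NULL if unavailable. */
static CreateFromLLVMFileFn CreateFromLLVMFilePtr = NULL;
static CreateFromBinaryFn CreateFromBinaryPtr = NULL;

/* Stand-ins for driver functions, used only to exercise the dispatch. */
static int fakeFromLLVMFile(const char *Path) { return Path != NULL; }
static int fakeFromBinary(const char *Buffer) { return Buffer != NULL; }

enum ProgramPath { PATH_NONE = 0, PATH_LLVM_FILE = 1, PATH_BINARY = 2 };

/* Picks the creation path at run time instead of via #ifdef HAS_INTEL_OCL:
 * prefer the LLVM-IR entry point when it was found, else fall back. */
static enum ProgramPath createProgram(const char *Buffer, const char *IRPath) {
  if (CreateFromLLVMFilePtr != NULL) {
    CreateFromLLVMFilePtr(IRPath);
    return PATH_LLVM_FILE;
  }
  if (CreateFromBinaryPtr != NULL) {
    CreateFromBinaryPtr(Buffer);
    return PATH_BINARY;
  }
  fprintf(stderr, "No program-creation entry point available.\n");
  return PATH_NONE;
}
```

The trade-off is that both code paths are always compiled in, so the Intel-specific function-pointer type must be declarable even on systems without Beignet.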
  checkOpenCLError(Ret, "Failed to create program from llvm.\n");
  unlink("kernel.ll");
#else
  size_t BinarySize = strlen(BinaryBuffer);
  ((OpenCLKernel *)Function->Kernel)->Program = clCreateProgramWithBinaryFcnPtr(
      ((OpenCLContext *)GlobalContext->Context)->Context, 1, &GlobalDeviceID,
      (const size_t *)&BinarySize, (const unsigned char **)&BinaryBuffer, NULL,
      &Ret);
  checkOpenCLError(Ret, "Failed to create program from binary.\n");
#endif /* HAS_INTEL_OCL */
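`clCreateProgramWithBinary` takes per-device arrays: `lengths[i]` and `binaries[i]` describe the binary for device `i`. With a single device, the address of one variable acts as a one-element array, which is why the call passes `&BinarySize` and a pointer to `BinaryBuffer`. The sketch below mimics that parameter shape with a hypothetical stand-in function, so no OpenCL headers are needed. It converts through a local `const unsigned char *` instead of casting `&BinaryBuffer`, which avoids accessing a `const char *` object through a differently typed pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the same array-shaped parameters as
 * clCreateProgramWithBinary: Lengths[I] and Binaries[I] describe the
 * binary for device I. Here it just sums the lengths it was given. */
static size_t totalBinaryBytes(unsigned NumDevices, const size_t *Lengths,
                               const unsigned char **Binaries) {
  size_t Total = 0;
  for (unsigned I = 0; I < NumDevices; ++I)
    if (Binaries[I] != NULL)
      Total += Lengths[I];
  return Total;
}

/* One device: the address of a single variable is a one-element array. */
static size_t singleDeviceBytes(const char *BinaryBuffer) {
  size_t BinarySize = strlen(BinaryBuffer);
  const unsigned char *Binary = (const unsigned char *)BinaryBuffer;
  return totalBinaryBytes(1, &BinarySize, &Binary);
}
```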

  Ret = clBuildProgramFcnPtr(((OpenCLKernel *)Function->Kernel)->Program, 1,
                             &GlobalDeviceID, NULL, NULL, NULL);
  checkOpenCLError(Ret, "Failed to build program.\n");
singam-sanjay: Is this file opened inside <llvm_build>/bin or $PWD? If it's $PWD, we should take precautions not to open a file that the user needs. You could instead open a file in /tmp (or %TEMP% on Windows), like "/tmp/R@ηdηдm3.ll". Anyway, you could implement this later.
PhilippSchaad (author): I will add this later, but yes, it is $PWD. The solution of using /tmp and %TEMP% seems reasonable.
singam-sanjay: Alright.
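The `/tmp` approach discussed above could look roughly like the following POSIX-only sketch; Windows `%TEMP%` handling is not covered. It assumes `mkstemps` is available (as in glibc and the BSDs), honors `TMPDIR` if set, and keeps a `.ll` suffix in case the Intel entry point cares about the extension, which is an assumption. `writeTempKernel` is a hypothetical name. Unlike `fopen("kernel.ll", "wb")`, it creates a uniquely named file, so user files in `$PWD` and concurrent runs are not clobbered, and it reports write failures instead of continuing silently.

```c
#define _DEFAULT_SOURCE /* exposes mkstemps() and fdopen() in glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write Buffer to a fresh, uniquely named file such as
 * /tmp/polly-kernel-XXXXXX.ll and copy its path into PathOut. Returns 0 on
 * success, -1 on failure. The caller unlink()s the file when done. */
static int writeTempKernel(const char *Buffer, char *PathOut, size_t PathLen) {
  const char *Dir = getenv("TMPDIR");
  if (Dir == NULL || Dir[0] == '\0')
    Dir = "/tmp";
  int N = snprintf(PathOut, PathLen, "%s/polly-kernel-XXXXXX.ll", Dir);
  if (N < 0 || (size_t)N >= PathLen)
    return -1;
  int Fd = mkstemps(PathOut, 3); /* keep the 3-character ".ll" suffix */
  if (Fd < 0)
    return -1;
  FILE *Fp = fdopen(Fd, "wb");
  if (Fp == NULL) {
    close(Fd);
    unlink(PathOut);
    return -1;
  }
  int Ok = fputs(Buffer, Fp) >= 0;
  if (fclose(Fp) != 0)
    Ok = 0;
  if (!Ok) {
    unlink(PathOut);
    return -1;
  }
  return 0;
}
```

The generated path would then be passed to the program-creation call in place of the literal "kernel.ll", and unlinked afterwards as the current code already does.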

  ((OpenCLKernel *)Function->Kernel)->Kernel = clCreateKernelFcnPtr(
      ((OpenCLKernel *)Function->Kernel)->Program, KernelName, &Ret);
  checkOpenCLError(Ret, "Failed to create kernel.\n");

  ((OpenCLKernel *)Function->Kernel)->BinaryString = BinaryBuffer;
singam-sanjay: Was this supposed to be a message letting the user know that the runtime is not using the "beignet" OpenCL implementation?
PhilippSchaad (author): Oops, thanks for pointing this out. This was unintentionally left in and should never be reached! I used it for debugging. Removing it.
singam-sanjay: 👍

  if (CacheMode) {
    if (KernelCache[NextCacheItem])
      freeKernelCL(KernelCache[NextCacheItem]);

    KernelCache[NextCacheItem] = Function;

    NextCacheItem = (NextCacheItem + 1) % KERNEL_CACHE_SIZE;
[… 1,144 lines not shown …]
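The `CacheMode` branch in `getKernelCL` treats `KernelCache` as a fixed-size ring: the slot at `NextCacheItem` is freed if occupied, overwritten with the new kernel, and the cursor advances modulo `KERNEL_CACHE_SIZE`. Eviction is therefore first-in, first-out rather than least-recently-used. The self-contained sketch below isolates just that insertion logic, with hypothetical names (`Kernel`, `CACHE_SIZE`, `cacheInsert`) and a counter that makes evictions observable; the lookup side of the cache is not shown in this excerpt and is not modeled.

```c
#include <stdlib.h>

#define CACHE_SIZE 4 /* hypothetical; plays the role of KERNEL_CACHE_SIZE */

/* Hypothetical kernel handle standing in for PollyGPUFunction. */
typedef struct {
  int Id;
} Kernel;

static Kernel *Cache[CACHE_SIZE]; /* zero-initialized: all slots empty */
static int NextSlot = 0;
static int FreedCount = 0;

static void freeKernel(Kernel *K) {
  free(K);
  ++FreedCount;
}

/* Round-robin insertion: free whatever occupies the current slot, store
 * the new kernel there, then advance the cursor modulo the cache size. */
static void cacheInsert(Kernel *K) {
  if (Cache[NextSlot])
    freeKernel(Cache[NextSlot]);
  Cache[NextSlot] = K;
  NextSlot = (NextSlot + 1) % CACHE_SIZE;
}

static Kernel *makeKernel(int Id) {
  Kernel *K = malloc(sizeof *K);
  if (K)
    K->Id = Id;
  return K;
}
```

Inserting five kernels into this four-slot cache evicts only the first one, and the fifth lands in slot 0.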

Revision Contents

Diff 106000

CMakeLists.txt

include/polly/CodeGen/PPCGCodeGeneration.h

lib/CodeGen/PPCGCodeGeneration.cpp

lib/Support/RegisterPasses.cpp

tools/GPURuntime/GPUJIT.c
