This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen
ClosedPublic

Authored by PhilippSchaad on Apr 24 2017, 6:39 AM.

Details

Summary

When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).

Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).

Event Timeline

PhilippSchaad created this revision.Apr 24 2017, 6:39 AM
PhilippSchaad retitled this revision from Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen to [Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen.Apr 24 2017, 6:44 AM
PhilippSchaad set the repository for this revision to rL LLVM.
PhilippSchaad added a project: Restricted Project.
PhilippSchaad added subscribers: pollydev, llvm-commits.
  • Replaced magic numbers, added assertions and fixed if-braces.
Meinersbur edited edge metadata.Apr 25 2017, 4:31 AM

I wrote a runtime with similar scope here: https://github.com/Meinersbur/prl . We were one discussing to use it for Polly as well. What's the status of that?

lib/CodeGen/PPCGCodeGeneration.cpp
57–59

Did you consider an enum?

1694

Is there some vendor-neutral triple?

1807

Why a static flag?

2683–2689

Did you consider

Pass *polly::createPPCGCodeGenerationPass(int Runtime);

?

lib/Support/RegisterPasses.cpp
328–332

A switch instead?

I wrote a runtime with similar scope here: https://github.com/Meinersbur/prl . We were one discussing to use it for Polly as well. What's the status of that?

I have looked into it a tiny little bit about a month ago, but had then decided to write a basic OpenCL Runtime from scratch in GPUJIT. So to my knowledge, nothing has changed on that status yet.

Currently looking into the rest of your comment-mentioned points.

lib/CodeGen/PPCGCodeGeneration.cpp
1694

Do you mean like nvptx64-nvcl / nvptx64-cuda?

PhilippSchaad added inline comments.Apr 25 2017, 9:38 AM
lib/CodeGen/PPCGCodeGeneration.cpp
2683–2689

That seems reasonable, but I get a template-conflict for the LLVM Pass-Creation template when trying to change the pass-creation-method structure. I thought it might be easier this way?

PhilippSchaad added inline comments.Apr 25 2017, 9:45 AM
lib/CodeGen/PPCGCodeGeneration.cpp
2683–2689

Correction: looking at wrong function of course, you mean a different one :-)

  • Stylistic changes and switch to -polly-gpu-runtime=cuda/opencl compiler flag
PhilippSchaad marked 6 inline comments as done.Apr 25 2017, 12:20 PM
  • Removed left over commented out macros
Meinersbur added inline comments.Apr 25 2017, 1:36 PM
lib/CodeGen/PPCGCodeGeneration.cpp
59–61

See http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly for LLVM's coding policy for enum members.

Nitpick: A "T" suffix is rather unusual.

157

Nitpick: No need to use an enum qualifier.

1694

I hoped that there might be some kind of triple that works for OpenCL in general, not only for nvidia (nvptx, nvcl). If the generated program only works for devices that support cuda anyway, I don't see where the benefit of such a backend is.

If there is indeed no backend that also works on non-nvidia devices, should we call the the runtime accordingly, e.g. "nvcl" then?

lib/Support/RegisterPasses.cpp
328–332

Now that createPPCGCodeGenerationPass takes an argument, you don't need a switch anymore.

tools/GPURuntime/GPUJIT.c
308–316

Consistent variable name style? What style do you intend to use in this file?

352–353

Replace the magic number 256 by sizeof(DeviceRevision)?

etherzhhb added inline comments.Apr 25 2017, 5:12 PM
include/polly/LinkAllPasses.h
52–53

is this Runtime supposed to be with type GPURuntimeT ? it is a little bit tricky here.
Maybe we need to introduce a PPCG header and define the runtime enum there, than include that runtime enum.

or we can declare the function as

llvm::Pass *createPPCGCodeGenerationPass(int Runtime = 0);

to at least avoid the magic number 0 in line 86.

lib/CodeGen/PPCGCodeGeneration.cpp
1694

for opencl, it can be "spir-unknown-unknown" or "spir64-unknown-unknown", but that may not work :)

Looking into the rest of your comments.

include/polly/LinkAllPasses.h
52–53

Yes, it would be. The reason it's not is exactly the one you mentioned. I was considering adding a PPCG header, but refrained from it because I was hesitant about creating a header 'just for one enum'. If you agree that this is a good solution, I will indeed introduce a new header for PPCG and define the enum there, to get rid of magic numbers. The second option seems reasonable too though.

lib/CodeGen/PPCGCodeGeneration.cpp
1694

Looking into it. The next goal would be to add the AMDGPU backend to generate AMD ISA, which would then again utilize the same OpenCL Runtime implemented here. (I realize there will have to be some naming changes to make that clear in the GPUJIT, but as you pointed out, I have a naming-mess to fix there anyway.

etherzhhb added inline comments.Apr 26 2017, 12:12 AM
include/polly/LinkAllPasses.h
52–53

we could start from the second option if you think it is reasonable

  • Addressed consistency and naming concerns
PhilippSchaad marked 7 inline comments as done.Apr 26 2017, 3:15 AM
PhilippSchaad edited the summary of this revision. (Show Details)Apr 26 2017, 3:19 AM
  • Made CUDA Runtime default, fixed formatting, adapted test case
grosser edited edge metadata.Apr 27 2017, 1:52 AM

Hi Philip and others,

this already looks very cool. I also added some minor comments.

Best,
Tobias

lib/CodeGen/PPCGCodeGeneration.cpp
57–59

You can use C++11 enums ala

enum class GPURuntime { CUDA, OpenCL };

1694

Making OpenCL work for CUDA is just the first step. I expect that when adding AMDGPU support, we will use here different triples depending on which vendor to target. AMD will have a specific one, CUDA will have a specific one, and for Intel we likely use the generic SPIR-V comment. I assume this could then also work for Xilinx.

  • Fixed enum style to C++11
PhilippSchaad marked an inline comment as done.Apr 27 2017, 5:03 AM
Meinersbur added inline comments.Apr 27 2017, 6:55 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1694

At compile time, we don't know on which hardware it will run on, so we cannot specify a triple here.

Unless you think of a runtime dispatch system, then you need to generate all kernels at once. In that case, I still would like to select a single target only for when I know I will run only on that hardware and to keep the executable small.

PhilippSchaad added inline comments.Apr 27 2017, 7:49 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1694

I thought the goal was to let the user compile for a specific target, i.e. providing something like -polly-gpu-arch=amd/nvidia/intel, and then choosing the correct target triple according to said selection. Meaning for example -polly-gpu-arch=amd would utilize the AMDGPU backend triple and feed that into the OpenCL runtime.

Am I misunderstanding something?

Meinersbur added inline comments.Apr 27 2017, 8:48 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1694

I think we were miscommunicating. The -polly-gpu-arch switch is new to me and doesn't appear in this patch. I assumed a fat executable when you mentioned an AMD backend.

OpenCL claims to be hardware-independent with platform's driver translating OpenCL-C or SPIR(-V) to its proprietary format. In NVidia's terminology, CUDA is a platform of which CUDA C++, their OpenCL implementation, cudart (CUDA runtime) etc. are part of. We are still vendor-locked to CUDA, since it only works with CUDA's OpenCL runtime library. -polly-gpu-runtime=opencl therefore is misleading (at least it was to me), it it no alternative to CUDA.

It might resolve if it is indeed just the runtime library GPUJIT is linked to. If so, could you make it more clear? I suggest the following switches:

-polly-target=cpu/gpu

if -polly-target=gpu then
-polly-gpu-arch=nvptx64/hsa/spir/spir-v/opencl-c
(with -polly-gpu-arch=nvptx64 the only one implemented so far)

if -polly-gpu-arch=nvptx64 then there is a choice between
-polly-cuda-runtime=libcudart/libopencl

  • GPURuntime works on systems with just one of CUDA/OpenCL now.
PhilippSchaad added inline comments.Apr 29 2017, 6:40 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1694

This change should address exactly this. The framework is now set to introduce new architectures and utilize eg. the AMDGPU backend instead of NVPTX etc.

singam-sanjay added inline comments.
lib/CodeGen/PPCGCodeGeneration.cpp
1728

Does nvptx64-nvidia-nvcl mean OpenCL code meant to be run on NVIDIA GPUs ?

PhilippSchaad added inline comments.Apr 29 2017, 7:40 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1728

Yes, exactly. It generates a slightly different flavor of PTX, which can be used by OpenCL to generate a kernel from the PTX binary (on NVIDIA GPUs). If you were to use the standard CUDA PTX, OpenCL would complain because of wrong argument accesses.

PhilippSchaad edited the summary of this revision. (Show Details)Apr 29 2017, 8:14 AM
PhilippSchaad set the repository for this revision to rL LLVM.
singam-sanjay added inline comments.Apr 29 2017, 9:33 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1728

Okay. From what you're saying, nvptx64-nvidia-nvcl indicates that backend must generate NVPTX code for a 64bit architecture for an NVIDIA GPU controlled by a OpenCL driver. Please correct me if I'm wrong.

PhilippSchaad added inline comments.Apr 29 2017, 9:37 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1728

That is correct.

singam-sanjay added inline comments.Apr 29 2017, 11:25 PM
lib/CodeGen/PPCGCodeGeneration.cpp
1728

Thank you ! That was helpful.

  • Integrated D32226 - Managed memory support
  • Fixed formatting and managed-memory test case (including pre-existing bug)

I only consider the clSetKernelArg as a remaining bigger issue. Having only "polly_"-prefixed function non-static would also be great.

Tobias is the sole author of Polly-ACC. I think he should give the final LGTM.

include/polly/LinkAllPasses.h
52–53

I assume you kept the arguments of type int to not include header files here.

lib/CodeGen/PPCGCodeGeneration.cpp
764–767

Can you make this a switch so you get warned by the compiler when adding more runtimes?

1725–1728

Could also be a switch.

2692

We should check whether the arguments are valid. Such as:

switch (Runtime) {
case 1:
case 2:
  ...
default:
  llvm_unreachable("Invalid argument for Runtime");
}
2694–2698

Similarly:

switch (Arch) {
case 1:
  ...
default:
  llvm_unreachable("Invalid argument for Arch");
}
lib/Support/RegisterPasses.cpp
330–332
int Arch;
switch (GPUArch) {
case GPU_ARCH_NVPTX64;
  Arch = 1;
  break;
}

int Runtime;
switch (GPURuntime) {
case GPU_RUNTIME_CUDA:
  Runtime = 1;
  break;
case GPU_RUNTIME_OPENCL:
  Runtime = 2;
  breal
}

PM.add(polly::createPPCGCodeGenerationPass(Arch, Runtime))

With "you don't need a switch anymore", I was thinking about the like of:

PM.add(polly::createPPCGCodeGenerationPass(1, GPURuntime + 1));

Your choice.

It could be helpful to have createPPCGCodeGenerationPass accept the GPURuntime enum as arguments instead.

In your solution, I don't see the use of static const int local variables. If you want identifiers that give names to the accepted arguments, declare them as #define or static const int in the header file that also declares createPPCGCodeGenerationPass, so these can be used in the implementation of createPPCGCodeGenerationPass as well.

tools/GPURuntime/GPUJIT.c
397–401

Did you consider introducing a new function this sequence of code? It appears quite often.

610–618

Trying each argument size after the other and hoping one matches is not good. The caller must know the argument sizes. You probably have to pass the sizes in another argument to launchKernelCL that contains those sizes for each argument, generated by Polly.

Without this, the code will fail if you pass a struct (or vector) of size other than 8, 4, 2, or 1.

676

Shouldn't these print to stderr?

750

The function name does not follow the naming of other functions in this file. In C it is common have the public API functions prefixed with the library name (here: "polly") and everything else static. Don't choose the prefix of another library (here: "cl_"). This avoids symbol conflicts because multiple libraries happen to give the same name for a function.

PhilippSchaad marked 17 inline comments as done.May 2 2017, 1:13 PM
PhilippSchaad added inline comments.
include/polly/LinkAllPasses.h
52–53

That is correct, yes.

lib/Support/RegisterPasses.cpp
330–332

With "you don't need a switch anymore", I was thinking about the like of:

PM.add(polly::createPPCGCodeGenerationPass(1, GPURuntime + 1));

Your choice.

Personally, I think the current solution is tiny little bit more 'documenting'. Both is fine though, good call.

It could be helpful to have createPPCGCodeGenerationPass accept the GPURuntime enum as arguments instead.

That is true, would mean having to provide a header with that GPURuntime enum instead though, right?

In your solution, I don't see the use of static const int local variables. If you want identifiers that give names to the accepted arguments, declare them as #define or static const int in the header file that also declares createPPCGCodeGenerationPass, so these can be used in the implementation of createPPCGCodeGenerationPass as well.

Would it maybe make sense to introduce a PPCGCodeGeneration header at this point?

tools/GPURuntime/GPUJIT.c
610–618

Yes, this is a priority issue still. The issue will have to be resolved at some point. This is basically a temporary way around some (probably) major argument handling changes in PPCG etc.

  • Addressed multiple issues pointed out in comment
  • Fixed formatting
Meinersbur accepted this revision.May 4 2017, 2:41 AM

You changed stdout to stderr everywhere, which is better in my point of view, but logically is a different change. Sorry that I didn't realize that the libcudart also printed to stdout before so you tried to be consistent. Could you commit that change separately beforehand? (Maybe also the change in argument name capitalization)

I am accepting the patch proved that you are going to improve the clSetKernelArg situation later and add a TODO into the code about it. The other stuff is of stylistic nature only.

Please also wait for Tobias' approval.

lib/Support/RegisterPasses.cpp
330–332

Please remove at least the static keyword. It makes sense for global constants, but not for function-local ones.

The style

static const int ArgumentName = 0;
func(ArgumentName);

rather unusual in LLVM-style code (but not bad if applied consistenly, which is unfortunately not the case in Polly). I've seen

func(/* ArgumentName = */ 0);

much more often.

In this case I think UseOpenCLRuntime, UseCUDARuntime and TargetNVPTX64 should really be global constants declated close to the declaration of createPPCGCodeGenerationPass so it can be used by ever caller of that function.

Would it maybe make sense to introduce a PPCGCodeGeneration header at this point?

Yes, that sounds good to me as well.

tools/GPURuntime/GPUJIT.c
606–607

Thanks for the introduction of checkOpenCLError. You could also introduce one for these two lines. for instance:

if (!GlobalContext)
  handleError("GPGPU-code generation not correctly initialized.\n");

handleError could also be called by checkOpenCLError. It helps centralising the error handling, such that if we change some detail about it (e.g. the return code on exit, or some cleanup code), there is a single function for that.

610–618

Also note that I am not sure that OpenCL ICD's are required to check for correct CL_INVALID_ARG_SIZE. It might just trust the caller, or be a badly written one.

961–967

These are unrelated changes Tobias usually complains about. I personally don't care.

1101

Unrelated whitespace change?

This revision is now accepted and ready to land.May 4 2017, 2:41 AM
PhilippSchaad marked 8 inline comments as done.May 4 2017, 3:45 AM

Addressed most of your concerns. @grosser it should be ready now, what do you think?

  • Introduced PPCGCodeGeneration header file for simplicity

@Meinersbur the unrelated changes you mentioned have been added/moved to D32852 and D32854.

grosser accepted this revision.May 4 2017, 3:52 AM
grosser added inline comments.
test/GPGPU/cuda-managed-memory-simple.ll
49

This change is unrelated.

PhilippSchaad added inline comments.May 4 2017, 3:55 AM
test/GPGPU/cuda-managed-memory-simple.ll
49

It is, but it got fixed in the meantime anyway. Removing it.

This revision was automatically updated to reflect the committed changes.
PhilippSchaad reopened this revision.May 7 2017, 2:36 AM

Reopened for rebase

This revision is now accepted and ready to land.May 7 2017, 2:36 AM
This revision was automatically updated to reflect the committed changes.