This is an archive of the discontinued LLVM Phabricator instance.

[Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
ClosedPublic

Authored by PhilippSchaad on Jul 9 2017, 11:51 AM.

Details

Summary

Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).

Event Timeline

PhilippSchaad created this revision.Jul 9 2017, 11:51 AM
PhilippSchaad set the repository for this revision to rL LLVM.
PhilippSchaad added a project: Restricted Project.
PhilippSchaad edited the summary of this revision. (Show Details)
PhilippSchaad added a subscriber: pollydev.
grosser edited edge metadata.Jul 10 2017, 1:50 PM

Hi Philipp,

this looks indeed already surprisingly good. As you noted, there are still some hacks, but I have good hopes that we can work around most of them. I added some comments. Let me know what you think!

Best,
Tobias

lib/CodeGen/PPCGCodeGeneration.cpp
40

After having removed the hack below, this should hopefully not be needed any more.

1259

Why do you emit NVIDIA intrinsics for SPIR code? Can you not emit an opencl barrier?

If you don't know what intrinsic corresponds to this, create a simple .cl file, compile it with clang to SPIR and see which intrinsic gets emitted.

1782

Instead of spreading the SPIR code all over the place, can we create one function addSPIRMetadata, that takes 'Args' as argument, emits all the metadata needed and also emits the metadata you add a little further below?

1882

Why do you emit NVIDIA intrinsics. As discussed above, this there no way to emit the OpenCL intrinsics instead?

2181

It seems all these regexp hacks are needed because we don't generate the right intrinsics. What is holding us back from generating them directly?

2204

Interesting. So the optimization break stuff. Would be great to understand what exactly we brake!

tools/GPURuntime/GPUJIT.c
158

Why do these declarations need to be optional? I think they should compile easily in any situation, no?

228

I am surprised. Intel's OpenCL uses such a different path. In fact, I would assume that libOpenCL.so as a generic OpenCL library would take care of loading beignet. Is this not the case?

286

Does this break or does this just return nullptr if the function is not available. If we can always try and just detect if it is not around, this would be great!

517

Why do we read the file from "kernel.ll" and not use the file passed to us in BinaryBuffer?

grosser requested changes to this revision.Jul 10 2017, 1:51 PM

Set this to request changes, that I get notified in case you update this revision.

This revision now requires changes to proceed.Jul 10 2017, 1:51 PM
PhilippSchaad added inline comments.Jul 10 2017, 4:06 PM
lib/CodeGen/PPCGCodeGeneration.cpp
40

That is correct, yes. This should only be needed as long as we have to manually change the string occurrences for barriers and global/local IDs.

1259

The generated intrinsics are the ones declared in the SPIR reference, so for example @_Z13get_global_idj(i32 0) . I cannot find any corresponding intrinsics in LLVM's TableGen/Intrinsics, did I overlook some? This is essentially the only real point where the hack is needed. If we can insert correct intrinsics, we can ditch the regex. Because fixing this is all the regex does.

1782

Will look into that.

1882

Dito above.

2181

Dito above.

2204

Looking into that. What is clear, is that the optimized version skips a few 'dummy' computations, because the optimized kernels expect a smaller workgroup size in OpenCL. Not sure if and how we could account for that. It seems very tricky to me atm, have played around with it quite a bit.

tools/GPURuntime/GPUJIT.c
158

Oh yeah, my bad. Only the LLVMIntel one has to be conditional.

228

The funny thing is: it does, somewhat. If I only dlopen libOpenCL and then call for example clCreateProgramWithBinary, looking at that in gdb reveals that the beignet routines get loaded by libOpenCL. However as soon as a function is not present in the standard OpenCL API, eg. clCreateProgramWithLLVMIntel, it does not like that.

286

I'm gonna look into that, but iirc it didn't work. Might be wrong, will try it out again!

517

Well the problem is that clCreateProgramWithLLVMIntel wants to read from a file, and requests a filename/filepath as an argument there. BinaryBuffer however holds the complete binary. If you know a simpler solution, that would of course be great, because this is another hacky-part.

PhilippSchaad edited edge metadata.
PhilippSchaad marked 4 inline comments as done.Jul 10 2017, 5:29 PM
  • Adapted dynamic method loading for intel
grosser added inline comments.Jul 10 2017, 10:08 PM
CMakeLists.txt
178

Forgot to mention, but would be great if we could get away without a configure test here.

lib/CodeGen/PPCGCodeGeneration.cpp
1259

No, I think they may not exist. However, you can just create declarations for such functions using code as in "GPUNodeBuilder::createCallGetKernel(".

1782

I just see this is not only about the llvm::Type, but also carries an integer which is 0 for local/shared memory and 1 otherwise. So moving this into a function does not seem so easy. If you don't find an easy way, just leave it as it is.

2204

It's OK. Let's get the remainder of the patch updated, then we can look at this issue later on.

tools/GPURuntime/GPUJIT.c
517

I see. Maybe that's fine then. Just add a comment for now.

I would prefer if we could use clCreateProgramWithBinary and just pass an LLVM bitcode file, but this likely only works if the LLVM versions of beignet is very modern. Let's leave it for now like this. It's good to have something working!

grosser requested changes to this revision.Jul 10 2017, 10:15 PM
This revision now requires changes to proceed.Jul 10 2017, 10:15 PM
PhilippSchaad edited edge metadata.
PhilippSchaad marked 3 inline comments as done.
  • Inserting SPIR barriers with custom function.
CMakeLists.txt
178

Yes I agree. I don't know of a method for detecting a Beignet installation with CMake yet, but if that exists, that would of course be great.

tools/GPURuntime/GPUJIT.c
517

Totally agreed, that would be the better option. As you pointed out, it is currently not yet possible.

  • Added comment clarification for workaround.
  • Removed Regexp hack
PhilippSchaad marked 6 inline comments as done.Jul 11 2017, 4:58 AM

Nice. Can you potentially also add a test case?

CMakeLists.txt
178

I don't think we want to add this to cmake at all. At best, we just compile the beignet support in unconditionally, if this is possibile.

lib/CodeGen/PPCGCodeGeneration.cpp
1948

If you store the group names in an array, you can just index this array with "i".

2080

Maybe call this

insertCUDAKernelIntrinsics

and

insertSPIRKernelIntrincics

for consistency?

tools/GPURuntime/GPUJIT.c
228

OK. Can you then unconditionally try to load beignet/libcl.so into another variable, e.g. HandleOpenCLBeignet? I assume this handle should be nullptr if the dlopen fails, right? Can you then check if this handle is zero and use this to decide if you want to call clCreateProgramWithLLVMIntel? This should allow us to get rid of all the conditional compilation.

grosser requested changes to this revision.Jul 11 2017, 6:26 AM
This revision now requires changes to proceed.Jul 11 2017, 6:26 AM
grosser added inline comments.Jul 11 2017, 6:29 AM
tools/GPURuntime/GPUJIT.c
517

Another change. Could you possibly remove the conditional compilation here and just check if clCreateProgramWithLLVMIntelFcnPtr != nullptr?

PhilippSchaad edited edge metadata.
PhilippSchaad marked 8 inline comments as done.
  • Simplification
  • Removed compile-conditional CMake dependency

I am not yet too familiar with how to implement such a case and what exactly to check for, but if that is desired, I can look into it.

This should now address all concerns, minus the file-workaround.

lib/CodeGen/PPCGCodeGeneration.cpp
2080

The reason I kept this separated the way it is now, without specifying "CUDA" in the earlier version, is that SPIR should be the only example. When AMD support follows, or anything where there's a registered LLVM target with corresponding intrinsics (AMDGPU in that case), the original function could be used with just an additional switch case, changing the intrinsics IDs. Do you agree with that?

PhilippSchaad added inline comments.Jul 11 2017, 7:54 AM
lib/CodeGen/PPCGCodeGeneration.cpp
2080

And with 'example' I mean 'exception', my bad.

grosser requested changes to this revision.Jul 11 2017, 8:47 AM

Great. Yes, a test case would be great. It is not very complicated. You already updated test cases in your last patch. I would suggest you copy one of the simple double nested loop kernels from the PTX test and just check that the right SPIR intrinsics are generated! That should be enough.

Otherwise, I think this is good to go. Very nice work!

lib/CodeGen/PPCGCodeGeneration.cpp
2080

Sure. Still, the function names could be more similar. What about insertKernelCallsSPIR?

This revision now requires changes to proceed.Jul 11 2017, 8:47 AM
PhilippSchaad edited edge metadata.
PhilippSchaad marked 11 inline comments as done.
  • Added testcase for SPIR
  • Fixed naming inconsistencies
grosser accepted this revision.Jul 14 2017, 11:55 AM
grosser added a reviewer: singam-sanjay.

LGTM from my side.

This revision is now accepted and ready to land.Jul 14 2017, 11:55 AM

@singam-sanjay can you have a final look and commit it if you are OK with this patch.

singam-sanjay edited edge metadata.Jul 15 2017, 7:40 AM

Sure. Will do in sometime.

@grosser Thanks for adding me as a reviewer ! It's helped me acquaint myself with the SPIR V codegen and understand GPUJIT better.

@PhilippSchaad I've made suggestions to restructure code, a possibly unintentional debug print statement and added some questions about the SPIR. Do let me know if (and why) some of my suggestions are unnecessary.

General Question: I was following a discussion about integrating the SPIR-V backend as a part of LLVM, i.e. moving it from /tools to lib/Target. If that were to happen, would we be able to generate and feed SPIR IR in the same way as NVPTX ?

lib/CodeGen/PPCGCodeGeneration.cpp
1794

@PhilippSchaad I'm not yet acquainted with the SPIR annotations. Could you please explain why it matters to let the SPIR backend know whether a kernel argument is a read only ?

1796

Why are you setting these metadata to empty strings ? Is this a feature yet to be implemented ?

2097

nitpick: Do we need this ?

2120

nitpick: Do we need this ?

2208–2216

Would it be better to move the SPIR specific code into createKernelASM, as well ? You might not need the switch cases to handle buggy control flow like,

case GPUArch::SPIR64:
case GPUArch::SPIR32:
  llvm_unreachable("Cannot generate ASM for SPIR architecture");
tools/GPURuntime/GPUJIT.c
151

This is new to me. Why are you providing runtime support for Intel to JIT SPIR code ? Is it to run with the integrated GPU on an Intel CPU? Guessing from https://github.com/intel/beignet.

255

void *Handle = ( HandleOpenCLBeignet ? HandleOpenCLBeignet : HandleOpenCL ); maybe ?

Let me know if this isn't recommended.

257

Is it that we prefer to use libraries provided by Intel's SDK and inegrated GPU over other OpenCL drivers and devices ?

I'm assuming that this leads the runtime to use the integrated GPU even when a more powerful Radeon card is present. Please correct me if I'm wrong.

517

@grosser What is your rationale to avoid conditional compilation ?

531

Is this file opened inside <llvm_build>/bin or $PWD ? If it's $PWD, we should to take precautions to not open a file that the user needs. You could instead open a file in /tmp or %TEMP% in Windows, like "/tmp/R@ηdηдm3.ll". Anyways, you could implement this later.

537

Was this supposed to a message letting the user know that the runtime's not using the "beignet" OpenCL implementation ?

Will address the points mentioned asap. As for your SPIR-V question: The goal, should that back-end be added, would be to add SPIR-V compilation as an additional GPU target option (In the future). This would allow kernel execution on any SPIR-V supporting target device. This SPIR solution is essentially a 'quickfix' to get it to work on Intel, as long as SPIR-V is not an option.

lib/CodeGen/PPCGCodeGeneration.cpp
1794

The issue here is the Intel Beignet driver. Beignet expects a very specific set of kernel annotations when parsing the Kernel IR. The metadata is simply required to be there, but for the kernels we generate here, it does not matter what is actually contained in there. This is entirely Beignet dependent, maybe Intel will adapt that in the future.

1796

Dito above.

2097

We can change it to an if-clause if that is desired. The compiler just complains when we have a switch and do not check every possible value of Arch. An alternative would be a default clause handling that llvm_unreachable.

2120

Dito above.

2208–2216

Will do, good point. Originally moved it out because we thought we had to do a lot more hacking on the resulting IR.

tools/GPURuntime/GPUJIT.c
151

Your guess is exactly correct. This also corresponds to some of the points mentioned above.

255

Definitely the prettier solution. Will change.

257

This is actually a good point. We don't want to prefer the Intel provided SDK, the choice of selecting Intel's Beignet should be dependent on what -polly-gpu-arch value has been selected. AMD is in the pipeline.

531

I will add this later, but yes it is $PWD. The solution with using /tmp and %TEMP% seem reasonable.

537

Oops, thanks for pointing this out, this was unintentionally left in and should never be reached! Used it for debugging. Removing it.

singam-sanjay added inline comments.Jul 17 2017, 12:56 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1794

Ohk. Thanks for the info !

1796

Alright. I guess beignet isn't worried about the names of kernel arguments in "kernel_arg_name".

2097

I was talking about the break; after llvm_unreachable. If llvm_unreachable would call abort, the break would be redundant.

2120

Ditto above ;-).

2208–2216

Alright.

tools/GPURuntime/GPUJIT.c
151

Ohk.

255

void *Handle = ( HandleOpenCLBeignet!=NULL ? HandleOpenCLBeignet : HandleOpenCL ); would be better.

257

Alright.

Any ideas on how you'd differentiate between the integrated and dedicated GPU at command line ?

531

Alright.

537

👍

PhilippSchaad set the repository for this revision to rL LLVM.

Removed left over debug print, moved SPIR creation into createASM, fixed minor issues addressed in comments.

PhilippSchaad marked 24 inline comments as done.Jul 17 2017, 5:25 PM
PhilippSchaad added inline comments.
tools/GPURuntime/GPUJIT.c
257

The goal is to run on AMD GPUs with the AMDGPU backend, which would mean the differentiation would be done with -polly-gpu-arch=spir32/amdgcn32 for example. I am not yet sure how to transport that choice to here, but I will look into that as I am working on the AMD side of things. Thanks for pointing out this flaw.

Ping do the latest changes address your concerns @singam-sanjay ? Can I land this?

Rebase for commit
This revision was automatically updated to reflect the committed changes.