PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameters list is twice as long. This has been accounted for in the corresponding test cases).
Details
- Reviewers
grosser Meinersbur bollu - Commits
- rGa90be207c60c: [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen
rPLO302515: [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen
rL302515: [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1198 ↗ | (On Diff #98132) | I believe int NumArgs = F->getArgumentList().size() will also work? seems a little clearer to me. |
1198 ↗ | (On Diff #98132) | NumArgs could be const? It doesn't seem to be mutated anywhere. |
1199 ↗ | (On Diff #98132) | why not std::vector<int>(NumArgs)? That removes the explicit memory allocation/deallocation. I wish std::dynarray existed, since that's exactly the use case for this. |
1276 ↗ | (On Diff #98132) | The computation of SizeInBytes seems to be a pure computation which is repeated thrice in this function (GPUNodeBuilder::createLaunchParameters). Could this be refactored into a standalone function? That would also make the assignment look a little cleaner: ArgSizes[Index] = computeSizeInBytes(Val->getType()); |
1347 ↗ | (On Diff #98132) | Could the pattern of to-split.cpp Value *Slot = Builder.CreateGEP( Parameters, {Builder.getInt64(0), Builder.getInt64(Index)}); Value *ParamTyped = Builder.CreatePointerCast(Param, Builder.getInt8PtrTy()); Builder.CreateStore(ParamTyped, Slot); be refactored into a separate function if it is not too much trouble? It occurs thrice in this function (GPUNodeBuilder::createLaunchParameters) |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1198 ↗ | (On Diff #98132) | Hm, it looks like it has been removed. [I was looking at an older reference to Function::getArgumentList](http://llvm.org/docs/doxygen/html/classllvm_1_1Function.html#a0a46edcf9b885556850d8ed9c49d9b52). Apologies! I believe that [llvm::Function::arg_size()](http://llvm.org/doxygen/classllvm_1_1Function.html#abccf59dbcc12707d079124e6bcfb4a47) should work for this purpose. |
This can be done in another patch, but I think the Index++ calls inside for loops can be hoisted as for(..; ..; i++, Index++). I personally find this clearer, but I'm not sure what the others think.
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
146 ↗ | (On Diff #98186) | getPrimitiveSizeInBits and getScalarSizeInBits accept const Type *T. Could you please change the Type *T to const Type *T? |
1199 ↗ | (On Diff #98132) | Have you considered std::vector<int> ArgSizes(NumArgs);? I find this more idiomatic. |
I personally find it better like this, because the i is per-loop, so it is self contained in every iteration, the Index however is across loops. To me, this way it is more clear, that the Index is not a loop-specific iteration variable.
Let's see what the others think though. Either way, I would suggest moving that to a different patch indeed, should we decide on changing that.
(Will address your commented points in just a second, agree with both of them.)
I personally like the Index++ calls in the loop body, just to add another vote to the bikeshedding. I agree that such a change is unrelated to this patch and if anybody feels strong, we should have a dedicated discussion about this topic outside of this thread. In any case, this should not affect this patch.