This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Emit enqueued block as kernel
ClosedPublic

Authored by yaxunl on Sep 21 2017, 7:48 AM.

Details

Summary

In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels have special function
attributes and metadata for runtime to launch them.

The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such,
the block invoke function should be emitted as kernel with proper calling convention and
argument ABI.

This patch emits enqueued block as kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl created this revision.Sep 21 2017, 7:48 AM
Anastasia edited edge metadata.Sep 21 2017, 10:48 AM

Now if we have a block which is being called and enqueued at the same time, will we generate 2 functions for it? Could we add such test case btw?

I feel it would be much simpler if we could always generate the kernel metadata for blocks. A lot of special case code would be removed if we do this. OpenCL doesn't prevent kernel functions to be used just as normal functions (6.7.1) so it should be a perfectly valid thing to do. Do you seen any issues with that?

lib/CodeGen/CGBlocks.cpp
1255 ↗(On Diff #116186)

Is there any test that covers this?

lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗(On Diff #116186)

I am not particularly in favour of duplicating CodeGen functionality as it typically has so many special cases that are hard to catch. Is this logic needed in order to pass to block literal information that the block is enqueued?

lib/CodeGen/CodeGenFunction.cpp
535 ↗(On Diff #116186)

I don't quite understand why we need to special case this? As far as I undertsnad block argument is a generic void* type but it's being cast to a concrete block struct inside the block function. Do we gain anything from having it a specific type here?

yaxunl marked 3 inline comments as done.Sep 21 2017, 5:47 PM

Now if we have a block which is being called and enqueued at the same time, will we generate 2 functions for it? Could we add such test case btw?

Yes. It is covered by test/CodeGenOpenCL/cl20-device-side-enqueue.cl, line 246, 250, and 256.

I feel it would be much simpler if we could always generate the kernel metadata for blocks. A lot of special case code would be removed if we do this. OpenCL doesn't prevent kernel functions to be used just as normal functions (6.7.1) so it should be a perfectly valid thing to do. Do you seen any issues with that?

The special cases in metadata generation code is due to the first argument of LLVM block invoke function is not defined in BlockDecl. Emitting metadata for all block invoke functions does not help here.

lib/CodeGen/CGBlocks.cpp
1255 ↗(On Diff #116186)

Yes. This is covered by test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl where the block struct is passed directly to the kernel instead of by a pointer.

lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗(On Diff #116186)

This code is needed to emit separate functions for a block which is directly called and also enqueued as a kernel. Since the kernel needs to have proper calling convention and ABI, it cannot be emitted as the same function as when the block is called directly. Since it is OpenCL specific code, I found it is cleaner to separate this code as member of CGOpenCLRuntime instead of fitting it into CGF.EmitBlockLiteral.

lib/CodeGen/CodeGenFunction.cpp
535 ↗(On Diff #116186)

This argument is not part of BlockDecl. BlockDecl only has arguments shown up in the source code. The first argument in the LLVM block invoke function is generated by codegen and there is no correspondence in AST, so it has to be handled as a special case.

I feel it would be much simpler if we could always generate the kernel metadata for blocks. A lot of special case code would be removed if we do this. OpenCL doesn't prevent kernel functions to be used just as normal functions (6.7.1) so it should be a perfectly valid thing to do. Do you seen any issues with that?

The special cases in metadata generation code is due to the first argument of LLVM block invoke function is not defined in BlockDecl. Emitting metadata for all block invoke functions does not help here.

To be more specific. I am just wondering what do we need for blocks to be used as kernels pragmatically. I feel it is essentially kernel calling convention and kernel metadata? The kernel arguments metadata however can be omitted because their type is fixed to be local void* and the number of arguments is passed into enqueue_kernel call so it is known at the enqueueing side too. The block descriptor can be passed as a generic pointer generic void* as it is cast to the right struct type inside the block invoke function anyway. So if we do this we can avoid adding a lot of extra code. Because blocks have reduced functionality compared to kernel functions. Also OpenCL allows kernel functions to be called just as normal functions so this way we can support second use case for blocks too. What do you think about it?

lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗(On Diff #116186)

This part is replacing standard EmitScalarExpr call which is doing several things before calling into block generation. That's why I am a bit worried we are covering all the corner cases here.

So if we transform all blocks into kernels unconditionally we won't need this special handling then?

Do we generate two versions of the blocks now: one for enqueue and one for call?

lib/CodeGen/CodeGenFunction.cpp
535 ↗(On Diff #116186)

Considering that enqueued kernel always takes the same type of the arguments (local void*) and # of args is specified in enqueue_kernel, I was wondering whether we need to generate the information on the kernel parameters at all? The enqueueing side will have the information provided in the enqueue_kernel code.

As for the block itself it can be passed as generic void* and then cast to the block struct type inside the block itself.

667 ↗(On Diff #116186)

Why this change?

test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗(On Diff #116186)

This struct is not identical to block literal struct?

yaxunl marked 10 inline comments as done.Sep 22 2017, 10:58 AM
yaxunl added inline comments.
lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗(On Diff #116186)

If we transform all blocks into kernels, we could simplify the logic. Probably will not need this special handling.

However, when the block is called directly, the first argument is a private pointer, when it is executed as a kernel, the first argument is a global pointer or a struct (for amdgpu target), therefore the emitted functions cannot be the same.

lib/CodeGen/CodeGenFunction.cpp
535 ↗(On Diff #116186)

amdgpu backend relies on kernel argument metadata to generate some metadata in elf for runtime to launch the kernel. The backend expects the kernel argument metadata on each kernel argument. Not generating kernel metadata on the first kernel argument requires special handling in the backend. I think it is better to let Clang generate kernel argument metadata for all kernel arguments.

667 ↗(On Diff #116186)

CGM is no longer a function parameter since now this function requires a CGF parameter.

test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗(On Diff #116186)

The LLVM type of the first argument of block invoke function is created directly with sorting and rearrangement. There is no AST type corresponding to it. However, the function codegen requires AST type of this argument. I feel it is unnecessary to create the corresponding AST type. For simplicity, just create an AST type with the same size and alignment as the LLVM type. In the function code, it will be bitcasted to the correct LLVM struct type and get the captured variables.

I think we should add a test case when the same block is both called and enqueued.

lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗(On Diff #116186)

Would using generic address space for this first argument not work?

test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗(On Diff #116186)

So void ptr won't be possible here? Since it is cast to a right struct inside the block anyway. Once again a block is a special type object with known semantic to compiler and runtime in contract to kernels that can be written with any arbitrary type of arguments.

I just don't like the idea of duplicating the block invoke function in case it's being both called and enqueued. Also the login in blocks code generation becomes difficult to understand. So I am wondering if we could perhaps create a separate kernel function (as a wrapper) for enqueue_kernel which would call a block instead. What do you think about it? I think the kernel prototype would be fairly generic as it would just have a block call inside and pass the arguments into it... We won't need to modify block generation then at all.

yaxunl marked 10 inline comments as done.Oct 4 2017, 1:47 PM

I think we should add a test case when the same block is both called and enqueued.

Will do.

test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗(On Diff #116186)

Will emit a wrapper kernel which calls the block invoke function and keep the block invoke function unchanged.

yaxunl updated this revision to Diff 117739.Oct 4 2017, 2:18 PM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision. (Show Details)

Emit enqueued block as a wrapper kernel which calls the block invoke function. Added test for calling and enqueue the same block.

Anastasia added inline comments.Oct 6 2017, 4:39 AM
lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗(On Diff #117739)

Why do we need to replace original block calls with the kernels? I think in case of calling a block we could use the original block function and only for enqueue use the kernel that would call the block function inside. The pointer to the kernel wrapper could be passed as an additional parameter to enqueue_kernel calls. We won't need to iterate through all IR then.

lib/CodeGen/TargetInfo.cpp
8927 ↗(On Diff #117739)

Could you add some comments please?

8949 ↗(On Diff #117739)

Wondering if we should add the kernel metadata (w/o args) since it was used for long time to indicate the kernel.

lib/CodeGen/TargetInfo.h
35 ↗(On Diff #117739)

Do we need this?

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
9 ↗(On Diff #117739)

Can we check generated kernel function too?

yaxunl marked 6 inline comments as done.Oct 6 2017, 8:43 AM
yaxunl added inline comments.
lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗(On Diff #117739)

CGF.EmitScalarExpr(Block) returns the block literal structure which contains the size/align/invoke_function/captures. The block invoke function is stored to the struct by a StoreInst. To create the wrapper kernel, we need to get the block invoke function, therefore we have to iterate through IR.

Since we need to find the store instruction any way, it is simpler to just replace the stored function with the kernel and pass the block literal struct, instead of passing the kernel separately.

lib/CodeGen/TargetInfo.cpp
8927 ↗(On Diff #117739)

Will do.

8949 ↗(On Diff #117739)

Currently (before this change), clang already does not generate kernel metadata if there is no vec_type_hint, work_group_size_hint, reqd_work_group_size. Remember last time we made the change to use function metadata to represent these attributes. Whether a function is a kernel can be determined by its calling convention.

lib/CodeGen/TargetInfo.h
35 ↗(On Diff #117739)

Will remove it.

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
9 ↗(On Diff #117739)

will do.

yaxunl updated this revision to Diff 118064.Oct 6 2017, 1:30 PM
yaxunl marked 5 inline comments as done.

Revise by Anastasia's comments.

Anastasia added inline comments.Oct 10 2017, 9:42 AM
lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗(On Diff #117739)

So we cann't get the invoke function from the block literal structure passed into the kernel wrapper directly knowing its offset? Iterating through IR adds extra time and also I am not sure how reliable this is wrt different corner cases of IR.

lib/CodeGen/TargetInfo.cpp
8949 ↗(On Diff #117739)

Ok, let's leave it for now. We can always add it in on request.

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
297 ↗(On Diff #118064)

Perhaps we could check the body of this one too since it has a different prototype.

yaxunl marked 4 inline comments as done.Oct 10 2017, 11:27 AM
yaxunl added inline comments.
lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗(On Diff #117739)

Unfortunately the invoke function is not returned directly. Instead, it is buried in an LLVM value. And to extract the invoke function from the LLVM value we have to wade through a bunch of LLVM IRs.

There is one way to get the invoke function directly instead of going through IRs, but we need to change the functions for generating code for the blocks a little bit so that they return the block invoke function.

yaxunl updated this revision to Diff 118677.Oct 11 2017, 1:15 PM
yaxunl marked 2 inline comments as done.

Revised by Anastasia's comments. Get block invoke function by API instead of iterate through IR's. Pass the block kernel directly to __enqueu_kernel functions.

I think it would be good to add a block test to CodeGenOpenCL where we would just call the block without any enqueue and check that the invoke function is generated but the kernel wrapper isn't.

lib/CodeGen/CGBuiltin.cpp
2846 ↗(On Diff #118677)

Formatting seems inconsistent from above.

lib/CodeGen/CodeGenFunction.h
2921 ↗(On Diff #118677)

It will be nullptr in case block is not enqueued? May be it's worth explaining it in the comment.

lib/CodeGen/TargetInfo.h
290 ↗(On Diff #118677)

Can we also explain the wrapper kernel here?

yaxunl updated this revision to Diff 118795.Oct 12 2017, 8:49 AM
yaxunl marked 5 inline comments as done.

Revised by Anastasia's comments.

I think it would be good to add a block test to CodeGenOpenCL where we would just call the block without any enqueue and check that the invoke function is generated but the kernel wrapper isn't.

we have test/CodeGenOpenCL/blocks.cl which only calls a block. I can add check to make sure no kernels generated.

lib/CodeGen/CGBuiltin.cpp
2846 ↗(On Diff #118677)

Will fix.

lib/CodeGen/CodeGenFunction.h
2921 ↗(On Diff #118677)

Will do.

lib/CodeGen/TargetInfo.h
290 ↗(On Diff #118677)

Will do.

Anastasia accepted this revision.Oct 13 2017, 8:54 AM

LGTM! Great work! Thanks!

This revision is now accepted and ready to land.Oct 13 2017, 8:54 AM
This revision was automatically updated to reflect the committed changes.