This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Fix __enqueue_block for block with captures
ClosedPublic

Authored by yaxunl on Feb 13 2018, 8:05 AM.

Details

Summary

The following test case causes issue with codegen of __enqueue_block

void (^block)(void) = ^{ callee(id, out); };

enqueue_kernel(queue, 0, ndrange, block);

Clang first does codegen for block expression in the first line and deletes its block info.
Clang then tries to do codegen for the same block expression again for the second line,
and fails because the block info is gone.

The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to
record llvm block invoke function and llvm block literal emitted for each AST block
expression, and use the recorded information for generating the wrapper kernel.

The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
of blocks.

Another minor issue is that some clean up AST expression is generated for block
with captures, which can be stripped by IgnoreImplicit.

Diff Detail

Event Timeline

yaxunl created this revision.Feb 13 2018, 8:05 AM
Anastasia accepted this revision.Feb 15 2018, 3:05 AM

LGTM! Thanks for looking at this. Just one thing, I was wondering whether it would be cleaner way if we extend test/CodeGenOpenCL/cl20-device-side-enqueue.cl instead of adding a new one here? Because this is the test that is meant to exercise all DSE codegen bits. Perhaps we can modify one block in that test to have the same format as here (i.e. using captures), since now we test the same block there most of the time. However, we don't test any of kernel wrapper *_block_invoke_kernel there. Not sure why...

This revision is now accepted and ready to land.Feb 15 2018, 3:05 AM

LGTM! Thanks for looking at this. Just one thing, I was wondering whether it would be cleaner way if we extend test/CodeGenOpenCL/cl20-device-side-enqueue.cl instead of adding a new one here? Because this is the test that is meant to exercise all DSE codegen bits. Perhaps we can modify one block in that test to have the same format as here (i.e. using captures), since now we test the same block there most of the time. However, we don't test any of kernel wrapper *_block_invoke_kernel there. Not sure why...

We do check block wrappers in cl20-device-side-enqueue.cl, which is done at the end of the test.

I did not add this test to cl20-device-side-enqueue.cl because cl20-device-side-enqueue.cl is already very complicated. I can add this test to cl20-device-side-enqueue.cl when committing.

This revision was automatically updated to reflect the committed changes.