This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Add support of __opencl_c_device_enqueue feature macro.
AbandonedPublic

Authored by azabaznov on Oct 21 2021, 9:00 AM.

Details

Summary

This feature requires support of __opencl_c_generic_address_space,
so diagnostics for that is provided as well.

The main problem with device enqueue feature is that block literal for block
with no captures emitted in global address space. This is not correct if feature
for program scope global variables are not supported. This patch:

  • Disables generation of blocks in constant address space (temporally) if feature for program scope global variables is not supported, since such blocks always have no captures. Global blocks are not allowed in any other address space then constant if feature for program scope global variables is not supported.
  • For local blocks without captures generate block literal in local scope with use of generic address space. This is achieved with checks during code generation and not treating block literals with no captures as constant expressions.

Diff Detail

Event Timeline

azabaznov created this revision.Oct 21 2021, 9:00 AM
azabaznov requested review of this revision.Oct 21 2021, 9:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2021, 9:00 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

@Anastasia, @yaxunl, do you think it's possible to refactor code generation for blocks such that block literal for global blocks (with no captures) would be emitted in constant address space? Now it's emitted in global address space (for example @__block_literal_global in https://godbolt.org/z/4z8hGj7hz).

azabaznov edited the summary of this revision. (Show Details)Oct 21 2021, 9:06 AM
azabaznov edited the summary of this revision. (Show Details)Oct 21 2021, 9:09 AM
azabaznov edited the summary of this revision. (Show Details)Oct 21 2021, 11:20 AM

@Anastasia, @yaxunl, do you think it's possible to refactor code generation for blocks such that block literal for global blocks (with no captures) would be emitted in constant address space? Now it's emitted in global address space (for example @__block_literal_global in https://godbolt.org/z/4z8hGj7hz).

I think that the main intent of using generic was to simplify the code generation to the block invoke function, as it could always just use generic address space regardless the block kind, see @my_block_A_block_invoke in https://godbolt.org/z/956E1KrPP.

However it is probably possible to refactor clang to emit exact address space depending on the type of the block literal. However, it would break ABI for OpenCL 2.0 which is undesirable and also it would add extra complexity in already undocumented and hard to understand code. Plus there are always uncertainties around the exact implementation of blocks. So we would need a pretty decent prototype before we confirm this is actually even doable. Unless we find a strong need for this I would prefer to avoid this refactoring.

How about we clarify with Khronos whether it would be sufficient to add a restriction like:

Program scope blocks are only supported when program scope variables feature is supported.

And then see if this is acceptable route forward?

clang/lib/AST/ExprConstant.cpp
9554

What test case covers this change? It feels like something we should try to diagnose earlier...

clang/lib/CodeGen/CGOpenCLRuntime.cpp
191

This function feels like something that belongs better to OpenCLOptions rather than CodeGen?

clang/lib/Sema/SemaDecl.cpp
8024

Good spot, I didn't feel the intent was to allow qualifying the block by an address space... but I don't think this was ever clarified. I feel that the assumption was that blocks would always be in global address space...

How about we clarify with Khronos whether it would be sufficient to add a restriction like:

Program scope blocks are only supported when program scope variables feature is supported.

That's sounds good to me. Especially because this states nothing about if block has captures/no captures. Originally, I was thinking about something like: blocks in constant address space require program scope variables feature support, but that's too much relies on concrete ABI.

clang/lib/AST/ExprConstant.cpp
9554

The test case which exactly was added. Since blocks in constant address space are disallowed at this point, we can treat all other blocks with no captures not as constant expressions - it will make CodeGen generate block literal in local scope for blocks with no captures. See buildGlobalBlock and CodeGenModule::GetAddrOfGlobalBlock for details.

clang/lib/CodeGen/CGOpenCLRuntime.cpp
191

IIRC we can't query Sema from a CodeGen... The alternative would be to introduce a new language option, which is not desirable.

clang/lib/Sema/SemaDecl.cpp
8024

Well, unfortunately there are already few cases in CTS coverage which rely on that... Note that this is an address space of a block, but not a block literal.

azabaznov added inline comments.Nov 11 2021, 8:08 AM
clang/lib/AST/ExprConstant.cpp
9554

I mean - blocks_no_global_literal.cl - that's exactly the case which covers it.

Anastasia added inline comments.Nov 24 2021, 7:18 AM
clang/lib/AST/ExprConstant.cpp
9554

Ok I see, it doesn't seem to have any constant but my guess is that it becomes constant implicitly?

Does the case with program scope variable supported need testing too or is it covered elsewhere?

clang/lib/Sema/Sema.cpp
328

I wonder if we should simplify this condition by only checking getLangOpts().Blocks but then we make sure to set it correctly based on the language version and the target just like we do now for generic address space or pipes?

azabaznov abandoned this revision.Dec 13 2021, 7:50 AM

Now program scope global variables are required by device enqueue in the spec, new patch for device enqueue support: https://reviews.llvm.org/D115640