This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Handle taking address of block captures
ClosedPublic

Authored by Anastasia on Aug 7 2017, 9:28 AM.

Details

Summary

There is an issue with taking an address of captured variables, because captures can be put in different locations depending on the vendor implementation (and therefore they are passed as generic AS pointer to the block).

The physical location for the captures can be:
(a) in case the block is called as a lambda function the captures are put on the stack - in the private AS.
(b) in case the block is used in enqueue it depends on the vendor implementation to put the captured content in the memory accessible for the enqueued kernels. In general case global memory can be used which is accessible everywhere.

Example:

void foo(){
  int a;
  void (^bl)(void) = ^(void) {
    private int* b = &a; // a here is not the same object as a in foo
  }

Currently Clang hits unreachable due to bitcast of generic to private AS pointer because the original location of a is on the stack but the location of the captures is in generic AS.

This patch disallows taking address of captures because the physical address space can be different depending on the block use cases and vendor implementations.

An alternative approached could be (in case we want to allow this code) to assume captures are located in the generic AS implicitly. However, in this case programmers should be advised that erroneous AS casts can occur further that can't be diagnosed by the frontend (i.e. if capture address is used further in the operations of a different address space to where the captures physical location is).

Example:

void foo(){
  int a;
  void (^bl)(void) = ^(void) {
    local int* b = (local int*)&a; //valid cast of implicitly generic to local but &a will be located in private or global AS for the 2 uses below
  }
  b();
  enqueue_kernel(..., b)
}

@sam, @Alexey, does it make sense taking this route?

PS, this patch also contains a minor clean up - removal of duplicate diagnostic for taking an address of a function.

Diff Detail

Event Timeline

Anastasia created this revision.Aug 7 2017, 9:28 AM
Anastasia edited the summary of this revision. (Show Details)
yaxunl edited edge metadata.Aug 16 2017, 7:50 AM

Sorry for the delay.

I feel forbidding user taking address of captured variable is too restrictive and make blocks less useful.

A block is quite like a normal function with a generic pointer argument. Users take the responsibility to make sure the cast of a generic pointer to private or global address space is valid at runtime.

I am more inclined to the alternative approach, i.e., assuming captures are located in the generic AS implicitly.

Ok, I will update it to be implicitly generic then. Just to be sure, @bader do you agree on this too?

bader edited edge metadata.Aug 29 2017, 5:43 AM

Ok, I will update it to be implicitly generic then. Just to be sure, @bader do you agree on this too?

An alternative approached could be (in case we want to allow this code) to assume captures are located in the generic AS implicitly. However, in this case programmers should be advised that erroneous AS casts can occur further that can't be diagnosed by the frontend (i.e. if capture address is used further in the operations of a different address space to where the captures physical location is).

I don't have a strong opinion on this topic, but I'm worried about relying on implicit assumptions, which might make code less portable.

How the alternative approach is this suppose to work for enqueue_kernel? Do we assume that the captured variables passed by generic pointer and compiler will assign the correct address space based on the explicit casts in the block?
There are some restrictions on kernel parameters: 6.9.a. Arguments to kernel functions declared in a program that are pointers must be declared with the global, constant or local qualifier.

I feel forbidding user taking address of captured variable is too restrictive and make blocks less useful.

I have no information how widely blocks are used by the OpenCL community and how important this feature is.

Ok, I will update it to be implicitly generic then. Just to be sure, @bader do you agree on this too?

An alternative approached could be (in case we want to allow this code) to assume captures are located in the generic AS implicitly. However, in this case programmers should be advised that erroneous AS casts can occur further that can't be diagnosed by the frontend (i.e. if capture address is used further in the operations of a different address space to where the captures physical location is).

I don't have a strong opinion on this topic, but I'm worried about relying on implicit assumptions, which might make code less portable.

How the alternative approach is this suppose to work for enqueue_kernel? Do we assume that the captured variables passed by generic pointer and compiler will assign the correct address space based on the explicit casts in the block?
There are some restrictions on kernel parameters: 6.9.a. Arguments to kernel functions declared in a program that are pointers must be declared with the global, constant or local qualifier.

The captured variable is still passed by value. The address taking is on the duplicate of the captured variable, not on the original variable. My understanding is that OpenCL does not require the real address space of a generic pointer to be deducible at compile/link time, therefore a compliant implementation should be able to dereference a generic pointer at runtime.

I feel forbidding user taking address of captured variable is too restrictive and make blocks less useful.

I have no information how widely blocks are used by the OpenCL community and how important this feature is.

The OpenCL spec does not forbid taking address of captured variable. Forbidding taking address of captured variable would violate the spec.

The captured variable is still passed by value. The address taking is on the duplicate of the captured variable, not on the original variable.

In this case address of captured variables should point to the private address space, as function parameters reside in private address space (see "6.5 Address Space Qualifiers" paragraph 3).
This makes unclear to me how capturing works for variables declared in the address spaces other than private (e.g. local int a;).

The OpenCL spec does not forbid taking address of captured variable. Forbidding taking address of captured variable would violate the spec.

I'm not sure yet, but it might be OpenCL spec inconsistency issue.

yaxunl added a comment.Sep 5 2017, 2:26 PM

The captured variable is still passed by value. The address taking is on the duplicate of the captured variable, not on the original variable.

In this case address of captured variables should point to the private address space, as function parameters reside in private address space (see "6.5 Address Space Qualifiers" paragraph 3).
This makes unclear to me how capturing works for variables declared in the address spaces other than private (e.g. local int a;).

Sorry I was not accurate when saying the captured variables are passed by value.

The captured variable is emitted as a member of the block context structure. There are two situations:

  1. the block is called as a normal block.

In this case , the block context is in stack and passed indirectly to the block invoke function, so taking address of captured variable results in a private pointer.

  1. the block is called through enqueue_kernel as a kernel.

In this case the address space of block context depends on how the target pass the block context to the kernel.

Some target may pass the block context struct directly, in this case taking address of captured variable results in a private pointer.

Some target may pass the block context struct by a pointer to global memory, in this case taking address of captured variable results in a global pointer.

Unless the target is able to handle indirect byval kernel argument, the block invoke function will ends up with different ISA's when called as a normal
block and called as a device-enqueued kernel.

One possible solution may be:

  1. when emitting code for the definition of the block, assuming it is a normal block and let captured variables have private address space.
  1. when emitting code for enqueue_kernel,

    emit a target-specific version of the block invoke function again which uses proper kernel ABI and with kernel metadata. Let the captured variables have target-specific address space.

However, since this may take some time and efforts to implement, I think the current patch is acceptable as a temporary measure.

The captured variable is still passed by value. The address taking is on the duplicate of the captured variable, not on the original variable.

In this case address of captured variables should point to the private address space, as function parameters reside in private address space (see "6.5 Address Space Qualifiers" paragraph 3).

This will only apply to blocks used as lambda functions though. But if they are enqueued using enqueue_kernel the captures should be accessible from the areas outside current thread too.

yaxunl accepted this revision.Sep 7 2017, 7:39 AM

LGTM as a temporary measure until we have a solution for properly emitting blocks as enqueued kernel.

This revision is now accepted and ready to land.Sep 7 2017, 7:39 AM

LGTM as a temporary measure until we have a solution for properly emitting blocks as enqueued kernel.

Should I start discussion with Khronos on that? What would our preference be - implicitly generic AS for capture addresses?

yaxunl added a comment.Sep 7 2017, 9:15 AM

LGTM as a temporary measure until we have a solution for properly emitting blocks as enqueued kernel.

Should I start discussion with Khronos on that? What would our preference be - implicitly generic AS for capture addresses?

I had a discussion with Brian and we realized that captured variable in global address space is kind of unusual since that means each work item needs to have a different global pointer as the block context argument to the kernel, whereas usually when you set global pointer kernel argument for a kernel, different work items get the same global pointer. However, we cannot totally rule out an implementation of enqueue_kernel like that.

That said, I kind of think the address space of captured variable is implementation dependent, though normally it seems private address space makes more sense.

I do not object to open a discussion at khronos to clarify this.

This revision was automatically updated to reflect the committed changes.

LGTM as a temporary measure until we have a solution for properly emitting blocks as enqueued kernel.

Should I start discussion with Khronos on that? What would our preference be - implicitly generic AS for capture addresses?

I had a discussion with Brian and we realized that captured variable in global address space is kind of unusual since that means each work item needs to have a different global pointer as the block context argument to the kernel, whereas usually when you set global pointer kernel argument for a kernel, different work items get the same global pointer. However, we cannot totally rule out an implementation of enqueue_kernel like that.

That said, I kind of think the address space of captured variable is implementation dependent, though normally it seems private address space makes more sense.

Would it not be a problem generally though? Because private AS memory won't be accessed elsewhere and therefore if the block is enqueued to run on a different core or any place with different memory segment it would cause an issue...

I do not object to open a discussion at khronos to clarify this.

Bug #16398 in case you would like to track it.

LGTM as a temporary measure until we have a solution for properly emitting blocks as enqueued kernel.

Should I start discussion with Khronos on that? What would our preference be - implicitly generic AS for capture addresses?

I had a discussion with Brian and we realized that captured variable in global address space is kind of unusual since that means each work item needs to have a different global pointer as the block context argument to the kernel, whereas usually when you set global pointer kernel argument for a kernel, different work items get the same global pointer. However, we cannot totally rule out an implementation of enqueue_kernel like that.

That said, I kind of think the address space of captured variable is implementation dependent, though normally it seems private address space makes more sense.

Would it not be a problem generally though? Because private AS memory won't be accessed elsewhere and therefore if the block is enqueued to run on a different core or any place with different memory segment it would cause an issue...

The block context can be passed to the kernel directly. In this case it ends up in the stack of the callee and takes private address space.

I do not object to open a discussion at khronos to clarify this.

Bug #16398 in case you would like to track it.