Page MenuHomePhabricator

[SYCL] Make default address space a superset of OpenCL address spaces.
Needs RevisionPublic

Authored by bader on Jun 1 2020, 8:28 AM.

Details

Summary

This change enables conversion between OpenCL address spaces and default
address space similar to conversions between OpenCL generic and OpenCL
global/local/private address spaces.

Diff Detail

Event Timeline

bader created this revision.Jun 1 2020, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2020, 8:28 AM

Why?

Default address space is normally an address space of automatic storage i.e. private in OpenCL. If you look at the address space map of targets most of them map default and private address spaces to the same value. Converting between private and other named address space is not allowed in OpenCL see section 6.5. I don't believe this change is doing anything legitimate.

Can you explain what you are trying to achieve with this?

clang/test/SemaOpenCLCXX/address-space-lambda.cl
34

Here we are testing expected behavior. I believe we already had a discussion about this.

bader marked an inline comment as done.Jun 3 2020, 7:24 AM

Why? Can you explain what you are trying to achieve with this?

I think @asavonic can provide more detailed answer, but IIRC we spent a lot time trying to marry template meta-programming with OpenCL address space deductions, but even simplest template functions from C++ standard library breaks compilation. It's important to note one important difference in design goals for SYCL and C++ for OpenCL. SYCL aims to enable compilation of regular C++ as much as possible where as one of the C++ for OpenCL goals is to keep compatibility with OpenCL C. These two goals might lead to different design decisions. As we still wanted to re-use OpenCL address spaces for mapping explicitly annotated pointers to SPIR-V address spaces we decided to adopt alternative approach (inspired by OpenMP and CUDA modes) with using C++ Sema + additional semantic for non-standard attributes (like address spaces - e.g. https://reviews.llvm.org/D80317) and types. Address space deduction required by SPIR-V specification is implemented CodeGen module and it takes < 200 lines in a few files.

Default address space is normally an address space of automatic storage i.e. private in OpenCL. If you look at the address space map of targets most of them map default and private address spaces to the same value. Converting between private and other named address space is not allowed in OpenCL see section 6.5. I don't believe this change is doing anything legitimate.

AFAIK, OpenCL compiler deduce address space at Sema for all types of storage and even automatic (https://godbolt.org/z/-5QoBd). This change is legitimate for OpenCL because OpenCL doesn't use default address space and therefore it should be a non-functional change for OpenCL mode. I had to move on check from the OpenCL C++ test because this changes exposes a bug in the compiler (https://bugs.llvm.org/show_bug.cgi?id=45472), but other than that all existing OpenCL tests pass.

In SYCL mode we use default address space and apply conversion rules defined for generic address space, which is casting from generic to non-generic address space is allowed if memory is allocated in the same address space, otherwise behavior is undefined. In most cases, users don't need to worry about annotating pointers manually. They access memory through accessors which are annotated using template parameters, so template meta-programming will prevent invalid conversion. There is also an options to get a raw pointer and/or annotate a pointer using multi_ptr class, but in this case it's user's responsibility to make sure that conversion is valid. This might be required only in cases when compiler is not able to automatically optimize memory accesses to generic address space.

I was going to upstream CodeGen part with mapping "default address space" to LLVM address space in a separate patch. In the nutshell we would like to map the default address space to LLVM address space 4 for SPIR target (the same change is done for AMDGPU). NOTE: variables with automatic storage duration in default address space (i.e. w/o explicit annotation) do not use address space map, but rather data layout string to set the LLVM address space for alloca instructions - the default value is 0, which is treated as private by SPIR-V translator.

I can work on updating this patch with CodeGen changes if you think both patches should be committed together.

clang/test/SemaOpenCLCXX/address-space-lambda.cl
34

Yes, we discussed this here: https://github.com/intel/llvm/pull/1039.
I agree that behavior is expected, but I think that this checks works in spite of a bug @Fznamznon described here https://github.com/intel/llvm/pull/1039#discussion_r402905578 and in the bug report https://bugs.llvm.org/show_bug.cgi?id=45472.
This patch exposes the bug and I moved this check to a separate file to keep all checks active.

Anastasia requested changes to this revision.Jun 3 2020, 12:47 PM

Why? Can you explain what you are trying to achieve with this?

I think @asavonic can provide more detailed answer, but IIRC we spent a lot time trying to marry template meta-programming with OpenCL address space deductions, but even simplest template functions from C++ standard library breaks compilation. It's important to note one important difference in design goals for SYCL and C++ for OpenCL. SYCL aims to enable compilation of regular C++ as much as possible where as one of the C++ for OpenCL goals is to keep compatibility with OpenCL C. These two goals might lead to different design decisions.

I don't see a contradiction in those goals. We plan to support C++ libraries with C++ for OpenCL of course with functionality that is accepted by conformant OpenCL implementations. For example, libraries that use virtual functions leading to function pointers are not going to work portably in OpenCL devices. If SYCL aims to compile to conformant OpenCL devices then it should not be very different I imagine?

C++ for OpenCL and address spaces in C++ in general is still WIP and it is not impossible that there are parts that don't work as expected yet. Have you made an analysis of issues that you found with the templates and whether it is something that can't work conceptually or is just missing from implementation? Do you have a list of those issues somewhere?

As we still wanted to re-use OpenCL address spaces for mapping explicitly annotated pointers to SPIR-V address spaces we decided to adopt alternative approach (inspired by OpenMP and CUDA modes) with using C++ Sema + additional semantic for non-standard attributes (like address spaces - e.g. https://reviews.llvm.org/D80317) and types. Address space deduction required by SPIR-V specification is implemented CodeGen module and it takes < 200 lines in a few files.

Address spaces in OpenCL follow certain concept and if you believe it doesn't suit your goals then the right thing to do is to start from describing and discussing your design.

Default address space is normally an address space of automatic storage i.e. private in OpenCL. If you look at the address space map of targets most of them map default and private address spaces to the same value. Converting between private and other named address space is not allowed in OpenCL see section 6.5. I don't believe this change is doing anything legitimate.

AFAIK, OpenCL compiler deduce address space at Sema for all types of storage and even automatic (https://godbolt.org/z/-5QoBd).

This is not the only example for an automatic storage.

This change is legitimate for OpenCL because OpenCL doesn't use default address space and therefore it should be a non-functional change for OpenCL mode. I had to move on check from the OpenCL C++ test because this changes exposes a bug in the compiler (https://bugs.llvm.org/show_bug.cgi?id=45472), but other than that all existing OpenCL tests pass.

Default address space is a Clang concept that is generally used for an automatic storage. This is how embedded C defines it that has been used by Clang. OpenCL implementation has always used default for private and the decision to add private explicitly was made only about 2 years ago. OpenCL compiler infers address spaces for most of cases listed in the specification, however, there are cases that are not covered by the spec. There has not been any plan to forbid or remove the use of default addr space from the OpenCL implementation. And I don't think considering how long it was used historically it would be easy to do so.

In SYCL mode we use default address space and apply conversion rules defined for generic address space, which is casting from generic to non-generic address space is allowed if memory is allocated in the same address space, otherwise behavior is undefined. In most cases, users don't need to worry about annotating pointers manually. They access memory through accessors which are annotated using template parameters, so template meta-programming will prevent invalid conversion. There is also an options to get a raw pointer and/or annotate a pointer using multi_ptr class, but in this case it's user's responsibility to make sure that conversion is valid. This might be required only in cases when compiler is not able to automatically optimize memory accesses to generic address space.

I was going to upstream CodeGen part with mapping "default address space" to LLVM address space in a separate patch. In the nutshell we would like to map the default address space to LLVM address space 4 for SPIR target (the same change is done for AMDGPU). NOTE: variables with automatic storage duration in default address space (i.e. w/o explicit annotation) do not use address space map, but rather data layout string to set the LLVM address space for alloca instructions - the default value is 0, which is treated as private by SPIR-V translator.

I can work on updating this patch with CodeGen changes if you think both patches should be committed together.

I feel we will not make any progress without understanding and discussing your design first. If you envision large architectural changes we might need to make sure other stakeholders are informed about them.

This revision now requires changes to proceed.Jun 3 2020, 12:47 PM
Naghasan added a comment.EditedJun 4 2020, 6:48 AM

Why? Can you explain what you are trying to achieve with this?

I think @asavonic can provide more detailed answer, but IIRC we spent a lot time trying to marry template meta-programming with OpenCL address space deductions, but even simplest template functions from C++ standard library breaks compilation. It's important to note one important difference in design goals for SYCL and C++ for OpenCL. SYCL aims to enable compilation of regular C++ as much as possible where as one of the C++ for OpenCL goals is to keep compatibility with OpenCL C. These two goals might lead to different design decisions.

I don't see a contradiction in those goals. We plan to support C++ libraries with C++ for OpenCL of course with functionality that is accepted by conformant OpenCL implementations. For example, libraries that use virtual functions leading to function pointers are not going to work portably in OpenCL devices. If SYCL aims to compile to conformant OpenCL devices then it should not be very different I imagine?

(Edited a bit)

There is a fundamental difference (given the direction taken for OpenCL), OpenCL mode actively modifies user's types (C and C++ for OpenCL AFAIK). So inside a function, this:

int var;

becomes

VarDecl  var '__private int'

Which is fine in C where you can't really do much with your types, but this has a massive and destructive effect in C++. For instance this does not compile in C++ for OpenCL:

void foo() {
    int var; // '__private int' not 'int'
    int* ptr1 = &var; // '__generic int* __private' not 'int*'
    decltype(var)* ptr2 = ptr1; // error: cannot initialize a variable of type 'decltype(var) *__private' (aka '__private int *__private') with an lvalue of type '__generic int *__private'
}

We plan to support C++ libraries with C++ for OpenCL

With the direction taken so far, C++ for OpenCL can't properly implement or use basic C++ due to this. Small example using a is_same implementation (https://godbolt.org/z/CLFV6z):

template<typename T1, typename T2>
struct is_same {
    static constexpr int value = 0;
};

template<typename T>
struct is_same<T, T> {
    static constexpr int value = 1;
};

void foo(int p) {
    static_assert(is_same<decltype(p), int>::value, "int is not an int?"); // Fails: p is '__private int' != 'int'
    static_assert(is_same<decltype(&p), int*>::value, "int* is not an int*?");  // Fails: p is '__private int*' != '__generic int*'
}

So yes, you could implement std::is_same but it won't work as one would expect.

That's the reason why SYCL actively tries to prevent changing any type, this would prevent the compilation of valid C++ code without a fundamental reason (e.g. the target is not able to support it).

My point is only that the approach taken for C++ for OpenCL is not suitable for SYCL IMO

Default address space is a Clang concept that is generally used for an automatic storage.

That's not true in CUDA. On the other hand they actively avoids using address space.
Plus in isAddressSpaceSupersetOf you have:

// Consider pointer size address spaces to be equivalent to default.
((isPtrSizeAddressSpace(A) || A == LangAS::Default) &&
            (isPtrSizeAddressSpace(B) || B == LangAS::Default))

An alternative would be to clone the address space. But that would be shame to not find a way to reuse the opencl one as down the line, they map to the same thing.

clang/include/clang/AST/Type.h
493

Not sure how to make it better, but this may not be true depending on what is allowed by the language.

Why? Can you explain what you are trying to achieve with this?

I think @asavonic can provide more detailed answer, but IIRC we spent a lot time trying to marry template meta-programming with OpenCL address space deductions, but even simplest template functions from C++ standard library breaks compilation. It's important to note one important difference in design goals for SYCL and C++ for OpenCL. SYCL aims to enable compilation of regular C++ as much as possible where as one of the C++ for OpenCL goals is to keep compatibility with OpenCL C. These two goals might lead to different design decisions.

I don't see a contradiction in those goals. We plan to support C++ libraries with C++ for OpenCL of course with functionality that is accepted by conformant OpenCL implementations. For example, libraries that use virtual functions leading to function pointers are not going to work portably in OpenCL devices. If SYCL aims to compile to conformant OpenCL devices then it should not be very different I imagine?

(Edited a bit)

There is a fundamental difference (given the direction taken for OpenCL), OpenCL mode actively modifies user's types (C and C++ for OpenCL AFAIK). So inside a function, this:

int var;

becomes

VarDecl  var '__private int'

Which is fine in C where you can't really do much with your types, but this has a massive and destructive effect in C++. For instance this does not compile in C++ for OpenCL:

void foo() {
    int var; // '__private int' not 'int'
    int* ptr1 = &var; // '__generic int* __private' not 'int*'
    decltype(var)* ptr2 = ptr1; // error: cannot initialize a variable of type 'decltype(var) *__private' (aka '__private int *__private') with an lvalue of type '__generic int *__private'
}

Correct. For this we plan to add a metaprogramming utility to remove address space from a type. Here is a proposal:
https://reviews.llvm.org/D76636

However, this would mean modifying the C++ libraries to produce a flavor that can be supported by OpenCL devices.

I think my biggest problem is that I don't understand how you can just run C++ libraries on OpenCL devices. If we were able to compile plain C++ for OpenCL devices would we not just do it? But OpenCL devices have certain constraints. Also aren't libraries typically customized for performance depending on the range of HW they are being run on? I see the modifications are inevitable.

We plan to support C++ libraries with C++ for OpenCL

With the direction taken so far, C++ for OpenCL can't properly implement or use basic C++ due to this.Small example using a is_same implementation (https://godbolt.org/z/CLFV6z):

template<typename T1, typename T2>
struct is_same {
    static constexpr int value = 0;
};

template<typename T>
struct is_same<T, T> {
    static constexpr int value = 1;
};

void foo(int p) {
    static_assert(is_same<decltype(p), int>::value, "int is not an int?"); // Fails: p is '__private int' != 'int'
    static_assert(is_same<decltype(&p), int*>::value, "int* is not an int*?");  // Fails: p is '__private int*' != '__generic int*'
}

So yes, you could implement std::is_same but it won't work as one would expect.

That's the reason why SYCL actively tries to prevent changing any type, this would prevent the compilation of valid C++ code without a fundamental reason (e.g. the target is not able to support it).

My point is only that the approach taken for C++ for OpenCL is not suitable for SYCL IMO

Default address space is a Clang concept that is generally used for an automatic storage.

That's not true in CUDA. On the other hand they actively avoids using address space.

True and I don't think CUDA uses isAddressSpaceSupersetOf in their implementation? Perhaps SYCL can use the same flow?

Plus in isAddressSpaceSupersetOf you have:

// Consider pointer size address spaces to be equivalent to default.
((isPtrSizeAddressSpace(A) || A == LangAS::Default) &&
            (isPtrSizeAddressSpace(B) || B == LangAS::Default))

An alternative would be to clone the address space. But that would be shame to not find a way to reuse the opencl one as down the line, they map to the same thing.

I agree that reusing functionality is preferable however it is subject to whether it is sufficiently similar. If we end up adding a lot of code to diverge special cases or even worse regress existing functionality by adding new one then perhaps this is not a viable solution.

The problem is that I understand very little of the design at this point to suggest anything. All I know is that there are magic pointer classes in SYCL spec that represent address spaces somehow but what I see in this review is a change to OpenCL address space model that isn't governed by any spec or explained in any documentation. Perhaps it is easy and transparent to you but I see very little correlation. :(

bader marked an inline comment as done.Jun 5 2020, 1:29 PM

I think my biggest problem is that I don't understand how you can just run C++ libraries on OpenCL devices. If we were able to compile plain C++ for OpenCL devices would we not just do it? But OpenCL devices have certain constraints. Also aren't libraries typically customized for performance depending on the range of HW they are being run on? I see the modifications are inevitable.

The approach we use won't enable all C++ libraries, but it unblocks template metaprogramming patterns which conceptually are valid code. We can use STL parts of C++ standard library without re-writing them, which is a huge win for C++ developers.

We employ address space inference pass (https://llvm.org/doxygen/InferAddressSpaces_8cpp_source.html) to improve the performance of C++ libraries. Users can help the compiler analysis and optimize code by using SYCL multi_ptr template class specialized with particular memory region.

We plan to support C++ libraries with C++ for OpenCL

With the direction taken so far, C++ for OpenCL can't properly implement or use basic C++ due to this.Small example using a is_same implementation (https://godbolt.org/z/CLFV6z):

template<typename T1, typename T2>
struct is_same {
    static constexpr int value = 0;
};

template<typename T>
struct is_same<T, T> {
    static constexpr int value = 1;
};

void foo(int p) {
    static_assert(is_same<decltype(p), int>::value, "int is not an int?"); // Fails: p is '__private int' != 'int'
    static_assert(is_same<decltype(&p), int*>::value, "int* is not an int*?");  // Fails: p is '__private int*' != '__generic int*'
}

So yes, you could implement std::is_same but it won't work as one would expect.

That's the reason why SYCL actively tries to prevent changing any type, this would prevent the compilation of valid C++ code without a fundamental reason (e.g. the target is not able to support it).

My point is only that the approach taken for C++ for OpenCL is not suitable for SYCL IMO

Default address space is a Clang concept that is generally used for an automatic storage.

That's not true in CUDA. On the other hand they actively avoids using address space.

True and I don't think CUDA uses isAddressSpaceSupersetOf in their implementation? Perhaps SYCL can use the same flow?

Plus in isAddressSpaceSupersetOf you have:

// Consider pointer size address spaces to be equivalent to default.
((isPtrSizeAddressSpace(A) || A == LangAS::Default) &&
            (isPtrSizeAddressSpace(B) || B == LangAS::Default))

An alternative would be to clone the address space. But that would be shame to not find a way to reuse the opencl one as down the line, they map to the same thing.

I agree that reusing functionality is preferable however it is subject to whether it is sufficiently similar. If we end up adding a lot of code to diverge special cases or even worse regress existing functionality by adding new one then perhaps this is not a viable solution.

The problem is that I understand very little of the design at this point to suggest anything. All I know is that there are magic pointer classes in SYCL spec that represent address spaces somehow but what I see in this review is a change to OpenCL address space model that isn't governed by any spec or explained in any documentation. Perhaps it is easy and transparent to you but I see very little correlation. :(

@Anastasia, if we make this change specific to SYCL mode, will it address your concerns?

If not, we will go with the alternative proposed by Victor.
I'd like to mention that just a few hours ago we accepted a PR extending address space attributes in SYCL mode to enable efficient code generation of the code using USM extension (Unified Shared Memory) on targets w/o hardware support for this feature. I think this might be relevant information to take a decision.

clang/include/clang/AST/Type.h
493

You are right. For some reason I was sure that these address spaces are available only in SYCL and OpenCL modes. I update the patch to enable this conversion for SYCL mode only to avoid any impact on OpenCL mode.

@Anastasia, if we make this change specific to SYCL mode, will it address your concerns?

I can't answer this question for the reasons I have explained above.

bader added a comment.Jun 9 2020, 12:46 PM

@Anastasia, if we make this change specific to SYCL mode, will it address your concerns?

I can't answer this question for the reasons I have explained above.

Sorry, I'm not sure that I get your concern correctly, so let me clarify: is it allowing conversion between pointers w/ and w/o address space annotation in non-SYCL mode or using OpenCL address space attributes in SYCL mode?

Just to help you to understand the proposed design, I created the full patch for address space handling in SYCL: https://github.com/bader/llvm/pull/18. There are few CodeGen tests validating LLVM IR for SPIR target. Fee free to ask any questions on this PR.

I also merged it with https://reviews.llvm.org/D71016, which lowers C++ function object into OpenCL kernel in this PR (just revert last two commits - https://github.com/bader/llvm/pull/18/files/b2772482370844f33093a70e7d17a318caab49ce).
It's not directly related to this review, but completes the picture of producing LLVM IR for SYCL kernels.

@Anastasia, if we make this change specific to SYCL mode, will it address your concerns?

I can't answer this question for the reasons I have explained above.

Sorry, I'm not sure that I get your concern correctly, so let me clarify: is it allowing conversion between pointers w/ and w/o address space annotation in non-SYCL mode or using OpenCL address space attributes in SYCL mode?

Just to help you to understand the proposed design, I created the full patch for address space handling in SYCL: https://github.com/bader/llvm/pull/18. There are few CodeGen tests validating LLVM IR for SPIR target. Fee free to ask any questions on this PR.

Thanks! This is good but it is only an implementation of the design. Deducing the design from an implementation is time-consuming and not sure it is even feasible. I don't want to waste our time to provide you detailed feedback based on my interpretation of your design and invalid assumptions. I don't want to bind you to any particular format and we don't have any strict requirement for this in LLVM either, but I would encourage you to take a look at some of RFC threads sent to cfe-dev that explain new design concepts. Perhaps, they can help you to understand what information can be provided as a starting point to new design discussions.

Particularly I would suggest covering these two points:

  • SYCL specifies address spaces as classes and it is not very obvious how did you come from libraries classes to modifications in Clang? I believe there are a number of design choices that you could make. One might think that you can just implement address space classes using existing Clang functionality. But if it is not sufficient it is important to understand what modifications are required and why.
  • Other aspects that are important to highlight whether your design has any limitations. For example, I don't quite understand your need for SPIR*SYCLDeviceTargetInfo. Is there anything in your design that limits compilation to one particular target?

Overall, I see that there are a lot of changes in CodeGen that are related to the language semantic. I believe that CodeGen is supposed to be dialing primarily with target-specific logic and I don't know if you should change them to query target specific details instead. Also most of your CodeGen changes are not related to OpenCL so I would make sure to loop @rjmccall in this thread.

bader added a comment.Jun 26 2020, 5:11 AM

@Anastasia, if we make this change specific to SYCL mode, will it address your concerns?

I can't answer this question for the reasons I have explained above.

Sorry, I'm not sure that I get your concern correctly, so let me clarify: is it allowing conversion between pointers w/ and w/o address space annotation in non-SYCL mode or using OpenCL address space attributes in SYCL mode?

Just to help you to understand the proposed design, I created the full patch for address space handling in SYCL: https://github.com/bader/llvm/pull/18. There are few CodeGen tests validating LLVM IR for SPIR target. Fee free to ask any questions on this PR.

Thanks! This is good but it is only an implementation of the design. Deducing the design from an implementation is time-consuming and not sure it is even feasible. I don't want to waste our time to provide you detailed feedback based on my interpretation of your design and invalid assumptions. I don't want to bind you to any particular format and we don't have any strict requirement for this in LLVM either, but I would encourage you to take a look at some of RFC threads sent to cfe-dev that explain new design concepts. Perhaps, they can help you to understand what information can be provided as a starting point to new design discussions.

Particularly I would suggest covering these two points:

  • SYCL specifies address spaces as classes and it is not very obvious how did you come from libraries classes to modifications in Clang? I believe there are a number of design choices that you could make. One might think that you can just implement address space classes using existing Clang functionality. But if it is not sufficient it is important to understand what modifications are required and why.
  • Other aspects that are important to highlight whether your design has any limitations. For example, I don't quite understand your need for SPIR*SYCLDeviceTargetInfo. Is there anything in your design that limits compilation to one particular target?

Overall, I see that there are a lot of changes in CodeGen that are related to the language semantic. I believe that CodeGen is supposed to be dialing primarily with target-specific logic and I don't know if you should change them to query target specific details instead. Also most of your CodeGen changes are not related to OpenCL so I would make sure to loop @rjmccall in this thread.

@Anastasia, sorry for the delay.
I've sent an email with RFC covering these questions to cfe-dev mailing list - http://lists.llvm.org/pipermail/cfe-dev/2020-June/066047.html.

ychen added a subscriber: ychen.Jul 27 2020, 3:11 PM
bader marked an inline comment as not done.Oct 21 2020, 1:35 PM

Alternative approach with SYCL specific semantic attributes - https://reviews.llvm.org/D89909.