Page MenuHomePhabricator

[SYCL] Implement SYCL device code outlining
Needs ReviewPublic

Authored by Fznamznon on Apr 9 2019, 5:36 AM.

Details

Summary

SYCL is single source offload programming model so compiler should be
able to separate device code from host code.

Here is code example of the SYCL program demonstrates complier outlining
work:

int foo(int x) { return ++x; }
int bar(int x) { throw std::exception("CPU code only!"); }
...
using namespace cl::sycl;
queue Q;
buffer<int, 1> a(range<1>{1024});
Q.submit([&](handler& cgh) {
      auto A = a.get_access<access::mode::write>(cgh);
      cgh.parallel_for<init_a>(range<1>{1024}, [=](id<1> index) {
        A[index] = index[0] * 2 + index[1] + foo(42);
      });
    }
...

SYCL compiler needs to compile lambda function passed to
cl::sycl::handler::parallel_for method and function foo called from this
lambda function. Compiler also must ignore bar function when we "device" part
of the single source code.

Current approach is to add an attribute SYCL kernel, which SYCL runtime will use
to mark code passed to cl::sycl::handler::parallel_for as "kernel functions".
Obviously runtime library can't mark foo as "device" code - this is a compiler
job: to traverse all symbols accessible from kernel functions and add them to
the "device part" of the code marking them with new attribute SYCL
device.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Just to understand how this will work. I would imagine you can have a device function definition preceding its use. When the function is being parsed it's not known yet whether it will be called from the device or not, so it won't be possible to set the language mode correctly and hence provide the right diagnostics. So is the plan to launch a separate parsing phase then just to extract the call graph and annotate the device functions?

I'm not an expert in clang terminology but I will try briefly explain our current implementation approach.
In SYCL all kernel functions should be template functions so these functions have a deferred instantiation. If we found that we instantiated a sycl kernel function - we add it to a special array with sycl device functions (you can see the corresponding code here and here, actually AddSyclKernel adds declaration to the special array inside the Sema). After performing
pending instantiations we run a recursive AST visitor for each SYCL kernel to mark all device functions and add them to a special array with SYCL device functions (here we start traverse AST from MarkDevice function, code of MarkDevice is here).
To get a correct set of SYCL device functions in produced module we added a check for all declarations inside the CodeGen on sycl_device attribute existence - so it will ignore declarations without sycl_device attribute if we are compiling for SYCL device (code is here). But with this check it's possible situation when function was parsed and ignored by the CodeGen before we added sycl_device attribute to it so we added yet another parsing action inside the clang::ParseAST to generate code for all SYCL device functions from the special array (code is here).

Thanks for explanation! I would need to spend a bit more time to go through the pointers you have provided. Btw just to understand whether the use case with externally defined device side functions is covered too? I.e. can you have something like this:

extern void foo();
[clang::sycl_kernel]] void bar() {
  foo();
}

When foo is defined in a separate module that doesn't call it on a device side

void foo() {
  dosomething();
}

would compiler still be able to detect that this is a device function?

It would be better to rename clang/test/SemaSYCL/device-attrubutes.cpp to clang/test/SemaSYCL/device-attributes.cpp

It would be better to rename clang/test/SemaSYCL/device-attrubutes.cpp to clang/test/SemaSYCL/device-attributes.cpp

It's already renamed in the latest patch.

I am not sure we need to add a keyword actually, the attribute can just be added in AST since it's not supposed to be used in the source code?

The attribute is used by the SYCL headers to mark the functions to be outlined to the device.
https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/handler.hpp#L267
https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/handler.hpp#L295
https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/handler.hpp#L308
https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/handler.hpp#L325

My understanding of SYCL kernel is that it mainly matches OpenCL kernel functionality because the original intent of SYCL was to provide single source functionality on top of OpenCL.

Yes, this was the idea, when OpenCL was announced in November 2008, to build some higher-level programming models on top of it.
Now the SYCL standard is evolving to something more general to bring heterogeneous computing to modern C++.
So we could reuse in the same way some attributes from OpenMP 5 or CUDA if the Clang/LLVM community thinks it is better.

But I am not an expert in SYCL to confirm that.

I am pretty sure that the SYCL standard committee would love some active participation from ARM. ;-)

I think what we are missing currently is a thorough analysis/comparison between SYCL device mode and OpenCL kernel language mode to understand what's the best implementation strategy. That would apply to many other features: kernel function restrictions, address spaces, vectors, special types, etc.

That would make definitely sense when we target OpenCL.

I still see no point in polluting our code base with extra code that just does the same thing. It will save us a lot of time to just work cooperatively on the same problem and even improve readability of the code. But of course this can only be done if there is no need to diverge the implementation significantly.

Yes. Probably that even when the target is not OpenCL, the general principles remain similar. Probably the same for CUDA & OpenMP 5 too...

I think what we are missing currently is a thorough analysis/comparison between SYCL device mode and OpenCL kernel language mode to understand what's the best implementation strategy. That would apply to many other features: kernel function restrictions, address spaces, vectors, special types, etc.

That would make definitely sense when we target OpenCL.

I still see no point in polluting our code base with extra code that just does the same thing. It will save us a lot of time to just work cooperatively on the same problem and even improve readability of the code. But of course this can only be done if there is no need to diverge the implementation significantly.

Yes. Probably that even when the target is not OpenCL, the general principles remain similar. Probably the same for CUDA & OpenMP 5 too...

In the interest of speeding up the upstreaming work, would you be able to highlight the differences and similarity at least for SYCL and OpenCL kernel modes? Not sure if you are familiar enough with both. Because apart from the public announcements I can't see what has been changed in SYCL that would disallow to use OpenCL mode. It would be a valuable input to determine the implementation choices.

In the interest of speeding up the upstreaming work, would you be able to highlight the differences and similarity at least for SYCL and OpenCL kernel modes? Not sure if you are familiar enough with both. Because apart from the public announcements I can't see what has been changed in SYCL that would disallow to use OpenCL mode. It would be a valuable input to determine the implementation choices.

SYCL is similar to OpenMP 5 for C++, where you use only C++ classes instead of #pragma. So it is quite C++-friendlier than OpenMP.
But that means also there is not the same concept of explicit kernel like in OpenCL or CUDA. In OpenCL or CUDA, when there is a function with a specific attribute, you know it is a kernel and you can compile as such.

In SYCL or OpenMP you need an outliner that will estimate what should be executed as an heterogeneous kernel, split the code between the host side and the device side, add some glue/stub to implement an RPC between the host and the device, manage potentially some allocation/memory transfers, etc. This is quite more complex than compiling OpenCL, CUDA or other graphics shader languages. This is also why, while SYCL is technically pure standard C++, you need some specific compiler magic to do the code massaging to have everything working well between a host and some devices.

The attribute we discuss here is just an implementation detail to help the coordination between the compiler and the SYCL frontend classes to mark some area to outline, without relying to do some precise pattern matching, allowing more flexibility in the runtime without changing the compiler every time. So while it defines a zone to be outlined as a kernel, it is not really a kernel in the sense of OpenCL.

In triSYCL I made some completely different choices, using late outlining in LLVM and detecting some specific functions such as cl::sycl::detail::instantiate_kernel<KernelName>() that defines some stuff I want to outline https://github.com/triSYCL/triSYCL/blob/master/doc/architecture.rst#low-level-view-of-the-device-compiler-workflow
For me an attribute was not an option because I wanted to change Clang as little as possible. But at the end, I think it is quite more brittle than doing early outlining in Clang as discussed here, which also requires quite more knowledge of Clang than I have. :-)

So at the end, I think we should use a different keyword from OpenCL or CUDA because the semantics is different.

SYCL is similar to OpenMP 5 for C++, where you use only C++ classes instead of #pragma. So it is quite C++-friendlier than OpenMP.

I am not sure what you mean by friendlier? OpenMP concept is very clear to be - a program written in C/C++ or Fortran can be complimented with simple compiler directives to instruct the compiler about the parallelization. Hence exactly the same program can be used on sequential or parallel architectures. I can't imagine however anyone would use SYCL program on a non-parallel architecture? And this is where it is fundamentally different concept to me than C++ that has very different execution model (using very explicit language constructs for parallelism btw!).

To me SYCL dictates how program is to be written with explicit parallelism constructs using a special language. The fact that the language doesn't use different syntax from standard C++ at the moment doesn't mean that it's not there at least implicitly. If you would be able to just reuse C++ it would be perfectly a library style language but since you need language extensions to the compiler it isn't just a pure C++ library to me.

But that means also there is not the same concept of explicit kernel like in OpenCL or CUDA. In OpenCL or CUDA, when there is a function with a specific attribute, you know it is a kernel and you can compile as such.

I am very confused, because if you don't need an explicit kernel construct why are you adding it here at all? The fact that you don't provide the documentation for it in the spec but yet add it as an explicit attribute in the language to allow implementing the feature does show that it is actually explicitly required. It is just well hidden behind the C++ library syntax that however requires activating features that aren't part of ISO C++. Perhaps I am still missing something but I am just worried that we are going to end up with a language that pretends to be a C++ library. I certainly see that CUDA or OpenCL could just add a layer of C++ libraries on top of their language extensions to provide the same functionality. So I still feel SYCL is closer to CUDA than to pure C++.

In SYCL or OpenMP you need an outliner that will estimate what should be executed as an heterogeneous kernel, split the code between the host side and the device side, add some glue/stub to implement an RPC between the host and the device, manage potentially some allocation/memory transfers, etc.

But in SYCL this is requested explicitly in the source code using language constructs, isn't it?

This is quite more complex than compiling OpenCL, CUDA or other graphics shader languages.

I think CUDA still does fair bit of similar logic what you describe above though.

This is also why, while SYCL is technically pure standard C++, you need some specific compiler magic to do the code massaging to have everything working well between a host and some devices.

This patch is actually extending pure standard C++ to make it less pure. There is nothing magic about it.

The attribute we discuss here is just an implementation detail to help the coordination between the compiler and the SYCL frontend classes to mark some area to outline, without relying to do some precise pattern matching, allowing more flexibility in the runtime without changing the compiler every time. So while it defines a zone to be outlined as a kernel, it is not really a kernel in the sense of OpenCL.

Can you give some concrete examples of why device outlined functions can't be an OpenCL kernel or functions? What functionality (apart from kernel templates) wouldn't be applicable?

In triSYCL I made some completely different choices, using late outlining in LLVM and detecting some specific functions such as cl::sycl::detail::instantiate_kernel<KernelName>() that defines some stuff I want to outline https://github.com/triSYCL/triSYCL/blob/master/doc/architecture.rst#low-level-view-of-the-device-compiler-workflow
For me an attribute was not an option because I wanted to change Clang as little as possible.

Why not to continue this approach? What limitation does it have? Is it something that demonstrates that the language extension is the real solution to this?

But at the end, I think it is quite more brittle than doing early outlining in Clang as discussed here, which also requires quite more knowledge of Clang than I have. :-)

So at the end, I think we should use a different keyword from OpenCL or CUDA because the semantics is different.

Overall, I see that this work is now going into a very different direction from what was written in the original RFC.
https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html

It was suggested that SYCL builds on top of OpenCL and therefore most of functionality can be reused. May be the best approach is to restart the RFC making the new intent and the overall concept very clear, especially the fact that you are going to add a number of extensions to C++ language. I think C++ developers should be aware that is is going to happen and they can also help you further with a guidance and align on a general C++ development flow.

I tried to reuse OpenCL kernel attribute with "__kernel" keyword in our current SYCL implementation. PR with this try is here - https://github.com/intel/llvm/pull/97
Now It looks feasible but with a couple notes:
From SYCL specification "SYCL is designed to be as close to standard C++ as possible. Standard C++ compiler can compile the SYCL programs and they will run correctly on host CPU." So SYCL doesn't provide non-standard kernel keyword which is provided by OpenCL. Due this fact it's not possible to add kernel keyword as in OpenCL, it will prevent compilation of following valid SYCL code:

int foo(int kernel) { return ++kernel; } // If "kernel" will be a keyword like in OpenCL, here will be a error
…
using namespace cl::sycl;
queue Q;
buffer<int, 1> a(range<1>{1024});
Q.submit([&](handler& cgh) {
      auto A = a.get_access<access::mode::write>(cgh);
      cgh.parallel_for<init_a>(range<1>{1024}, [=](id<1> index) {
        A[index] = index[0] * 2 + index[1] + foo(42);
      });
    }
...

So I added only __kernel keyword for SYCL because in C++ identifiers which start with __ are reserved for compiler internals.
Next note:
In our current implementation actually not quite that function which is marked with sycl_kernel (or __kernel, whatever) will be real OpenCL kernel in produced module. In SYCL all shared between host and device memory objects (buffers/images, these objects map to OpenCL buffers and images) can be accessed through special accessor classes. SYCL also has special mechanism for passing kernel arguments from host to device, if in OpenCL you need to do clSetKernelArg, in SYCL all kernel arguments are captures/fields of lambda/functor which is passed to parallel_for (See code snippet above, here one kernel argument - accessor A ). To map to OpenCL setting kernel arguments mechanism we added generation of some "kernel wrapper" function inside the compiler. "Kernel wrapper" function contains body of SYCL kernel function, receives OpenCL like parameters and additionally does some manipulation to initialize captured lambda fields with this parameters. In some pseudo code "kernel wrapper" looks like this:

// SYCL kernel is defined in SYCL headers
__kernel someSYCLKernel(lambda) {
  lambda();
}
// Kernel wrapper
__kernel wrapper(global int* a) {
  lambda; // Actually lambda declaration doesn't have a name in AST
  // Let lambda has one captured field - accessor A. We need to init it with global pointer from arguments:
  lambda.A.__init(a);
  // Body of SYCL kernel from SYCL headers:
  {
    lambda();
  }
}

And actually kernel wrapper is presented in result module and passed to OpenCL backend.
As I said, kernel wrapper is generated by the compiler inside the Sema and OpenCLKernel attribute manually added to it, no matter which attribute was added to "SYCL kernel" from SYCL headers.
So, while we are generating this wrapper I see only one profit to use OpenCL kernel attribute in SYCL kernels - don't add new attribute to clang (but we need to add __kernel keyword to SYCL).
I thought about idea - don't generate kernel wrapper but looks like it will not work with OpenCL since we can't pass OpenCL cl_mem arguments inside any structures (including accessors and lambdas) to the kernel.

I think potentially reusing OpenCL features is desirable since the device code of SYCL is largely OpenCL. However I don't think we are clear enough about the overall device compilation flow of SYCL and I can easily suggest a number of different approaches including those that don't modify compiler at all. :) I am afraid until we have the big picture clear it will be hard to make any sensible decisions.

I have created an issue about it earlier

https://github.com/intel/llvm/issues/59

and I am going to add some more comments there to explain what we should elaborate and agree on.

I suggest we finalize the big picture in the following weeks first and then we can go ahead with the detailed work in the reviews.

OlegM added a subscriber: OlegM.Mon, May 20, 11:22 PM
Fznamznon updated this revision to Diff 200513.EditedTue, May 21, 7:58 AM

Added semantics for new attributes

  • Added semantics for new attributes. Now complier can separate SYCL device code from host code using new arrtributes.
  • Removed spelling for sycl_device attribute and its documentation because it can be added only implicitly by the compiler for now
  • Removed docs for sycl_kernel attribute because this attribute is not presented in SYCL spec and not for public use - it's some implemetation detail. It will be used in SYCL headers implemetation to help compiler find device code entry point in single source file. So I think no need to add documentation for it.
Fznamznon retitled this revision from [SYCL] Add support for SYCL device attributes to [SYCL] Implement SYCL device code outlining.Tue, May 21, 8:03 AM
Fznamznon edited the summary of this revision. (Show Details)
Fznamznon updated this revision to Diff 200658.Wed, May 22, 1:12 AM

Minor fix.

Anastasia added inline comments.Wed, May 22, 10:04 AM
clang/include/clang/Basic/Attr.td
1022

Ok, I thought the earlier request was not to add undocumented attributes with the spelling?

Also did __kernel attribute not work at the end?

I can't quite get where the current disconnect comes from but I find it extremely unhelpful.

bader added inline comments.Wed, May 22, 11:36 AM
clang/include/clang/Basic/Attr.td
1022

Hi @Anastasia, let me try to help.

Ok, I thought the earlier request was not to add undocumented attributes with the spelling?

Right. @Fznamznon, could you document sycl_kernel attribute, please?

Also did __kernel attribute not work at the end?

Maria left a comment with the summary of our experiment: https://reviews.llvm.org/D60455#1472705. There is a link to pull request, where @keryell and @agozillon expressed preference to have separate SYCL attributes. Let me copy their feedback here:

@keryell :

Thank you for the experiment.
That looks like a straight forward change.
The interesting part is that it does not expose any advantage from reusing OpenCL __kernel marker.... So I am not more convinced that it is the way to go, because we would use any other keyword or attribute and it would be the same...

@agozillon :

Just my two cents, I think a separation of concerns and having separate attributes will simplify things long-term.

While possibly similar just now, the SYCL specification is evolving and may end up targeting more than just OpenCL. So the semantics of the attributes may end up being quite different, even if at the moment the SYCL attribute is there mostly just to mark something for outlining.

If it doesn't then the case for refactoring and merging them in a future patch could be brought up again.

To summarize: we don't have good arguments to justify re-use of OpenCL __kernel keyword for SYCL mode requested by @aaron.ballman here https://reviews.llvm.org/D60455#1469150.

I can't quite get where the current disconnect comes from but I find it extremely unhelpful.

Let me know how I can help here.

Additional note. I've submitted initial version of SYCL compiler design document to the GItHub: https://github.com/intel/llvm/blob/sycl/sycl/doc/SYCL_compiler_and_runtime_design.md. Please, take a look and let me know if you have questions.

Fznamznon added inline comments.Thu, May 23, 1:08 AM
clang/include/clang/Basic/Attr.td
1022

Ok, I thought the earlier request was not to add undocumented attributes with the spelling?

Right. @Fznamznon, could you document sycl_kernel attribute, please?

Do we really need add documentation for attribute which is not presented in SYCL spec and used for internal implementation details only because it has spelling?

Design question: since you are not aware what functions are to be run on a device while parsing them, at what point do you plan to diagnose the invalid behavior from the standard C++ i.e. using function pointers in kernel code is likely to cause issues?

Also do you need to outline the data structures too? For example classes used on device are not allowed to have virtual function.

clang/include/clang/Basic/Attr.td
1022
    Ok, I thought the earlier request was not to add undocumented attributes with the spelling?
 
Right. @Fznamznon, could you document sycl_kernel attribute, please?

Do we really need add documentation for attribute which is not presented in SYCL spec and used for internal implementation details only because it has spelling?

You are adding an attribute that is exposed to the programmers that use clang to compile their code, so unless you come up with some way to reject it in the non-toolchain mode it has to be documented. And for clang it will become "hidden" SYCL dialect so absolutely not different to __kernel.

Another aspect to consider is that clang used TypePrinter in diagnostics and even though printing entire function signature is rare it might appear in diagnostics and the programmer should have a way to understand what the "alien" construct is. This is where clang documentation will help.

1022

@keryell :

Thank you for the experiment.
That looks like a straight forward change.
The interesting part is that it does not expose any advantage from reusing OpenCL __kernel marker.... So I am not more convinced that it is the way to go, because we would use any other keyword or attribute and it would be the same...

I don't understand how this conclusions are made on incomplete implementation or even just an initial patch.

The kind of analysis I am missing at the moment is whether you would need to add similar logic for sycl_kernel as we have now for __kernel i.e. did anyone look at the occurrences of kernel handling in the code base to see if it's going to need the same logic or not:

include/clang/Basic/Attr.td:    : SubsetSubject<Function, [{S->hasAttr<OpenCLKernelAttr>()}],
include/clang/Parse/Parser.h:  void ParseOpenCLKernelAttributes(ParsedAttributes &attrs);
lib/AST/Decl.cpp:  if (hasAttr<OpenCLKernelAttr>())
lib/AST/Decl.cpp:  if (hasAttr<OpenCLKernelAttr>())
lib/AST/Decl.cpp:  if (hasAttr<OpenCLKernelAttr>())
lib/CodeGen/CGCall.cpp:  if (TargetDecl && TargetDecl->hasAttr<OpenCLKernelAttr>()) {
lib/CodeGen/CodeGenFunction.cpp:  if (!FD->hasAttr<OpenCLKernelAttr>())
lib/CodeGen/TargetInfo.cpp:    if (FD->hasAttr<OpenCLKernelAttr>()) {
lib/CodeGen/TargetInfo.cpp:    if (FD->hasAttr<OpenCLKernelAttr>()) {
lib/CodeGen/TargetInfo.cpp:  return D->hasAttr<OpenCLKernelAttr>() ||
lib/CodeGen/TargetInfo.cpp:  if (M.getLangOpts().OpenCL && FD->hasAttr<OpenCLKernelAttr>() &&
lib/Parse/ParseDecl.cpp:void Parser::ParseOpenCLKernelAttributes(ParsedAttributes &attrs) {
lib/Parse/ParseDecl.cpp:      ParseOpenCLKernelAttributes(DS.getAttributes());
lib/Sema/SemaDecl.cpp:        if (FD && !FD->hasAttr<OpenCLKernelAttr>()) {
lib/Sema/SemaDecl.cpp:        if (FD && FD->hasAttr<OpenCLKernelAttr>()) {
lib/Sema/SemaDecl.cpp:  if (getLangOpts().OpenCL && NewFD->hasAttr<OpenCLKernelAttr>()) {
lib/Sema/SemaDecl.cpp:        << FD->hasAttr<OpenCLKernelAttr>();
lib/Sema/SemaDecl.cpp:  if (FD->hasAttr<OpenCLKernelAttr>())
lib/Sema/SemaDeclAttr.cpp:    handleSimpleAttribute<OpenCLKernelAttr>(S, D, AL);
lib/Sema/SemaDeclAttr.cpp:  if (!D->hasAttr<OpenCLKernelAttr>()) {

I don't mind either way but I would like the decision to be based on the analysis of clang code base please!

@agozillon :

Just my two cents, I think a separation of concerns and having separate attributes will simplify things long-term.

This can potentially be a fair point!

While possibly similar just now, the SYCL specification is evolving and may end up targeting more than just OpenCL. So the semantics of the attributes may end up being quite different, even if at the moment the SYCL attribute is there mostly just to mark something for outlining.

This is really great! But unless you provide concrete information what the evolution is and what exactly you are trying to achieve and how it affect compiler design there is no way to review your patches.

Let me know how I can help here.

Additional note. I've submitted initial version of SYCL compiler design document to the GItHub: https://github.com/intel/llvm/blob/sycl/sycl/doc/SYCL_compiler_and_runtime_design.md. Please, take a look and let me know if you have questions.

Thanks for sharing! I will try to find time to look into this and provide my feedback if any.

clang/include/clang/Sema/Sema.h
11182

This deserves more explanation. I would suggest to just look at the code around to follow the style!

clang/lib/Sema/SemaSYCL.cpp
24

This is probably not something we can change at this point but I wish we could avoid complexities like this. :(

I think this is also preventing traditional linking of translation units. That is somewhat unfortunate.

It is good direction however to keep this logic in a separate dedicated compilation unit.

I would suggest to document it a bit more including any current limitations/assumption that you can mark under FIXME i.e. does your code handle lambdas yet, what if lambdas are used in function parameters, etc...

clang/test/SemaSYCL/device-attributes-on-non-sycl.cpp
2

Another confusion I have at the moment even though it doesn't belong to this patch - isn't SYCL based on C++11?

bader added a comment.Mon, May 27, 7:16 AM

Design question: since you are not aware what functions are to be run on a device while parsing them, at what point do you plan to diagnose the invalid behavior from the standard C++ i.e. using function pointers in kernel code is likely to cause issues?

We are going to use DeviceDiagBuilder and related infrastructure implemented in Clang to diagnose device side code errors in OpenMP/CUDA modes.
More details are in the comments here:
https://clang.llvm.org/doxygen/classclang_1_1Sema_1_1DeviceDiagBuilder.html#details

Also do you need to outline the data structures too? For example classes used on device are not allowed to have virtual function.

Yes. This restriction is already implemented in our code base on GitHub.

clang/include/clang/Basic/Attr.td
1022

@Anastasia, I've looked at the occurrences of OpenCLKernelAttr attribute and it looks like the only part useful for SYCL is lib/CodeGen/CodeGenFunction.cpp, which emits OpenCL specific metadata required for SPIR-V translation.

Sema part is mostly not relevant for SYCL mode because SYCL API do not allow the cases currently detected by clang (e.g. constant address space variable declaration in OpenCL kernel scope, naming OpenCL kernel main, etc).
A couple of check that might be useful are:

  • void return type for kernel functions
  • kernel can't be static function

and some of the checks are harmful for proposed implementation (e.g. kernels can't be template functions).

@Anastasia, @keryell, @agozillon and @aaron.ballman need to agree if this sufficient to justify the re-use of OpenCL kernel attribute.
Let me know if you need any additional information to make a decision.

clang/lib/Sema/SemaSYCL.cpp
24

I think this is also preventing traditional linking of translation units.

Could you elaborate more on this topic, please?
What do you mean by "traditional linking of translation units" and what exactly "is preventing" it?
Do you compare with the linking of regular C++ code (i.e. which do not split into host and device code)?
If so, SYCL is different from this model and more similar to CUDA/OpenMP models, which also skip "linking" of irrelevant part (e.g. host code is not linked by the device compiler).
Mariya added Justin (@jlebar) and Alexey (@ABataev), who work on single-source programming models to make them aware and provide feedback if any.

Design question: since you are not aware what functions are to be run on a device while parsing them, at what point do you plan to diagnose the invalid behavior from the standard C++ i.e. using function pointers in kernel code is likely to cause issues?

We are going to use DeviceDiagBuilder and related infrastructure implemented in Clang to diagnose device side code errors in OpenMP/CUDA modes.
More details are in the comments here:
https://clang.llvm.org/doxygen/classclang_1_1Sema_1_1DeviceDiagBuilder.html#details

Just a thought, if you parse host code first and provide the device outlining information to the device compilation phase would you then be able to reuse more parsing functionality from OpenCL?

Also do you need to outline the data structures too? For example classes used on device are not allowed to have virtual function.

Yes. This restriction is already implemented in our code base on GitHub.

Cool, is it implemented in SemaSYCL.cpp too?

clang/include/clang/Basic/Attr.td
1022

Sema part is mostly not relevant for SYCL mode because SYCL API do not allow the cases currently detected by clang (e.g. constant address space variable declaration in OpenCL kernel scope, naming OpenCL kernel main, etc).

Would you mind pointing me to your impl of those?

A couple of check that might be useful are:

void return type for kernel functions
kernel can't be static function

and some of the checks are harmful for proposed implementation (e.g. kernels can't be template functions).

@Anastasia, @keryell, @agozillon and @aaron.ballman need to agree if this sufficient to justify the re-use of OpenCL kernel attribute.
Let me know if you need any additional information to make a decision.

Ok, if from ~20 occurrences in the source code you will be able to reuse only just 2 it doesn't seem like it's worth to share __kernel attribute.

clang/lib/Sema/SemaSYCL.cpp
24

Yes indeed, I mean linking of modules in C/C++ even though it doesn't necessarily mean linking of object files. So you don't plan to support SYCL_EXTERNAL in clang?

In CUDA the functions executed on device are annotated manually using __device__ hence separate translation units can specify external device function... although I don't know if CUDA implementation in clang support this.

I guess OpenMP is allowed to fall back to run on host?

Fznamznon added inline comments.Tue, May 28, 4:35 AM
clang/test/SemaSYCL/device-attributes-on-non-sycl.cpp
2

Sorry for confusion. The C++ features used in SYCL are a subset of the C++11 standard features.
I will add -std=c++11 key to run line to avoid such confusion in future.

Fznamznon updated this revision to Diff 201641.Tue, May 28, 5:07 AM

Applied comments from @Anastasia

  • Added documentation for sycl_kernel function
  • Added comments to Sema.h
  • Added -std=c++11 to test run lines
Anastasia added inline comments.Thu, May 30, 10:53 AM
clang/include/clang/Basic/Attr.td
1022

Undocumented -> SYCLKernelDocs

clang/include/clang/Basic/AttrDocs.td
269

The example doesn't demonstrate the use of the attribute.

It explains how it is used by the toolchain only!

May be @aaron.ballman can help here as I am not sure what the format should be.

clang/lib/Parse/ParseAST.cpp
171

Do you also need to prevent generation of non-device functions somehow?

clang/lib/Sema/SemaSYCL.cpp
24

Ping!

I would suggest to document it a bit more including any current limitations/assumption that you can mark under FIXME i.e. does your code handle lambdas yet, what if lambdas are used in function parameters, etc...

clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
5520–5527

May be this should go into a helper function as it seems to be now a bigger chunk of code that is repeated?

Although, I am not very familiar with this code. You can try to get someone to review who has contributed to this more recently.

clang/test/CodeGenSYCL/device-functions.cpp
25

I can't see where the SPIR calling convention is currently set for SYCL?

clang/test/SemaSYCL/device-attributes-on-non-sycl.cpp
4

I don't think this comment is necessary.

Fznamznon added inline comments.Fri, May 31, 5:52 AM
clang/include/clang/Basic/Attr.td
1022

Oh, Thank you for that!

clang/lib/Parse/ParseAST.cpp
171

I think It's already prevented by change to CodeGenModule.cpp in this patch. CodeGen just ignores declarations without SYCL device attribute now.

clang/lib/Sema/SemaSYCL.cpp
24

Oh, sorry, I missed this comment when I updated patch last time.
Could you please advise in which form I can document it?

clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
5520–5527

I think this chunk of code seems big because of big repeated comment.

clang/test/CodeGenSYCL/device-functions.cpp
25

If I understand correct it's set automatically on AST level because we use SPIR-based triple for device code. Only in case of C++ methods clang doesn't set SPIR calling convention. We did a modification in our codebase to get SPIR calling convention for C++ methods too (available here )

Anastasia added inline comments.Mon, Jun 3, 3:00 AM
clang/test/CodeGenSYCL/device-functions.cpp
25

Ok and what happens if some other target is used - not SPIR?

Fznamznon added inline comments.Mon, Jun 3, 3:28 AM
clang/test/CodeGenSYCL/device-functions.cpp
25

There will be no SPIR calling convention for device functions.

Anastasia added inline comments.Mon, Jun 3, 7:54 AM
clang/test/CodeGenSYCL/device-functions.cpp
25

Just FYI at some point we generalized SPIR calling convention to be used for kernels irrespective from target by default (see TargetCodeGenInfo::getOpenCLKernelCallingConv). Not sure if it might make sense to do for SYCL device functions too. I am not saying it belongs to this patch though.

Fznamznon updated this revision to Diff 203785.Mon, Jun 10, 2:24 AM

Applied comments from @Anastasia

  • Added link to documentation for sycl_device attribute
  • Removed redundant comment from test

@Anastasia, do you have additional comments?

@aaron.ballman , please let me know if you have additional comments/suggestions. If not, could you please accept this revision?

Most of the comments are about minor nits like grammar and coding conventions, but I did have some questions regarding what kinds of functions the sycl_kernel attribute gets applied to. Also, I'd like to see some additional tests that demonstrate the sycl device attribute is being implicitly created on the proper declarations as expected (can probably do that using -ast-dump and checking to see if the right functions have the attribute attached).

clang/include/clang/Basic/AttrDocs.td
259

is SYCL "kernel function" -> is a SYCL "kernel function"

260

SYCL -> A SYCL

261

Kernel is a -> A kernel is a

263–264

This doesn't really demonstrate the need for the attribute -- the attribute is never shown in the code example. I'd prefer an example that shows when and how a user would write this attribute.

278

called SYLC -> called a SYLC

280

use sycl_kernel -> use the sycl_kernel

281

as SYCL -> as a SYCL
Compiler is supposed to -> The compiler will

284

In this code example compiler is supposed to add "foo" function -> In this code example, the compiler will add the "foo" function

clang/include/clang/Sema/Sema.h
11182

Function -> function

11183

In SYCL, when we generate device code, we don't know

11184

we emit sycl kernels, so we add device

11189

adds the function declaration

11190

Should be named addSyclDeviceFunc() per coding standards. Similar for the other new functions.

11194

Don't repeat the function name in the comments, please. Also, rather than returning a concrete SmallVector<>, I think it would be more natural to return a SmallVectorImpl so that callers don't have to contend with the explicit size. There should also be a const overload for this function.

11197

Constructs a SYCL kernel that is compatible with OpenCL from the SYCL "kernel

11200–11201

Marks all functions accessible from SYCL kernels with the SYCL device attribute

clang/lib/CodeGen/CodeGenModule.cpp
2410

with the SYCL device attribute

2412–2415

These if statements can be combined.

2533

Missing a full stop at the end of the comment.

clang/lib/Sema/SemaSYCL.cpp
14

This include doesn't seem to be necessary?

23

e does not use our usual naming conventions.

41

Spurious whitespace can be removed.

44

Elide braces.

48

Don't use auto as the type is not spelled out in the initialization.

52

Elide braces.

68

with the SYCL device attribute

70

elt -> Elt per naming conventions

71

auto * since the type is spelled out in the initialization.

73

Elide braces

clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
5523

for the SYCL kernel attribute

5525

Elide braces

5537

for the SYCL kernel attribute

5539

Elide braces

clang/test/SemaSYCL/device-attributes-on-non-sycl.cpp
4

#ifndef ?

11

I'd prefer to spell this with __attribute__, same in the other test

clang/test/SemaSYCL/device-attributes.cpp
3

I'd like to see some more tests covering less obvious scenarios. Can I add this attribute to a lambda? What about a member function? How does it work with virtual functions? That sort of thing.

Fznamznon added inline comments.Tue, Jun 18, 8:01 AM
clang/test/SemaSYCL/device-attributes.cpp
3

Actually there is no restrictions for adding this attribute to any function to outline device code so I just checked the simplest variant.

But I'm working on new patch which will put some requirements on function which is marked with sycl_kernel attribute.
This new patch will add generation of OpenCL kernel from function marked with sycl_kernel attribute. The main idea of this approach is described in this document (in this document generated kernel is called "kernel wrapper").
And to be able to generate OpenCL kernel using function marked with sycl_kernel attribute we put some requirements on this function, for example it must be a template function. You can find these requirements and example of proper function which can be marked with sycl_kernel in this comment .

aaron.ballman added inline comments.Tue, Jun 18, 3:15 PM
clang/test/SemaSYCL/device-attributes.cpp
3

Actually there is no restrictions for adding this attribute to any function to outline device code so I just checked the simplest variant.

So there are no concerns about code like:

struct Base {
  __attribute__((sycl_kernel)) virtual void foo();
  virtual void bar();
};

struct Derived : Base {
  void foo() override;
  __attribute__((sycl_kernel)) void bar() override;
};

void f(Base *B, Derived *D) {
  // Will all of these "do the right thing"?
  B->foo();
  B->bar();

  D->foo();
  D->bar();
}