Note: WIP patch 2/3 to go with an RFC for the device RTL design (see D64217)
This NFCI patch includes the following cleanup steps:
- Adjust the code to follow the LLVM coding style, especially with respect to variable and method names.
- Document the code with doxygen comments.
- Change the comments to be less NVPTX specific.
- Wrap CUDA-specific calls into __kmpc_impl_XXX functions and define them in a separate target_impl.h file.
- Use a templated barrier implementation to remove code duplication.
- Use a (macro) generator to reduce code duplication.
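
The wrapping step can be illustrated with a minimal host-side sketch. It is not the code from this patch: the namespace, the `LaneMaskTy` type, the 32-lane assumption, and the helper `rankAmongActiveLanes` are all hypothetical, and the wrapper names only mimic the __kmpc_impl_XXX pattern (without the reserved leading underscores) so the example compiles as ordinary C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a target_impl.h: every target-specific
// operation is reached through one thin wrapper, so the generic runtime
// code never names a CUDA intrinsic directly.
namespace target_impl_sketch {

using LaneMaskTy = uint32_t; // assumption: 32 lanes per warp

// Host fallback that counts set bits. A CUDA build of this header would
// presumably forward to the __popc intrinsic instead.
inline uint32_t kmpc_impl_popc(LaneMaskTy Mask) {
  uint32_t Count = 0;
  for (; Mask != 0; Mask &= Mask - 1) // clear the lowest set bit each step
    ++Count;
  return Count;
}

// Mask of all lanes whose id is lower than LaneId.
inline LaneMaskTy kmpc_impl_lanemask_lt(uint32_t LaneId) {
  assert(LaneId < 32 && "lane id out of range");
  return (LaneMaskTy(1) << LaneId) - 1;
}

} // namespace target_impl_sketch

// Target-independent code written only against the wrappers: the rank of
// a lane among the currently active lanes.
inline uint32_t rankAmongActiveLanes(target_impl_sketch::LaneMaskTy Active,
                                     uint32_t LaneId) {
  using namespace target_impl_sketch;
  return kmpc_impl_popc(Active & kmpc_impl_lanemask_lt(LaneId));
}
```

With this layout, porting to another target would mean providing a different header with the same wrapper signatures, while callers such as `rankAmongActiveLanes` stay unchanged.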
We don't support cancellation in the GPU runtime currently, so I think it is better to set IsCancellable to false to make it clear that cancellation is not supported yet.