This NFCI patch includes the following cleanup steps:
- Adjust the code to follow the LLVM coding style, especially with
respect to variable and method names.
- Document the code with Doxygen comments.
- Change the comments to be less NVPTX-specific.
- Wrap CUDA-specific calls into __kmpc_impl_XXX functions and define
them in their own target_impl.h file.
- Use a templated barrier implementation to remove code duplication.
- Use a (macro) generator to reduce code duplication.
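
As a rough illustration of the last three steps, here is a minimal
host-side sketch. It is not the code from this patch: the wrapper names
are hypothetical instances of the __kmpc_impl_XXX pattern, and SyncLog,
barrierImpl, and EXAMPLE_BARRIER are made up for the example. The
wrappers only count calls so that the sketch runs without a GPU; in a
real target_impl.h they would forward to the target's primitives.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a target_impl.h: the only place that would
// touch target-specific primitives. Here the wrappers just record calls.
struct SyncLog {
  int FullSyncs = 0;
  int PartialSyncs = 0;
  uint32_t LastNumThreads = 0;
};

inline SyncLog &getSyncLog() {
  static SyncLog Log;
  return Log;
}

/// Assumed wrapper for a block-wide synchronization primitive.
inline void __kmpc_impl_syncthreads() { ++getSyncLog().FullSyncs; }

/// Assumed wrapper for a synchronization over a subset of threads.
inline void __kmpc_impl_named_sync(uint32_t NumThreads) {
  assert(NumThreads > 0 && "a barrier needs at least one participant");
  ++getSyncLog().PartialSyncs;
  getSyncLog().LastNumThreads = NumThreads;
}

/// One templated body instead of two near-identical barrier functions.
/// The template parameter selects the variant at compile time.
template <bool AllThreads> void barrierImpl(uint32_t NumThreads) {
  if (AllThreads)
    __kmpc_impl_syncthreads();
  else
    __kmpc_impl_named_sync(NumThreads);
}

// Macro generator: stamps out one named entry point per variant, so the
// entry points do not have to be written out by hand.
#define EXAMPLE_BARRIER(NAME, ALL_THREADS)                                     \
  inline void NAME(uint32_t NumThreads) {                                      \
    barrierImpl<ALL_THREADS>(NumThreads);                                      \
  }

EXAMPLE_BARRIER(exampleBarrierAll, true)
EXAMPLE_BARRIER(exampleBarrierPartial, false)

#undef EXAMPLE_BARRIER
```

With this layout, only the two wrapper bodies would need to change to
support another target; the templated barrier and the generated entry
points stay target independent.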