This patch adds the CodeGen changes needed by the standard algorithm offload feature proposed in this RFC: https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159/1. The feature will only be available for the HIP language on AMD targets. The verbose documentation is included at the head of the patch series. This change concludes the set of additions needed in Clang. It relaxes restrictions on what gets emitted on the device path when compiling in hipstdpar mode (the previous patch relaxed restrictions on what is semantically correct):
- Unless a function is explicitly marked __host__, it will get emitted, whereas before only __device__ and __global__ functions would be emitted;
- Unsupported builtins are ignored rather than diagnosed as errors, because the decision on their validity is deferred to the hipstdpar specific code selection pass we are adding, which will be the topic of the final patch in this series;
- We add the stdpar specific passes to the opt pipeline, independent of optimisation level:
  - When compiling for the accelerator / offload device, we add a code selection pass;
  - When compiling for the host, iff the user requested it via the --hipstdpar-interpose-alloc flag, we add a pass which replaces canonical allocation / deallocation functions with accelerator aware equivalents.
A test to validate that unannotated functions get correctly emitted is added as well. Please note that __device__, __global__ and __host__ are used here only to match existing nomenclature; they would not be present in user code.
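For reviewers who want the emission rule from the first bullet in one place, here is a minimal sketch that models it as two predicates. It is not the Clang implementation: `FnAttrs`, `emittedOnDeviceBefore` and `emittedOnDeviceHipStdPar` are hypothetical names invented for illustration. It also assumes that a function marked both __host__ and __device__ keeps being emitted, as it was before this change.

```cpp
#include <cassert>

// Hypothetical model of a function's target attributes; not a Clang type.
struct FnAttrs {
  bool host;    // carries __host__
  bool device;  // carries __device__
  bool global;  // carries __global__
};

// Before this patch: only __device__ and __global__ functions are emitted
// on the device path.
constexpr bool emittedOnDeviceBefore(FnAttrs a) {
  return a.device || a.global;
}

// In hipstdpar mode: everything is emitted unless it is explicitly
// host-only. Treating __host__ __device__ as still emitted is an assumption.
constexpr bool emittedOnDeviceHipStdPar(FnAttrs a) {
  return a.device || a.global || !a.host;
}

// Walks through the cases described in the patch summary.
inline void checkEmissionModel() {
  constexpr FnAttrs unannotated{false, false, false};
  constexpr FnAttrs hostOnly{true, false, false};
  constexpr FnAttrs deviceOnly{false, true, false};
  constexpr FnAttrs kernel{false, false, true};

  // Unannotated functions are the case that changes.
  assert(!emittedOnDeviceBefore(unannotated));
  assert(emittedOnDeviceHipStdPar(unannotated));

  // Explicitly host-only functions are still skipped.
  assert(!emittedOnDeviceBefore(hostOnly));
  assert(!emittedOnDeviceHipStdPar(hostOnly));

  // __device__ and __global__ functions are emitted in both modes.
  assert(emittedOnDeviceBefore(deviceOnly) && emittedOnDeviceHipStdPar(deviceOnly));
  assert(emittedOnDeviceBefore(kernel) && emittedOnDeviceHipStdPar(kernel));
}
```

Calling `checkEmissionModel()` runs through the four cases. Only the unannotated one differs between the two modes, which is the behaviour the added test is meant to validate.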