Passes the same tests as the current deviceRTL. Includes the cmake change from D111987.
Otherwise the missing symbols prevent linking; it's not clear why it works on nvptx64.
This is not good: we need to revise the sema checking on these intrinsics and add some lowering in clang/llvm that builds the switch. It is written longhand here to get things running.
Linking what? Clang emits the symbol, maybe just not for amdgpu.
Where? The only reference I can find to it is here, and it's marked extern.
Subscribed some AMD people to this. I wanted to apply this patch as-is to amd-stg-open to feed it to the internal testing, but it doesn't apply because Driver/ToolChains/AMDGPUOpenMP.cpp in rocm is significantly different from trunk (in particular, the call to addOpenMPDeviceRTL is commented out).
- Enable tests on amdgpu, with the same ones marked xfail/unsupported as on the old runtime
I think this is good enough for now. It drops the not-yet-used debug variable and writes out the lowering for runtime values of memory ordering manually. The latter will be simplified once clang learns to emit the switch instead of an error. The omp lock is a problem I don't have a good solution to.
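For anyone following along, the longhand lowering has roughly the shape sketched below. This is an illustration only: it uses std::atomic rather than the builtins the runtime uses, and the name fetchAdd is hypothetical, not something from the patch. The point is that each case passes a compile-time constant ordering, so nothing needs to accept a runtime value.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, not the code in this patch: dispatch a runtime memory
// ordering to calls that each use a compile-time constant ordering.
inline uint32_t fetchAdd(std::atomic<uint32_t> &Addr, uint32_t Val,
                         std::memory_order Ordering) {
  switch (Ordering) {
  case std::memory_order_relaxed:
    return Addr.fetch_add(Val, std::memory_order_relaxed);
  case std::memory_order_acquire:
    return Addr.fetch_add(Val, std::memory_order_acquire);
  case std::memory_order_release:
    return Addr.fetch_add(Val, std::memory_order_release);
  case std::memory_order_acq_rel:
    return Addr.fetch_add(Val, std::memory_order_acq_rel);
  default:
    // Assumption: anything else falls back to the strongest ordering.
    return Addr.fetch_add(Val, std::memory_order_seq_cst);
  }
}
```

Once clang emits the switch itself, a wrapper like this would presumably collapse to a single call that forwards the ordering.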
'amdgcn' appears to be a subset of 'amdgpu', so this seems a reasonable point to rename it.
Rearranging the naming here - the llvm-link file is now prefixed linked_, with the optimised library left without a prefix. Updated depends / output clauses to match.
I think the cuda toolchain treats unresolved references as 'just use zero', in which case deleting this is a no-op on nvptx. Maybe it's intended to be patched by cuda rtl.cpp in the future? If so, it can be reintroduced then.
The problem with the missing symbol for __omp_rtl_debug_kind was a local error. I did the initial testing of this with a jury-rigged clang that linked the new bitcode and ignored the old. The generation of this integer is guarded by which runtime clang thinks it is compiling for. Thus, my local clang compiled for the old runtime and linked with the new, which it turns out does not work.
I was under the impression that pointing clang at the new runtime with libomptarget-amdgcn-bc-path was sufficient, but evidently not. Doesn't matter for this patch now that the variable is reinstated.
Didn't fare very well under CI, investigating. Ten failures at https://lab.llvm.org/buildbot/#/builders/193/builds/915, but they all pass locally.
Error here - syncThreadsAligned is deleted but should not be.
Tagging Ron as this is currently stuck on the mystery of passing locally and failing in CI.
Landed a slightly modified version of this - code and test changes are included, but the tests are not run by default. I'm hopeful this will help the process of working out why ~10 are failing on CI and passing locally.