[libomptarget] Build cuda plugin without cuda installed locally
Compiles a new file, plugins/cuda/dynamic_cuda/cuda.cpp, into an object file that exposes the same symbols the plugin currently uses from libcuda. The object file dlopens libcuda and caches the results of its dlsym lookups. Also provides a cuda.h that declares only the subset of the driver API the plugin uses.
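A minimal sketch of the dlopen-shim idea. It assumes Linux, the soname libcuda.so.1, and a two-function stand-in for the cuda.h subset. The enum layout, the use of CUDA_ERROR_NOT_INITIALIZED as the "library missing" result, and the helper names libcudaHandle, lookup and cudaShimAvailable are illustrative assumptions, not taken from cuda.cpp. Each exported entry point has the driver API's name and signature, resolves the real function through dlsym once, and caches the pointer in a function-local static.

```cpp
#include <dlfcn.h>

// Hypothetical two-entry subset standing in for the stub cuda.h.
typedef enum {
  CUDA_SUCCESS = 0,
  CUDA_ERROR_NOT_INITIALIZED = 3,
} CUresult;

namespace {
// Opens libcuda once; initialisation of a function-local static is
// thread-safe since C++11. Stays nullptr if the library is not installed.
void *libcudaHandle() {
  static void *handle = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
  return handle;
}

// Resolves one driver entry point. dlsym on this handle searches libcuda
// (and its dependencies), so it finds the real function, not the wrapper.
template <typename Fn> Fn lookup(const char *name) {
  void *handle = libcudaHandle();
  return handle ? reinterpret_cast<Fn>(dlsym(handle, name)) : nullptr;
}
} // namespace

// Hypothetical helper: reports whether dlopen found libcuda.
bool cudaShimAvailable() { return libcudaHandle() != nullptr; }

// Exported with the same names and signatures the plugin expects from
// libcuda. Each wrapper caches its dlsym result in a static on first call.
// Returning CUDA_ERROR_NOT_INITIALIZED when the symbol is unavailable is an
// assumption of this sketch.
extern "C" CUresult cuInit(unsigned int flags) {
  using Fn = CUresult (*)(unsigned int);
  static Fn fn = lookup<Fn>("cuInit");
  return fn ? fn(flags) : CUDA_ERROR_NOT_INITIALIZED;
}

extern "C" CUresult cuDriverGetVersion(int *version) {
  using Fn = CUresult (*)(int *);
  static Fn fn = lookup<Fn>("cuDriverGetVersion");
  return fn ? fn(version) : CUDA_ERROR_NOT_INITIALIZED;
}
```

Because the wrappers are extern "C" and use the driver API names, a caller such as rtl.cpp links against this object exactly as it would against libcuda. In this sketch, if libcuda is absent at run time the calls return an error code rather than the plugin failing to load. On glibc older than 2.34 the object also needs to be linked with -ldl.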
This lets the CMake file choose between linking against the system libcuda and using the dlopen shim, with no changes to rtl.cpp.
The corresponding change to the amdgpu plugin is postponed until that plugin has been refactored to reduce the size of the hsa.h stub it would require.
This probably warrants a CMake variable to force one choice or the other, even when CUDA is available on the system.
It is also unclear what the default should be when both options are available.