[libomptarget] Change nvcc compilation to use a unity build
This allows nvcc to inline functions between what would otherwise be distinct
translation units, which in turn removes any runtime cost from implementing
functions in source files (as opposed to inline in headers).
This will then allow the circular dependencies in deviceRTL to be readily
broken and individual components more easily shared between architectures.
Why is this still needed, or put differently, why do we not compile the unity.cu into bitcode where this variable (cuda_src_files) is still used below?