The argument after -Xarch_device will be added to the arguments for CUDA/HIP
device compilation and will be removed for host compilation.
The argument after -Xarch_host will be added to the arguments for CUDA/HIP
host compilation and will be removed for device compilation.