This fixes host-side LTO during CUDA compilation. Before, LTO
pipeline construction was clashing with CUDA pipeline construction.
At the moment there's no point doing LTO on device side as each
device-side TU is a complete program. We will need to figure out
compilation pipeline construction for the device-side LTO when we
have working support for multi-TU device-side CUDA compilation.