The root cause is that, in NVPTX::Assembler::ConstructJob, the output file name might not match the file name of the Output passed into the function, because CudaToolChain::getInputFilename is a specialized version. As a result, the real output file is not added to the temp files list, whose entries are all removed in the destructor of Compilation. To work around this, NVPTX::OpenMPLinker::ConstructJob calls getToolChain().getInputFilename(II) before invoking clang-nvlink-wrapper to get the right output file name for each input and adds it to the temp files, so that the files can be removed without any issue. However, this logic doesn't work with the new OpenMP driver, because NVPTX::OpenMPLinker::ConstructJob is not called at all, so the cubin file generated by each single-unit compilation is never tracked.
In this patch, we add the real output file to the temp files if its name doesn't match Output. We add it at the point where the file is produced as an output, rather than when it is consumed as an input as was done in NVPTX::OpenMPLinker::ConstructJob, which makes more sense.
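
The idea can be illustrated with a small standalone model. This is only a sketch, not the actual driver code: ToyCompilation, toyGetInputFilename, and toyConstructAssemblerJob are hypothetical stand-ins for Compilation, CudaToolChain::getInputFilename, and NVPTX::Assembler::ConstructJob, and it assumes the specialized name differs from Output only by a .cubin extension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for clang's Compilation: it only records the names
// of temporary files that would be deleted when the compilation finishes.
struct ToyCompilation {
  std::vector<std::string> TempFiles;

  void addTempFile(const std::string &Name) { TempFiles.push_back(Name); }

  bool isTracked(const std::string &Name) const {
    return std::find(TempFiles.begin(), TempFiles.end(), Name) !=
           TempFiles.end();
  }
};

// Hypothetical stand-in for the specialized getInputFilename. Assumption:
// it replaces (or appends) the extension of the file name with ".cubin".
std::string toyGetInputFilename(const std::string &Output) {
  std::size_t Slash = Output.find_last_of('/');
  std::size_t Dot = Output.find_last_of('.');
  bool HasExt = Dot != std::string::npos &&
                (Slash == std::string::npos || Dot > Slash);
  return (HasExt ? Output.substr(0, Dot) : Output) + ".cubin";
}

// Models the fix: when the file actually written differs from Output,
// register it as a temp file at the point where it is produced.
std::string toyConstructAssemblerJob(ToyCompilation &C,
                                     const std::string &Output) {
  std::string OutputFileName = toyGetInputFilename(Output);
  if (OutputFileName != Output)
    C.addTempFile(OutputFileName);
  return OutputFileName;
}
```

In this model, calling toyConstructAssemblerJob with an Output of "/tmp/foo-123.o" registers "/tmp/foo-123.cubin" as a temp file, while an Output that already ends in .cubin is left alone because the names match.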