When compiling the HOST part of CUDA programs, clang replaces device kernels with so-called "stub" functions that contains a few calls to the Runtime API functions (which set the kernel argument values and launch the kernel itself). The stub functions are very small, so they may have identical generated code for different kernels with same arguments.
The Microsoft Linker has an optimization called "COMDAT Folding". It's able to detect functions with identical binary code and "merge" them, i.e. replace calls and pointers to those different functions with call/pointer to one of them and eliminate other copies.
Here is the description of this optimization: https://docs.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=vs-2019
That page contains a warning about "COMDAT Folding":
"Because /OPT:ICF can cause the same address to be assigned to different functions or read-only data members (that is, const variables when compiled by using /Gy), it can break a program that depends on unique addresses for functions or read-only data members."
That's exactly what happens to the CUDA stub functions.
This change disables setting "COMDAT" attribute for CUDA stub functions in the HOST code.