This allows multi-module / incremental compilation environments to have unique global CUDA constructor and destructor function names.
This feature is necessary for cling (https://github.com/root-project/cling), which is based on clang. Cling is a C++ interpreter (technically, it is a JIT with an interactive frontend; using it is very similar to using the Python interpreter), which is developed by a team at CERN. I want to add a new feature that allows cling to interpret CUDA code written with the runtime API.
This request addresses the following problem. Compiling a CUDA program with clang generates one LLVM module per TU. Every LLVM module has a CUDA ctor and dtor (if a CUDA fatbinary exists), containing at least a function call to register the fatbinary. The ctor/dtor can also include function calls to register global functions and variables at runtime, depending on the user's code.
In cling, we do not have a finalized TU; instead, the TU can be extended. It is extended with LLVM modules for as long as the governing cling instance is running. Each time we type a new line of code, we generate a new LLVM module, which is added to the TU. Cling detects functions by name. If the name (symbol) already exists, it uses the existing translation. Otherwise, it translates the function on first use (but it never translates a function twice).
If we iteratively (and interactively) add new CUDA code to a governing cling instance, we end up with more than one CUDA *module* ctor/dtor per TU. The problem is that the *content* of every ctor/dtor function can be different. Unfortunately, we cannot yet differentiate them by symbol name, since they all get the same name. As a result, cling will always use the translation from the first module.
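The collision can be illustrated with a small standalone sketch. It is not cling code: `SymbolTable` and `runCollidingCtors` are hypothetical stand-ins for a JIT that resolves functions by name and keeps the first translation, and `__cuda_module_ctor` is used only as an illustrative placeholder for the shared ctor name.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a JIT symbol table: the first definition of a
// name wins, and later definitions with the same name are ignored.
class SymbolTable {
public:
  // Returns true if the definition was added, false if the name was taken.
  bool define(const std::string &name, std::function<void()> body) {
    return symbols_.emplace(name, std::move(body)).second;
  }

  // Calls the (first) definition registered under the given name.
  void call(const std::string &name) const { symbols_.at(name)(); }

private:
  std::map<std::string, std::function<void()>> symbols_;
};

// Simulates two modules that each emit a ctor under the same symbol name.
// Returns the list of fatbinaries that were actually "registered".
std::vector<std::string> runCollidingCtors() {
  SymbolTable table;
  std::vector<std::string> registered;

  // Module 1 and module 2 both define a ctor with an identical name,
  // but with different content.
  table.define("__cuda_module_ctor",
               [&registered] { registered.push_back("module1"); });
  table.define("__cuda_module_ctor",
               [&registered] { registered.push_back("module2"); });

  table.call("__cuda_module_ctor"); // intended: ctor of module 1
  table.call("__cuda_module_ctor"); // intended: ctor of module 2

  return registered;
}
```

In this sketch both calls run the body from the first module, so the second module's fatbinary is never registered, which mirrors the behaviour described above.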
To solve this problem, I added the module name (which is unique in cling) as a suffix to the CUDA ctor/dtor function names, in the form “_<ModuleName>”. For clang, the ModuleName is by default the name of the input file; as in D34059, we escape its name for sanity. This means the symbols will change with this patch (they are ABI-incompatible with previous releases). This solution is identical to the patch in https://reviews.llvm.org/D34059, except that I removed the file extension from the symbols for brevity.
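A minimal sketch of the naming scheme follows. It is not the code from the patch: `moduleSuffix` and `ctorName` are hypothetical helpers, the `__cuda_module_ctor` prefix is an illustrative placeholder, and the escaping rule (replace every character that is not alphanumeric or `_` with `_`) is an assumption, since the exact rule used in D34059 is not spelled out here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: derive a symbol-safe suffix from a module name.
std::string moduleSuffix(const std::string &moduleName) {
  std::string stem = moduleName;

  // Drop the file extension, if the last '.' comes after the last path
  // separator (so dots in directory names are left alone).
  std::size_t slash = stem.find_last_of("/\\");
  std::size_t dot = stem.find_last_of('.');
  if (dot != std::string::npos &&
      (slash == std::string::npos || dot > slash))
    stem.erase(dot);

  // Assumed escaping rule: anything that is not [A-Za-z0-9_] becomes '_'.
  for (char &c : stem)
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
      c = '_';

  return stem;
}

// Hypothetical helper: build the per-module ctor name "<prefix>_<ModuleName>".
std::string ctorName(const std::string &moduleName) {
  return "__cuda_module_ctor_" + moduleSuffix(moduleName);
}
```

Under these assumptions, `ctorName("kernel.cu")` gives `__cuda_module_ctor_kernel`, and two modules with different names get different ctor symbols, so a name-based lookup no longer confuses them.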
Just for reference, a prototype of our CUDA JIT is available at:
https://github.com/SimeonEhrig/CUDA-Runtime-Interpreter
In addition to that prototype, we added the functionality to cling itself and this is the only additional clang patch we need.
https://github.com/SimeonEhrig/cling/tree/cudaDeviceSide
Please explain in the comment *why* you're doing this. It's just for debugging, right? So that it's known which object file the constructor function comes from.