nvcc generates a unique registration function for each object file
that contains relocatable device code. Unique names are achieved
with a module id that is also reflected in the function's name.
Diff Detail
Repository: rC Clang
Event Timeline
I didn't write tests for this yet, but I wanted to get some early feedback on this and show what I have in mind.
lib/CodeGen/CGCUDANV.cpp
333–334: Can we actually have multiple GPU binaries here? If yes, how do I get there?
334–335: @jlebar The same here, probably __NV_CUDA,__nv_module_id?
335–336: @jlebar Could you help me here as I don't have a Mac? I'd guess it's __NV_CUDA,__nv_relfatbin, but I'd feel better if I can get a confirmation...
Still no regression tests.
I did some functional tests though (https://reviews.llvm.org/F5822023): with this patch, Clang can generate valid object files with relocatable device code. For linking I still defer to nvcc, and I'm not sure I'm interested in reverse-engineering the needed tools to make this fully work with Clang's driver: I think the biggest advantage of CUDA in Clang is using LLVM's CodeGen. Note that (in my simple tests) Clang's object files were compatible enough to mix them with other objects generated by nvcc (see Makefile.mixed)!
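The build flow exercised by those functional tests can be sketched roughly as follows. This is an illustrative transcript, not taken from the patch: file names are hypothetical, a CUDA toolkit is assumed to be installed, and the exact flags may differ between toolchain versions.

```shell
# Compile one TU with Clang and one with nvcc, both with relocatable
# device code (flags illustrative):
clang++ --cuda-gpu-arch=sm_35 -fcuda-rdc -c a.cu -o a.o
nvcc -arch=sm_35 -rdc=true -c b.cu -o b.o

# Device-link with nvcc (this patch does not teach Clang's driver to do
# this step), then link the host objects as usual:
nvcc -arch=sm_35 -dlink a.o b.o -o dlink.o
g++ a.o b.o dlink.o -lcudart -o prog
```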
lib/CodeGen/CGCUDANV.cpp
333–334: Yes. clang --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_50... will compile for sm_35 and sm_50 and then pass the names of the GPU-side objects to the host compilation via -fcuda-include-gpubinary.
lib/CodeGen/CGCUDANV.cpp
333–334: I'm not sure that's true anymore: I think they are now combined by fatbinary. This seems to be confirmed by test/Driver/cuda-options.cu. If that was the only use case, we may try to get rid of this possibility; let me check.
lib/CodeGen/CGCUDANV.cpp
333–334: You are correct. All GPU binaries are in a single fatbin now. We should remove this loop and make CGM.getCodeGenOpts().CudaGpuBinaryFileNames a scalar.
lib/CodeGen/CGCUDANV.cpp
333–334: OK, I'll work on this as a preparation patch and rebase this on top. That actually explains why my changes have always worked even though the loop wasn't handled correctly :-)
Sorry for the long delay. This update rebases the patch against current trunk and adapts the regression test.
lib/CodeGen/CGCUDANV.cpp
298–300: Instead of tracking these through the conditionals of a pretty long function, could we make these pointers class fields, init them in the constructor, have accessors return them, and possibly assert that they are used only if RDC is enabled?

test/CodeGenCUDA/device-stub.cu
2–6: Labels could be a bit more descriptive. Long RUN lines could use some re-wrapping.
Move FunctionTypes to methods and change test prefixes.
lib/CodeGen/CGCUDANV.cpp
298–300: I've removed the caching entirely because that's already done by llvm::FunctionType::get(). These are now called in new methods to avoid duplication.
lib/CodeGen/CGCUDANV.cpp
364–407: This can all be folded into the 'else' branch of the 'if' below.