
[CUDA] Register relocatable GPU binaries

Authored by Hahnfeld on Feb 5 2018, 11:22 AM.



nvcc generates a unique registration function for each object file
that contains relocatable device code. Unique names are achieved
with a module id that is also reflected in the function's name.
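As a rough illustration of the naming scheme described above, the following sketch derives a module ID from a translation unit's name and appends it to a registration function prefix. The helper names, the `__nv_` prefix, the `__cudaRegisterLinkedBinary` prefix, and the FNV-1a hash are all assumptions made for this example; they are not taken from nvcc or from this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: derive a per-object module ID from the translation
// unit's name. The "__nv_" prefix and the FNV-1a hash are assumptions made
// for illustration only.
std::string makeModuleId(const std::string &tuName) {
  std::uint64_t hash = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
  for (unsigned char c : tuName) {
    hash ^= c;
    hash *= 1099511628211ULL; // FNV-1a 64-bit prime
  }
  return "__nv_" + std::to_string(hash);
}

// Hypothetical helper: embed the module ID in the registration function's
// name so that every object file gets a distinct symbol. The prefix is an
// assumption for illustration only.
std::string makeRegisterFunctionName(const std::string &moduleId) {
  return "__cudaRegisterLinkedBinary" + moduleId;
}
```

Calling `makeRegisterFunctionName(makeModuleId("a.cu"))` and the same for `"b.cu"` gives two different symbol names, so both objects can be linked into one executable without a clash.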

Diff Detail

rC Clang

Event Timeline

Hahnfeld created this revision. Feb 5 2018, 11:22 AM
Hahnfeld planned changes to this revision. Feb 5 2018, 11:28 AM

I didn't write tests for this yet, but I wanted to get some early feedback on this and show what I have in mind.


Can we actually have multiple GPU binaries here? If yes, how do I get there?


@jlebar The same here, probably __NV_CUDA,__nv_module_id?


@jlebar Could you help me here, as I don't have a Mac? I'd guess it's __NV_CUDA,__nv_relfatbin, but I'd feel better if I could get a confirmation...

Hahnfeld updated this revision to Diff 133831. Feb 12 2018, 3:59 AM

Rebase and fix Debug build.

Hahnfeld planned changes to this revision. Feb 12 2018, 4:06 AM

Still no regression tests.

I did some functional tests though ( With this patch Clang can generate valid object files with relocatable device code. For linking I still defer to nvcc and I'm not sure if I'm interested in reverse-engineering the needed tools to make this fully work with Clang's Driver: I think the biggest advantage of CUDA in Clang is using LLVM's CodeGen. Note that (in my simple tests) Clang's object files had enough compatibility to mix them with other objects generated by nvcc (see Makefile.mixed)!

tra added inline comments. Feb 16 2018, 9:50 AM

Yes. clang --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_50... will compile for sm_35 and sm_50 and then will pass the names of GPU-side objects to the host compilation via -fcuda-include-gpubinary.

Hahnfeld added inline comments. Feb 16 2018, 10:18 AM

I'm not sure if that's true anymore: I think they are now combined by fatbinary. This seems to be confirmed by test/Driver/ If that was the only use case, we may try to get rid of this possibility, let me check this.

tra added inline comments. Feb 16 2018, 10:32 AM

You are correct. All GPU binaries are in a single fatbin now.
That said, you could still pass extra -fcuda-include-gpubinary options to cc1 manually, but I see no practical reason to do it -- a single fatbin serves the purpose better.

We should remove this loop and make CGM.getCodeGenOpts().CudaGpuBinaryFileNames a scalar.

Hahnfeld marked an inline comment as done. Feb 16 2018, 10:36 AM
Hahnfeld added inline comments.

Ok, I'll work on this as a preparation patch and rebase this on top. That actually explains why my changes have always been working even though they didn't handle the loop correctly :-)

Hahnfeld updated this revision to Diff 141685. Apr 9 2018, 10:43 AM

Sorry for the long delay. This update rebases the patch against current trunk and adapts the regression test.

Hahnfeld updated this revision to Diff 141698. Apr 9 2018, 11:39 AM

Correct test check prefix.

tra added inline comments. Apr 19 2018, 10:17 AM

Instead of tracking these through the conditionals of a pretty long function, could we make these pointers class fields, init them in the constructor, make accessors return them and, possibly, assert that they are used only if RDC is enabled?


Labels could be a bit more descriptive:

Long RUN lines could use some re-wrapping.

Hahnfeld updated this revision to Diff 143136. Apr 19 2018, 11:38 AM
Hahnfeld marked 2 inline comments as done.

Move FunctionTypes to methods and change test prefixes.


I've removed the caching entirely because that's already done by llvm::FunctionType::get(). These are now called in new methods to avoid duplication.
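To illustrate why a separate cache is redundant when the type factory already uniques its results, here is a small standalone sketch. It does not use LLVM: `ToyContext`, `ToyFunctionType`, `ToyCodeGen`, and `getRegisterGlobalsFnTy` are hypothetical stand-ins, and the map-based interning is only an assumption about how such uniquing could be modeled, not a description of LLVM's internals.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a function type: result and parameter types by name.
struct ToyFunctionType {
  std::string result;
  std::vector<std::string> params;
};

// Toy stand-in for a context that owns and uniques function types.
class ToyContext {
public:
  // Returns the single shared instance for this signature, creating it on
  // first use. Repeated calls with the same signature yield the same pointer.
  const ToyFunctionType *getFunctionType(const std::string &result,
                                         const std::vector<std::string> &params) {
    auto key = std::make_pair(result, params);
    auto it = types_.find(key);
    if (it == types_.end()) {
      auto type = std::make_unique<ToyFunctionType>(ToyFunctionType{result, params});
      it = types_.emplace(key, std::move(type)).first;
    }
    return it->second.get();
  }

private:
  std::map<std::pair<std::string, std::vector<std::string>>,
           std::unique_ptr<ToyFunctionType>> types_;
};

// Hypothetical code generator: the accessor asks the context every time
// instead of caching the pointer in a field.
class ToyCodeGen {
public:
  explicit ToyCodeGen(ToyContext &ctx) : ctx_(ctx) {}

  const ToyFunctionType *getRegisterGlobalsFnTy() {
    return ctx_.getFunctionType("void", {"void**"});
  }

private:
  ToyContext &ctx_;
};
```

Because the context hands back the same pointer for the same signature, the accessor method can be called from several places without duplicating the type or keeping extra state in the class.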

tra added inline comments. Apr 19 2018, 11:49 AM

This can all be folded into the 'else' branch of the 'if' below.

Hahnfeld updated this revision to Diff 143145. Apr 19 2018, 11:58 AM
Hahnfeld marked an inline comment as done.

Move module ID to corresponding else branch.

tra accepted this revision. Apr 19 2018, 12:05 PM
This revision is now accepted and ready to land. Apr 19 2018, 12:05 PM
This revision was automatically updated to reflect the committed changes.