Page MenuHomePhabricator

[CUDA] Make sure we emit all templated __global__ functions on device side. Again.

Authored by tra on Aug 12 2015, 2:30 PM.



This is a somewhat different way to do it than D11666 which got rolled back.

Codegen postpones emitting instantiated kernel function template until it's used.
If kernel is used only from the host side (which is normally the case) we'll never emit
it because on device side we don't emit the host code that uses it.

The change allows CUDA kernels to be emitted on device side unconditionally.
It's overly conservative and may emit more functions than we really need, but it
guarantees that the kernels launched from the host side are do exist on device-side.
In case it ever causes issues, there are other ways to address the issue,
though they are more invasive and are currently not worth the trouble.

Diff Detail

Event Timeline

tra updated this revision to Diff 31973.Aug 12 2015, 2:30 PM
tra retitled this revision from to [CUDA] Make sure we emit all templated __global__ functions on device side. Again..
tra updated this object.
tra added reviewers: echristo, rsmith, eliben.
tra added a subscriber: cfe-commits.
echristo accepted this revision.Aug 12 2015, 2:34 PM
echristo edited edge metadata.

LGTM. Thanks for working on this.


This revision is now accepted and ready to land.Aug 12 2015, 2:34 PM
eliben accepted this revision.Aug 12 2015, 2:44 PM
eliben edited edge metadata.



It't not clear what this metadata is part of? I'm guessing llvm.used, so maybe make that explicit with an earlier CHECK?

tra abandoned this revision.Sep 8 2015, 10:46 AM

Emitting IR is not sufficient to ensure that the kernels survive GDCE, so the patch does not work with optimizations on.
D11666 would have to do for now.