This is an archive of the discontinued LLVM Phabricator instance.

CUDA: Add option to allow host device functions to call host functions
ClosedPublic

Authored by jpienaar on Feb 23 2015, 3:06 PM.

Details

Reviewers
eliben
rnk
Summary

nvcc allows host device functions to call host functions with only a warning being produced (host device functions calling device functions is an error in nvcc). This nvcc feature (calling host functions from host device functions) is used by some existing GPU code. Add an option to clang to allow similar behavior. This does not affect code generation and trying to call a host function from the GPU is still an error. We are investigating a more complete solution that would avoid this but this is a first step to allow tools analyzing GPU code to accept the same code as nvcc does.

Diff Detail

Event Timeline

jpienaar updated this revision to Diff 20545.Feb 23 2015, 3:06 PM
jpienaar retitled this revision from to CUDA: Add option to allow host device functions to call host functions.
jpienaar updated this object.
jpienaar edited the test plan for this revision. (Show Details)
jpienaar added a reviewer: rnk.
jpienaar added subscribers: Unknown Object (MLST), eliben.
rnk edited edge metadata.Feb 23 2015, 5:45 PM

The use case here is more about getting a useful AST out even when the source-program contains errors, right? Can we handle this by intelligent error recovery instead? Nothing stops us from continuing the parse when we encounter this error and producing a partially invalid AST. Would that be sufficient?

That is sort of part of a disagreement I've been having with someone who uses this feature. He feels this is supported behavior and not a program with errors. So if there is code like:

void bar() {}
__host__ __device__ foo() { bar(); }

And if foo is never called from device then the program "makes sense", as you are never attempting to have host code executed on the GPU, and the compiled program runs as expected. Now this is a silly example, he is doing some template metaprogramming to generate kernels for both host and device which makes his use-case understandable. Using this patch the code we generate also runs correctly. So it isn't just for analysis as it useful in our code generation too.

Normally I would think this could be fixed by ifdef-guarding on CUDA_ARCH but if bar were to perform a templated kernel launch, which happens in this client's code, then that would not be allowed usage under nvcc.

chandlerc edited edge metadata.Feb 23 2015, 6:30 PM
chandlerc added a subscriber: tra.
chandlerc added a subscriber: chandlerc.

Adding Art to the CC so he sees this....

eliben added inline comments.Feb 24 2015, 8:45 AM
include/clang/Driver/CC1Options.td
612

I think the word "allow" should be in the flag somewhere.

How about: "fcuda-allow-host-calls-from-host-device" ?

Has the word "allow" AND is shorter ;-)

test/SemaCUDA/function-target.cu
1

Is DTEST_HOST and DTEST_DEVICE really different from the reliance on CUDA_ARCH that was there before?

I think this test is getting too complex - maybe it's worthwhile splitting the HD parts to a separate test file

rnk added a comment.Feb 24 2015, 8:55 AM

I'm surprised this change doesn't break the cuda codegen pipeline, because there aren't any changes to CodeGen in this patch. This is specifically relaxing the case of a host+device function calling host while in device mode. It's not actually possible to codegen this function, right? Is codegen already set up to compile this case to runtime error?

include/clang/Driver/CC1Options.td
612

+1 for the suggested name.

jpienaar updated this revision to Diff 20606.Feb 24 2015, 10:16 AM

Changed option name, split test into two and added a codegen test.

rnk added inline comments.Feb 24 2015, 10:33 AM
test/CodeGenCUDA/host-device-calls-host.cu
22

I think this is a more interesting test case:

extern "C" {
void host_function() {}
__host__ __device__ void hd_function(bool b) { if (b) host_function(); }
__device__ void device_function() { hd_function(false); }
}

It actually tests emission of the bogus call, even though it can never occur in practice. What should clang do for that?

In D7841#129039, @rnk wrote:

I'm surprised this change doesn't break the cuda codegen pipeline, because there aren't any changes to CodeGen in this patch. This is specifically relaxing the case of a host+device function calling host while in device mode. It's not actually possible to codegen this function, right? Is codegen already set up to compile this case to runtime error?

Yes, what happens is that the host function becomes a declaration in the generated LLVM IR and is treated as an extern function in the generated PTX. If a call on the device from host device function to a host function were possible then it would result in a compilation error when the PTX gets compiled at runtime.

eliben added inline comments.Feb 24 2015, 11:07 AM
include/clang/Basic/DiagnosticSemaKinds.td
6070

Update this name to the new one for consistency

eliben added inline comments.Feb 24 2015, 11:13 AM
test/SemaCUDA/function-target-hd.cu
4

Much better thanks. Just a small nit: add a comment that explains the various permutations of CUDA_ARCH and TEST_WARN_HD to make this test file more readable a year from now

jpienaar added inline comments.Feb 24 2015, 12:40 PM
test/CodeGenCUDA/host-device-calls-host.cu
22

Both nvcc and clang (with this patch) accepts this and the resulting code executes without errors. clang should still warn that this can cause a runtime failure as there is no call-site analysis performed.

rnk accepted this revision.Feb 24 2015, 1:04 PM
rnk edited edge metadata.

lgtm

One other mechanism you can consider is a default-error warning. We mostly added this mechanism so that we could emit errors by default, but suppress them in system headers or other places. However, I think this mostly confuses end users who have to pass things like -Wno-host-calls-from-host-device in order to silence something that looks like an error, not a warning.

test/CodeGenCUDA/host-device-calls-host.cu
22

Exciting. =D I think throwing this example into the IRgen test suite is nice because it's a good representative edge case.

This revision is now accepted and ready to land.Feb 24 2015, 1:04 PM
jpienaar updated this revision to Diff 20620.Feb 24 2015, 1:28 PM
jpienaar edited edge metadata.

Updated the warning message's name and added a description for the permutations in the host device Sema test.

jpienaar added inline comments.Feb 24 2015, 1:31 PM
include/clang/Basic/DiagnosticSemaKinds.td
6070

Done.

test/CodeGenCUDA/host-device-calls-host.cu
22

Done.

test/SemaCUDA/function-target-hd.cu
4

Done.

eliben accepted this revision.Feb 24 2015, 1:39 PM
eliben added a reviewer: eliben.

lgtm

jpienaar closed this revision.Feb 24 2015, 1:49 PM