nvcc allows host device functions to call host functions with only a warning being produced (host device functions calling device functions is an error in nvcc). This nvcc feature (calling host functions from host device functions) is used by some existing GPU code. Add an option to clang to allow similar behavior. This does not affect code generation and trying to call a host function from the GPU is still an error. We are investigating a more complete solution that would avoid this but this is a first step to allow tools analyzing GPU code to accept the same code as nvcc does.
Diff Detail
Event Timeline
The use case here is more about getting a useful AST out even when the source-program contains errors, right? Can we handle this by intelligent error recovery instead? Nothing stops us from continuing the parse when we encounter this error and producing a partially invalid AST. Would that be sufficient?
That is sort of part of a disagreement I've been having with someone who uses this feature. He feels this is supported behavior and not a program with errors. So if there is code like:
void bar() {} __host__ __device__ foo() { bar(); }
And if foo is never called from device then the program "makes sense", as you are never attempting to have host code executed on the GPU, and the compiled program runs as expected. Now this is a silly example, he is doing some template metaprogramming to generate kernels for both host and device which makes his use-case understandable. Using this patch the code we generate also runs correctly. So it isn't just for analysis as it useful in our code generation too.
Normally I would think this could be fixed by ifdef-guarding on CUDA_ARCH but if bar were to perform a templated kernel launch, which happens in this client's code, then that would not be allowed usage under nvcc.
include/clang/Driver/CC1Options.td | ||
---|---|---|
612 | I think the word "allow" should be in the flag somewhere. How about: "fcuda-allow-host-calls-from-host-device" ? Has the word "allow" AND is shorter ;-) | |
test/SemaCUDA/function-target.cu | ||
1 | Is DTEST_HOST and DTEST_DEVICE really different from the reliance on CUDA_ARCH that was there before? I think this test is getting too complex - maybe it's worthwhile splitting the HD parts to a separate test file |
I'm surprised this change doesn't break the cuda codegen pipeline, because there aren't any changes to CodeGen in this patch. This is specifically relaxing the case of a host+device function calling host while in device mode. It's not actually possible to codegen this function, right? Is codegen already set up to compile this case to runtime error?
include/clang/Driver/CC1Options.td | ||
---|---|---|
612 | +1 for the suggested name. |
test/CodeGenCUDA/host-device-calls-host.cu | ||
---|---|---|
22 | I think this is a more interesting test case: extern "C" { void host_function() {} __host__ __device__ void hd_function(bool b) { if (b) host_function(); } __device__ void device_function() { hd_function(false); } } It actually tests emission of the bogus call, even though it can never occur in practice. What should clang do for that? |
Yes, what happens is that the host function becomes a declaration in the generated LLVM IR and is treated as an extern function in the generated PTX. If a call on the device from host device function to a host function were possible then it would result in a compilation error when the PTX gets compiled at runtime.
include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
6070 | Update this name to the new one for consistency |
test/SemaCUDA/function-target-hd.cu | ||
---|---|---|
4 | Much better thanks. Just a small nit: add a comment that explains the various permutations of CUDA_ARCH and TEST_WARN_HD to make this test file more readable a year from now |
test/CodeGenCUDA/host-device-calls-host.cu | ||
---|---|---|
22 | Both nvcc and clang (with this patch) accepts this and the resulting code executes without errors. clang should still warn that this can cause a runtime failure as there is no call-site analysis performed. |
lgtm
One other mechanism you can consider is a default-error warning. We mostly added this mechanism so that we could emit errors by default, but suppress them in system headers or other places. However, I think this mostly confuses end users who have to pass things like -Wno-host-calls-from-host-device in order to silence something that looks like an error, not a warning.
test/CodeGenCUDA/host-device-calls-host.cu | ||
---|---|---|
22 | Exciting. =D I think throwing this example into the IRgen test suite is nice because it's a good representative edge case. |
Updated the warning message's name and added a description for the permutations in the host device Sema test.
Update this name to the new one for consistency