This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Report "unsupported VLA" errors only on device side.
ClosedPublic

Authored by tra on Nov 20 2017, 4:34 PM.

Details

Summary

This fixes erroneously reported CUDA compilation errors in host-side code during device-side compilation.

I've also restricted OpenMP-specific checks to trigger only if we're compiling with OpenMP enabled.

Diff Detail

Repository
rL LLVM

Event Timeline

tra created this revision.Nov 20 2017, 4:34 PM
tra updated this revision to Diff 123690.Nov 20 2017, 4:44 PM

Folded OpenCL check under if (T->isVariableArrayType())

Hahnfeld requested changes to this revision.Nov 20 2017, 4:46 PM

In D39505 @rjmccall requested that the check should be made independent of the language. To preserve this, I think the CUDA specific checks should be added to the generic case instead of restricting its evaluation.

This revision now requires changes to proceed.Nov 20 2017, 4:46 PM
tra added a comment.Nov 20 2017, 4:52 PM

In D39505 @rjmccall requested that the check should be made independent of the language. To preserve this, I think the CUDA specific checks should be added to the generic case instead of restricting its evaluation.

I'm not sure what exactly you or @rjmccall have in mind. Specifically - what is the 'generic case' CUDA checks should be added to? Could you give me an example?

And please add a regression test (which is apparently missing) for the case where a VLA is NOT diagnosed in CUDA mode

tra added a comment.Nov 20 2017, 4:55 PM

And please add a regression test (which is apparently missing) for the case where a VLA is NOT diagnosed in CUDA mode

Hmm. We do have test/SemaCUDA/vla.cu which should've triggered the error. Let me see why it didn't happen.

In D40275#930981, @tra wrote:

In D39505 @rjmccall requested that the check should be made independent of the language. To preserve this, I think the CUDA specific checks should be added to the generic case instead of restricting its evaluation.

I'm not sure what exactly you or @rjmccall have in mind. Specifically - what is the 'generic case' CUDA checks should be added to? Could you give me an example?

Not supporting VLAs is a property of the target we are compiling for, see the newly added Context.getTargetInfo().isVLASupported(). So neither CUDA nor OpenMP is a special case in general; it's rather that the targeted architecture doesn't support that feature. What is a special case, though, is that both CUDA and OpenMP analyze the complete host code again, and we need to suppress the diagnostic if the VLA is encountered in host code that is never codegen'd for the device. For OpenMP, this special case is encoded in shouldDiagnoseTargetSupportFromOpenMP (a horrible name - suggestions welcome!) and I think you should add a similar check for CUDA.
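To illustrate the point, a minimal sketch (the function names and the placement of the target construct are made up for the example):

void host_only(int n) {
  int vla[n];   // re-parsed during device compilation, but never codegen'd
}               // for the device, so a VLA here must not be diagnosed

void offloaded(int n) {
#pragma omp target
  {
    int vla[n]; // this block is codegen'd for the device, so the VLA should
  }             // be rejected when the device target doesn't support VLAs
}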

tra updated this revision to Diff 123694.Nov 20 2017, 5:15 PM
tra edited edge metadata.

Updates CUDA's VLA test to use nvptx triple.
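For reference, a rough sketch of what such a test could look like; the RUN line and the exact diagnostic wording are illustrative rather than the actual test contents:

// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fcuda-is-device -fsyntax-only -verify %s
#include "Inputs/cuda.h"

// Host-side VLA: still seen during device compilation, but must not be diagnosed.
void host_fn(int n) {
  int x[n];
}

// Device-side VLA: should be rejected when targeting nvptx.
__device__ void device_fn(int n) {
  int x[n]; // expected-error {{cannot use variable-length arrays in __device__ functions}}
}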

rjmccall added inline comments.Nov 20 2017, 5:52 PM
clang/lib/Sema/SemaType.cpp
2188 ↗(On Diff #123694)

Please write this check so that it trips in an "ordinary" build on a target that just happens to not support VLAs, something like:

else if (!Context.getTargetInfo().isVLASupported() && shouldDiagnoseTargetSupportFromOpenMP())

If you want to include the explicit OpenMP check there, it would need to be:

else if (!Context.getTargetInfo().isVLASupported() && (!getLangOpts().OpenMP || shouldDiagnoseTargetSupportFromOpenMP()))

but I think the first looks better.

The CUDA and OpenMP paths here seem to be trying to achieve analogous things; it's unfortunate that we can't find a way to unify their approaches, even if we'd eventually want to use different diagnostic text. I imagine that the target-environment language restrictions are basically the same, since they arise for the same fundamental reasons, so all the places using CUDADiagIfDeviceCode are likely to have a check for shouldDiagnoseTargetSupportFromOpenMP() and vice-versa.

tra added inline comments.Nov 21 2017, 9:45 AM
clang/lib/Sema/SemaType.cpp
2188 ↗(On Diff #123694)

The problem is that in CUDA we can't just do

if (!Context.getTargetInfo().isVLASupported() && shouldDiagnoseTargetSupportFromOpenMP())
   Diag(Loc, diag::err_vla_unsupported);

In some situations diag messages will only be emitted if we attempt to generate unsupported code on the device side.
Consider

__host__ __device__ void foo(int n) {
  int vla[n];
}

When Sema sees this code during compilation, it cannot tell whether there is an error. Calling foo from the host code is perfectly valid. Calling it from device code is not. CUDADiagIfDeviceCode creates 'postponed' diagnostics which only get emitted if we ever need to generate code for the function on the device.
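To make the 'postponed' behavior concrete, reusing foo from above (the callers below are hypothetical):

void host_caller(int n) {
  foo(n);     // fine: this use never requires device code for foo
}

__global__ void kernel(int n) {
  foo(n);     // forces device codegen of foo, so the deferred VLA
}             // diagnostic is emitted at this point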

So, while CUDA and OpenMP do similar things, they are not quite the same. If your goal is to generalize the CUDA and OpenMP handling, then it would have to be folded into the diagnostic emission itself, and we'll need an analog of CUDADiagIfDeviceCode which can handle both OpenMP and CUDA.
E.g. something like this:

??? DiagIfDeviceCode(???) {
   if (OpenCL || (OpenMP && shouldDiagnoseTargetSupportFromOpenMP()))
       Diag(...);
   else if (CUDA)
       CUDADiagIfDeviceCode()
} 

...

if (!Context.getTargetInfo().isVLASupported()) 
   DiagIfDeviceCode();

Would that work for you?

When Sema sees this code during compilation, it cannot tell whether there is an error. Calling foo from the host code is perfectly valid. Calling it from device code is not. CUDADiagIfDeviceCode creates 'postponed' diagnostics which only get emitted if we ever need to generate code for the function on the device.

Interesting. I suspect that we'll end up dealing with this problem for OpenMP as well (in the future - for OpenMP v5). In this next version (for which the draft is available here: http://www.openmp.org/wp-content/uploads/openmp-TR6.pdf), we'll have "implicit declare target" functions (whereby we generate target code based on the locally-defined subset of the transitive closure of the call graph starting from target regions).

tra updated this revision to Diff 123823.Nov 21 2017, 10:27 AM

Updated to partially address rjmccall@ comments.

clang/lib/Sema/SemaType.cpp
2188 ↗(On Diff #123694)

There's another issue with this approach -- the diagnostics themselves. Each dialect has its own. Specifically, CUDA diags have details that are relevant only to CUDA. I suspect OpenMP has something specific as well. If we insist on emitting only one kind of error for a particular case across all dialects, we'll have to stick to a bare-bones "feature X is not supported", which will not have sufficient details to explain why the error was triggered in CUDA.

IMO dialect-specific handling of CUDA errors in this case is the lesser evil.

I'll update the patch to handle non-CUDA cases the way you suggested.

tra added a comment.Nov 21 2017, 10:37 AM

When Sema sees this code during compilation, it cannot tell whether there is an error. Calling foo from the host code is perfectly valid. Calling it from device code is not. CUDADiagIfDeviceCode creates 'postponed' diagnostics which only get emitted if we ever need to generate code for the function on the device.

Interesting. I suspect that we'll end up dealing with this problem for OpenMP as well (in the future - for OpenMP v5). In this next version (for which the draft is available here: http://www.openmp.org/wp-content/uploads/openmp-TR6.pdf), we'll have "implicit declare target" functions (whereby we generate target code based on the locally-defined subset of the transitive closure of the call graph starting from target regions).

We've been contemplating treating all functions as __host__ __device__ by default. After all, most source code is target-agnostic. Currently a lot of templated code must be __host__ __device__ in order to be usable, and that's a major obstacle to making the standard library and other template libraries (somewhat) usable on the device. Alas, making __host__ __device__ the default would be a major departure from CUDA semantics.

tra updated this revision to Diff 123831.Nov 21 2017, 11:33 AM

Updated CUDA tests

tra added a comment.Nov 22 2017, 1:19 PM

@rjmccall : are you OK with this approach? If VLAs are not supported by the target, CUDA is handled as a special case so it can emit a deferred diag, OpenMP reports an error only if shouldDiagnoseTargetSupportFromOpenMP() allows it, and everything else does so unconditionally.
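In code, that dispatch would look roughly like this sketch (diag::err_cuda_vla is an illustrative name rather than the actual diagnostic ID):

if (!Context.getTargetInfo().isVLASupported()) {
  if (getLangOpts().CUDA) {
    // Deferred: only emitted if the function is ever codegen'd for the device.
    CUDADiagIfDeviceCode(Loc, diag::err_cuda_vla) << CurrentCUDATarget();
  } else if (!getLangOpts().OpenMP ||
             shouldDiagnoseTargetSupportFromOpenMP()) {
    Diag(Loc, diag::err_vla_unsupported);
  }
}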

tra added a comment.Nov 27 2017, 3:54 PM
In D40275#933253, @tra wrote:

@rjmccall : are you OK with this approach? If VLAs are not supported by the target, CUDA is handled as a special case so it can emit a deferred diag, OpenMP reports an error only if shouldDiagnoseTargetSupportFromOpenMP() allows it, and everything else does so unconditionally.

@rjmccall : ping.

rjmccall accepted this revision.Nov 28 2017, 12:40 AM
In D40275#937010, @tra wrote:
In D40275#933253, @tra wrote:

@rjmccall : are you OK with this approach? If VLAs are not supported by the target, CUDA is handled as a special case so it can emit a deferred diag, OpenMP reports an error only if shouldDiagnoseTargetSupportFromOpenMP() allows it, and everything else does so unconditionally.

@rjmccall : ping.

Sorry for the delay; I took Thanksgiving week off. Yes, I think this patch is fine now, thanks.

rjmccall added inline comments.Nov 28 2017, 12:50 AM
clang/lib/Sema/SemaType.cpp
2188 ↗(On Diff #123694)

If there really is interesting language-specific information to provide in a diagnostic, I agree that it's hard to avoid having different code for different targets. On the other hand, the CUDA-specific information in this specific diagnostic seems unnecessary — does the user really care about whether the function was 'device' vs. 'host device'? in fact, isn't the latter a bit misleading? — and I feel like a generic "cannot use variable-length arrays when compiling for device 'nvptx64'" would be perfectly satisfactory in both CUDA and OpenMP, and that's probably true for almost all of these diagnostics.

On a completely different note, I do want to point out that the only thing you actually *need* to ban here is declaring a local variable of VLA type. There's no reason at all to ban VLA types in general; they just compile to extra arithmetic.

Hahnfeld accepted this revision.Nov 28 2017, 6:27 AM

I'm fine if @rjmccall is fine - but looks like I need to accept this revision because I requested changes in the past?

This revision is now accepted and ready to land.Nov 28 2017, 6:27 AM
tra added inline comments.Nov 28 2017, 10:27 AM
clang/lib/Sema/SemaType.cpp
2188 ↗(On Diff #123694)

Agreed. While host/device attributes are part of a function signature in CUDA, in this case only the 'device' part makes the difference, and therefore generic reporting along the lines you suggest would be sufficient to indicate that the error is in the device-side code.

As for the VLA types, clang can currently compile code with VLA arguments: https://godbolt.org/g/43hVu9
Clang explicitly does not support GCC's VLA-in-structure extension.
Are there any other use cases you can think of?

This revision was automatically updated to reflect the committed changes.
rjmccall added inline comments.Nov 28 2017, 12:06 PM
clang/lib/Sema/SemaType.cpp
2188 ↗(On Diff #123694)

You can just make a VLA type locally (as a typedef, or just as a cast operand) and use it for pointer arithmetic.
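For example (a C99 sketch with made-up names), the VLA type appears only in the cast and costs nothing beyond a run-time stride computation:

// Zero column j of an n-by-m matrix stored in a flat buffer.
void zero_column(float *buf, int n, int m, int j) {
  float (*rows)[m] = (float (*)[m])buf;  // pointer to a VLA type
  for (int i = 0; i < n; ++i)
    rows[i][j] = 0.0f;                   // rows[i] advances by m * sizeof(float)
}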