Page MenuHomePhabricator

[OpenMP][SYCL] Improve diagnosing of unsupported types usage
ClosedPublic

Authored by Fznamznon on Feb 11 2020, 12:12 AM.

Details

Summary

Diagnostic is emitted if some declaration of unsupported type
declaration is used inside device code.
Memcpy operations for structs containing member with unsupported type
are allowed. Fixed crash on attempt to emit diagnostic outside of the
functions.

The approach is generalized between SYCL and OpenMP.
CUDA/OMP deferred diagnostic interface is going to be used for SYCL device.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
bader edited the summary of this revision. (Show Details)Feb 11 2020, 8:51 AM
bader added a reviewer: erichkeane.

I would add a check for the use of unsupported types in kernels. They should not be allowed to be used if target does not support it.

Yeah, I think so. We tried to make it using deferred diagnostics. Unfortunately there isn't a single 'create declaration' type place that we could diagnose this. This resulted to a lot of changes around Sema and probably unhanded cases. For example we need to diagnose each appearance of __float128 type in device code. And __float128 can appear in device code through so many ways, for example through auto types variable declaration and initializing it with __float128 value captured from the host, like this:

 // HOST CODE
  __float128 B = 1; // No errors
...
// DEVICE CODE
  kernel<class some_kernel>([=]() {
          auto C = B; }); // Problem, C will actually have __float128 type!

And for example, we can't just trigger on __float128 type appearance in some code, like the diagnosic which I'm disabling does, because I believe that some unevaluated contexts shouldn't trigger errors, because they don't bring the unsupported type to the device code:

template<typename t> void foo(){};
__float128 nonemittedfunc();

// DEVICE CODE
foo<__float128>(); // This shouldn't bring errors
std::conditional_t<SomeI < 1, decltype(nonemittedfunc()), int> SomeVar; // This shouldn't bring errors

The whole patch with test cases is available here https://github.com/intel/llvm/pull/1040 .
We decided to disable this until we figure out the way how to properly diagnose this.

The right approach here is probably what we do in ObjC ARC when we see types that are illegal in ARC: in system headers, we allow the code but add a special UnavailableAttr to the declaration so that it can't be directly used.

That is straightforward enough that I think you should just do it instead of leaving this as technical debt.

I haven't considered something like this, because I'm not familiar with ObjC at all... I will give it a try, thanks.

The right approach here is probably what we do in ObjC ARC when we see types that are illegal in ARC: in system headers, we allow the code but add a special UnavailableAttr to the declaration so that it can't be directly used.

That is straightforward enough that I think you should just do it instead of leaving this as technical debt.

I haven't considered something like this, because I'm not familiar with ObjC at all... I will give it a try, thanks.

Hi @rjmccall , I assume, I took a look at this.
Let's imagine, I will try to diagnose __float128 type using already implemented functionality. It seems like I need to call something like

S.DelayedDiagnostics.add(                                        
    sema::DelayedDiagnostic::makeForbiddenType(loc,              
        diag::err_type_unsupported, type, "__float128"));
`

right?
I suppose, then this diagnostic will be saved and emitted inside function named handleDelayedForbiddenType.
Here it checks that this forbidden type is actually allowed and emits a diagnostic if it's not.
The first problem that handleDelayedForbiddenType is called too early. We don't know in this place whether we are in SYCL device code or not. Because basically entry point to SYCL device code is a template function with sycl_kernel attribute, and every function it calls become a device function. So we only know where is device code only after templates instantiation, it happens a bit later after handleDelayedForbiddenType call.

It seems that the second problem is the same problem which prevented me from implementing diagnosing of __float128 type through CUDA/OMP deferred diagnostics (I mentioned my attempt in the last comment https://reviews.llvm.org/D74387#1870014). I still need to find best place for diagnostic issuing. It seems that there are so many places where type can actually be introduced to resulting LLVM IR module, and in some of them I need to check some additional conditions to do not prevent __float128 usage when it actually doesn't introduce forbidden type to resulting LLVM IR module.

Please, correct me if I don't understand something or said something wrong.
I would appreciate if you had some advices.

Thanks a lot.

The right approach here is probably what we do in ObjC ARC when we see types that are illegal in ARC: in system headers, we allow the code but add a special UnavailableAttr to the declaration so that it can't be directly used.

That is straightforward enough that I think you should just do it instead of leaving this as technical debt.

I haven't considered something like this, because I'm not familiar with ObjC at all... I will give it a try, thanks.

Hi @rjmccall , I assume, I took a look at this.
Let's imagine, I will try to diagnose __float128 type using already implemented functionality. It seems like I need to call something like

S.DelayedDiagnostics.add(                                        
    sema::DelayedDiagnostic::makeForbiddenType(loc,              
        diag::err_type_unsupported, type, "__float128"));
`

right?
I suppose, then this diagnostic will be saved and emitted inside function named handleDelayedForbiddenType.
Here it checks that this forbidden type is actually allowed and emits a diagnostic if it's not.

This isn't quite right. The goal here is to delay the diagnostic *twice*. The first delay is between the point where we parse/process the type (i.e. SemaType) and the point where we've fully processed the declaration that the type is part of (i.e. SemaDecl). That's the point where we call handleDelayedForbiddenType, and you're right that it's too early to know whether the declaration is really device code. However, you're missing something important about how handleDelayedForbiddenType works: it's never really trying to suppress the diagnostic completely, but just to delay it for certain declarations until the point that the declaration is actually used, under the hope that in fact it will never be used and everything will work out. For ARC, we chose to delay for all declarations in system headers, under the assumption that (1) system headers will never introduce functions that have to be emitted eagerly, and (2) we always want to warn people about problematic code in their own headers. Those choices don't really fit SYCL's use case, and you should change the logic in isForbiddenTypeAllowed to delay your diagnostic for essentially all declarations (except kernels?), since effectively all non-kernel code in device mode is lazily emitted. But if you do that, it should combine well with CUDA/OMP deferred diagnostics:

  • If foo uses __float128 (whether in its signature or internally), that is invalid in device mode, but the diagnostic will be delayed by the forbidden-type mechanism, meaning that it will become an unavailable attribute on foo.
  • If bar uses foo, that use is invalid in device mode (because of the unavailable attribute), but the diagnostic will be delayed via the standard CUDA/OMP mechanism because we don't know yet whether bar should be emitted as a device function.
  • If kernel uses bar, that will trigger the emission of the delayed diagnostics of bar, including the use-of-unavailable-function diagnostic where it uses foo. It should be straightforward to specialize this diagnostic so that it reports the error by actually diagnosing the use of __float128 at the original location (which is recorded in the unavailable attribute) and then just adding a note about how foo is used by bar.

It seems that the second problem is the same problem which prevented me from implementing diagnosing of __float128 type through CUDA/OMP deferred diagnostics (I mentioned my attempt in the last comment https://reviews.llvm.org/D74387#1870014). I still need to find best place for diagnostic issuing. It seems that there are so many places where type can actually be introduced to resulting LLVM IR module, and in some of them I need to check some additional conditions to do not prevent __float128 usage when it actually doesn't introduce forbidden type to resulting LLVM IR module.

The key thing here is that all uses should be associated with some top-level declaration that's either eagerly-emitted in device mode or not.

Fznamznon added a comment.EditedFeb 27 2020, 5:38 AM

@rjmccall, Thank you very much for so detailed response, It really helps. I started working on implementation and I have a couple of questions/problems with this particular appoach.

  • If foo uses __float128 (whether in its signature or internally), that is invalid in device mode, but the diagnostic will be delayed by the forbidden-type mechanism, meaning that it will become an unavailable attribute on foo.

So, for example if some variable is declared with __float128 type, we are adding to parent function Unavaliable attribute, right?

  • If bar uses foo, that use is invalid in device mode (because of the unavailable attribute), but the diagnostic will be delayed via the standard CUDA/OMP mechanism because we don't know yet whether bar should be emitted as a device function.
  • If kernel uses bar, that will trigger the emission of the delayed diagnostics of bar, including the use-of-unavailable-function diagnostic where it uses foo. It should be straightforward to specialize this diagnostic so that it reports the error by actually diagnosing the use of __float128 at the original location (which is recorded in the unavailable attribute) and then just adding a note about how foo is used by bar.

Consider following example (this is absolutely valid SYCL code, except __float128 usage):

// Host code:
__float128 A;
// Everything what lambda passed to `sycl_kernel` calls becomes device code. Capturing of host variables means that these variables will be passed to device by value, so using of A in this lambda is invalid.
sycl_kernel<class kernel_name>([=]() {auto B = A});

In this case we add unavailable attribute to parent function for variable A declaration. But this function is not called from device code. Please correct me if I'm wrong but it seems that we need to diagnose not only functions, but also usages of any declarations with unavailable attribute including variable declarations, right?

In addition, there are a couple of problems with this approach, for example with unevaluated sizeof context, i.e. code like this:

sycl_kernel<class kernel_name>([=]() {int A = sizeof(__float128);});

is diagnosed too, I think this is not correct.

I can upload what I have now to this review if it will help better (or maybe we will understand that I'm doing something wrong).

I'm also thinking about a bit another approach:

  • If some declaration uses __float128 it will become an unavailable attribute on this declaration. That means we don't always add unavailable attribute to the function which uses __float128 internally.
  • In the place where clang actually emits use-of-unavailable-declaration diagnostics (somewhere in DoEmitAvailabilityWarning function, defined in SemaAvailability.cpp) for SYCL, we make these diagnostics deferred using CUDA/OMP deferred diagnostics mechanism (using SYCL-specific analog of function like diagIfOpenMPDeviceCode/CUDADiagIfDeviceCode).

But, for example, this won't emit diagnostics for simple variable declarations in device code which has __float128 type, but is not used anywhere else.

I'm also curious about OpenMP handling of this "unsupported type" problem. @ABataev , Am I right that in OpenMP such diagnostics are emitted only if forbidden type is used in some arithmetical operations? Is it enough to prevent problems on various GPU devices which don't support this type?

@rjmccall, Thank you very much for so detailed response, It really helps. I started working on implementation and I have a couple of questions/problems with this particular appoach.

  • If foo uses __float128 (whether in its signature or internally), that is invalid in device mode, but the diagnostic will be delayed by the forbidden-type mechanism, meaning that it will become an unavailable attribute on foo.

So, for example if some variable is declared with __float128 type, we are adding to parent function Unavaliable attribute, right?

That's how it's supposed to work. I can't guarantee that it will actually always work that way, because I'm sure you'll be pushing on this code in some new ways.

  • If bar uses foo, that use is invalid in device mode (because of the unavailable attribute), but the diagnostic will be delayed via the standard CUDA/OMP mechanism because we don't know yet whether bar should be emitted as a device function.
  • If kernel uses bar, that will trigger the emission of the delayed diagnostics of bar, including the use-of-unavailable-function diagnostic where it uses foo. It should be straightforward to specialize this diagnostic so that it reports the error by actually diagnosing the use of __float128 at the original location (which is recorded in the unavailable attribute) and then just adding a note about how foo is used by bar.

Consider following example (this is absolutely valid SYCL code, except __float128 usage):

// Host code:
__float128 A;
// Everything what lambda passed to `sycl_kernel` calls becomes device code. Capturing of host variables means that these variables will be passed to device by value, so using of A in this lambda is invalid.
sycl_kernel<class kernel_name>([=]() {auto B = A});

In this case we add unavailable attribute to parent function for variable A declaration. But this function is not called from device code. Please correct me if I'm wrong but it seems that we need to diagnose not only functions, but also usages of any declarations with unavailable attribute including variable declarations, right?

Right. The diagnosis side of that should already happen — unavailable diagnostics apply to uses of any kind of declaration, not just functions or variables. Current delayed diagnostics should be enough to make the unavailable attribute get applied to the global variable A in your example, since it's a use from the declarator. If SYCL supports C++-style dynamic global initializers, you'll probably need to add code so that uses of __float128 within a global initializer get associated with the global, which currently won't happen because the initializer isn't "in scope". But there are at least two other patches underway that are dealing with similar issues: https://reviews.llvm.org/D71227 and https://reviews.llvm.org/D70172.

In addition, there are a couple of problems with this approach, for example with unevaluated sizeof context, i.e. code like this:

sycl_kernel<class kernel_name>([=]() {int A = sizeof(__float128);});

is diagnosed too, I think this is not correct.

Okay, that's a much thornier problem if you want to allow that. What you're talking about is essentially the difference between an evaluated and an unevaluated context, but we don't generally track that for uses of *types*. It's much easier to set things up so that you only complain about uses of *values* like the global variable A if they're in evaluated expressions, but types are never really "evaluated" in the same way that expressions are, so it's complicated.

I think that's a very separable question, so I would recommend you focus on the first problem right now, and then if you really care about allowing sizeof(__float128), we can approach that later.

I can upload what I have now to this review if it will help better (or maybe we will understand that I'm doing something wrong).

I'm also thinking about a bit another approach:

  • If some declaration uses __float128 it will become an unavailable attribute on this declaration. That means we don't always add unavailable attribute to the function which uses __float128 internally.
  • In the place where clang actually emits use-of-unavailable-declaration diagnostics (somewhere in DoEmitAvailabilityWarning function, defined in SemaAvailability.cpp) for SYCL, we make these diagnostics deferred using CUDA/OMP deferred diagnostics mechanism (using SYCL-specific analog of function like diagIfOpenMPDeviceCode/CUDADiagIfDeviceCode).

Sure, but you'll have to write a custom walk of the body looking for uses of the type that you don't like; that seems like a lot of work to get right, and it'll tend to fail "open", i.e. allowing things you don't want to allow, whereas this approach will tend to fail "closed", i.e. tending towards being conservatively correct.

Added diagnosing of __float128 type usage.
See the summary of revision for details.

Fznamznon retitled this revision from [SYCL] Do not diagnose use of __float128 to [SYCL] Defer __float128 type usage diagnostics.Mar 20 2020, 12:58 PM
Fznamznon edited the summary of this revision. (Show Details)
Fznamznon edited the summary of this revision. (Show Details)Mar 20 2020, 1:01 PM
Fznamznon added a reviewer: bader.
Fznamznon updated this revision to Diff 251982.Mar 23 2020, 3:50 AM
Fznamznon edited the summary of this revision. (Show Details)

Fix the test by adding the target with __float128 support and make sure that
no diagnostic are emitted.

rjmccall added inline comments.Mar 27 2020, 12:18 PM
clang/include/clang/Sema/Sema.h
12417

Will this collect notes associated with the diagnostic correctly?

clang/lib/Sema/SemaAvailability.cpp
479 ↗(On Diff #251982)

All of the other cases are setting this to a note, not an error, so I suspect this will read wrong.

534 ↗(On Diff #251982)

Are you sure you want to be applying this to all of the possible diagnostics here, rather than just for SYCLForbiddenType unavailable attributes?

clang/lib/Sema/SemaDecl.cpp
18124

So you want to emit it for the definition in addition to emitting it for specific specializations?

clang/lib/Sema/SemaDeclAttr.cpp
7771 ↗(On Diff #251982)

I wonder if it's reasonable to treat all forbidden types the same here or if we want different functions for the ARC and SYCL use cases.

Fznamznon added inline comments.Mar 30 2020, 9:06 AM
clang/include/clang/Sema/Sema.h
12417

Could you please make your question a bit more concrete?
This function is supposed to work in the same way as Sema::CUDADiagIfDeviceCode and Sema::diagIfOpenMPDeviceCode . It emits given diagnostic if the current context is known as "device code" and makes this diagnostic deferred otherwise. It uses the DeviceDiagBuilder which was implemented earlier. This DeviceDiagBuilder also tries to emit callstack notes for the given diagnostics. Do you mean these callstack notes or something else?

clang/lib/Sema/SemaAvailability.cpp
479 ↗(On Diff #251982)

Yes, this is not a note. For such samples:

int main() {
  __float128 CapturedToDevice = 1;
  kernel<class variables>([=]() {
    decltype(CapturedToDevice) D;
  });
}

It looks like this:

float128.cpp:63:14: error: 'CapturedToDevice' is unavailable
    decltype(CapturedToDevice) D;
             ^
float128.cpp:59:14: error: '__float128' is not supported on this target   /// This emitted instead of note 
  __float128 CapturedToDevice = 1;
             ^

I had feeling that it should probably be a note. But there is no implemented note for unsupported types. I think I can add a new one if it will make it better. Should I?

534 ↗(On Diff #251982)

I suppose it is reasonable if we want to reuse unavaliable attribute for other SYCL use cases. Plus, In SYCL we don't know where is device code until we instantiate templates, it happens late, so we have to defer any diagnostic while compiling for device, otherwise we can point to host code where much more is allowed.

clang/lib/Sema/SemaDecl.cpp
18124

Somehow diagnostics are emitted only for the definitions.
Without this change diagnostics aren't emitted at all.

clang/lib/Sema/SemaDeclAttr.cpp
7771 ↗(On Diff #251982)

I think it could be reasonable if we will have forbidden type cases for SYCL sometime. For now, I don't see the purpose in a separate function for SYCL.

Fznamznon updated this revision to Diff 253615.Mar 30 2020, 9:22 AM

Rebased to fresh version. Applied fixes after https://reviews.llvm.org/D70172

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

rjmccall added inline comments.Mar 30 2020, 9:52 PM
clang/include/clang/Sema/Sema.h
12417

Logically, notes that are emitted after a warning or error are considered to be part of that diagnostic. A custom DiagBuilder that only redirects the main diagnostic but allows the notes to still be emitted will effectively cause those notes to misleadingly follow whatever previous diagnostic might have been emitted.

I call this out specifically because some of the places where you're using this still seem to try to emit notes afterwards, at least in some cases. It's possible that CUDADiagIfDeviceCode happens to not be used in such a way. Really I'm not sure this conditional DiagBuilder approach was a good idea the first time, and I think we should probably reconsider rather than duplicating it.

clang/lib/Sema/SemaAvailability.cpp
479 ↗(On Diff #251982)

Yeah, this should be a note, like "note: variable is unavailable because it uses a type '__float128' that is not supported on this target". You should add that.

534 ↗(On Diff #251982)

My point is actually the reverse of that. This code path is also used for normal unavailable attributes, not just the special ones you're synthesizing. Diagnostics from the use of explicitly-unavailable declarations shouldn't get any special treatment here, no more than you'd give special treatment to a diagnostic arising from an attempt to assign a pointer into a float. In the logic above where you recognize IR_SYCLForbiddenType, I think you should just check whether you should transitively defer the diagnostic and, if so, do so and then bail out of this function early. That might mean you don't need the custom DiagBuilder, too.

clang/lib/Sema/SemaDecl.cpp
18124

Hmm. We might be marking the template pattern invalid; that could result in all sorts of diagnostics being suppressed. We definitely shouldn't be marking things invalid without emitting an eager diagnostic.

Fznamznon updated this revision to Diff 253910.Mar 31 2020, 9:07 AM

Apply comments, rebase.

Fznamznon marked an inline comment as done.Mar 31 2020, 9:19 AM
Fznamznon added inline comments.
clang/include/clang/Sema/Sema.h
12417

I think if there are some notes associated with the main diagnostic and we want to make this diagnostic deferred by using SYCLDiagIfDeviceCode, we have to use this function SYCLDiagIfDeviceCode for notes as well. In my changes I didn't do so because I didn't expect notes emitted after new diagnostic.
In our SYCL implementation we find function like SYCLDiagIfDeviceCode pretty useful because we don't know where is device code until templates are instantiated. We need some mechanism to defer diagnostics pointing to unsupported features used in device code.
Do you have better approach in mind?

clang/lib/Sema/SemaAvailability.cpp
479 ↗(On Diff #251982)

Okay, done.

534 ↗(On Diff #251982)

Okay, I understand. I was under impression that unavailable attributes can appear only for ObjC ARC, so It is safe to defer everything in SYCL, so I moved calls of SYCLDiagIfDeviceCode as you requested.
It's a bit unclear how to avoid custom DiagBuilder here.

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

OpenMP handling needs to be reverted/redone:

  1. If no aux triple is available it just crashes.
  2. If the unavailable type is not used in one of the pattern matched expressions it crashes (usually during instruction selection but not always). Try a call with long double arguments for example.

I'm not sure this patch fits the bill but what I was thinking we need is roughly:
If you have a expression with operands or function definition with return/argument types which are not supported on the target, mark the definition as unavailable with the type note you have.
We should especially allow members to have unavailable types if the member is not accessed. Memcpy like operations (=mapping) are OK though. I think this should be the same for OpenMP and Sycl (and HIP, and ...).

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

OpenMP handling needs to be reverted/redone:

  1. If no aux triple is available it just crashes.
  2. If the unavailable type is not used in one of the pattern matched expressions it crashes (usually during instruction selection but not always). Try a call with long double arguments for example.

I'm not sure this patch fits the bill but what I was thinking we need is roughly:
If you have a expression with operands or function definition with return/argument types which are not supported on the target, mark the definition as unavailable with the type note you have.
We should especially allow members to have unavailable types if the member is not accessed. Memcpy like operations (=mapping) are OK though. I think this should be the same for OpenMP and Sycl (and HIP, and ...).

Why we should allow members to have unavailable types if the member is not accessed? I don't think that we always can do it, especially for SYCL. Even if the member is not accessed directly, the whole struct with unavailable type inside will get into resulting LLVM IR module anyway, this can be a problem, I guess.

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

OpenMP handling needs to be reverted/redone:

  1. If no aux triple is available it just crashes.
  2. If the unavailable type is not used in one of the pattern matched expressions it crashes (usually during instruction selection but not always). Try a call with long double arguments for example.

I'm not sure this patch fits the bill but what I was thinking we need is roughly:
If you have a expression with operands or function definition with return/argument types which are not supported on the target, mark the definition as unavailable with the type note you have.
We should especially allow members to have unavailable types if the member is not accessed. Memcpy like operations (=mapping) are OK though. I think this should be the same for OpenMP and Sycl (and HIP, and ...).

Why we should allow members to have unavailable types if the member is not accessed? I don't think that we always can do it, especially for SYCL. Even if the member is not accessed directly, the whole struct with unavailable type inside will get into resulting LLVM IR module anyway, this can be a problem, I guess.

On the host you know how large the type is so you can replace it in the device module with a placeholder of the appropriate size. You want to do this (in OpenMP for sure) because things you map might have constitutes you don't want to access on the device but you can also not (easily) split out of your mapped type.

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

OpenMP handling needs to be reverted/redone:

  1. If no aux triple is available it just crashes.
  2. If the unavailable type is not used in one of the pattern matched expressions it crashes (usually during instruction selection but not always). Try a call with long double arguments for example.

I'm not sure this patch fits the bill but what I was thinking we need is roughly:
If you have a expression with operands or function definition with return/argument types which are not supported on the target, mark the definition as unavailable with the type note you have.
We should especially allow members to have unavailable types if the member is not accessed. Memcpy like operations (=mapping) are OK though. I think this should be the same for OpenMP and Sycl (and HIP, and ...).

Why we should allow members to have unavailable types if the member is not accessed? I don't think that we always can do it, especially for SYCL. Even if the member is not accessed directly, the whole struct with unavailable type inside will get into resulting LLVM IR module anyway, this can be a problem, I guess.

On the host you know how large the type is so you can replace it in the device module with a placeholder of the appropriate size. You want to do this (in OpenMP for sure) because things you map might have constitutes you don't want to access on the device but you can also not (easily) split out of your mapped type.

Okay, I see. Am I right that OpenMP already has such thing implemented, but only for functions return types? I suppose, for SYCL, we might need to replace unsupported type in device module everywhere...
BTW, one more question, we also have a diagnostic which is emitted on attempt to declare a variable with unsupported type inside the device code for this __float128 type and other ones (https://github.com/intel/llvm/pull/1465/files). Does OpenMP (and probably HIP, CUDA etc) need such diagnostic as well?

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

OpenMP handling needs to be reverted/redone:

  1. If no aux triple is available it just crashes.
  2. If the unavailable type is not used in one of the pattern matched expressions it crashes (usually during instruction selection but not always). Try a call with long double arguments for example.

I'm not sure this patch fits the bill but what I was thinking we need is roughly:
If you have a expression with operands or function definition with return/argument types which are not supported on the target, mark the definition as unavailable with the type note you have.
We should especially allow members to have unavailable types if the member is not accessed. Memcpy like operations (=mapping) are OK though. I think this should be the same for OpenMP and Sycl (and HIP, and ...).

Why we should allow members to have unavailable types if the member is not accessed? I don't think that we always can do it, especially for SYCL. Even if the member is not accessed directly, the whole struct with unavailable type inside will get into resulting LLVM IR module anyway, this can be a problem, I guess.

On the host you know how large the type is so you can replace it in the device module with a placeholder of the appropriate size. You want to do this (in OpenMP for sure) because things you map might have constitutes you don't want to access on the device but you can also not (easily) split out of your mapped type.

Okay, I see. Am I right that OpenMP already has such thing implemented, but only for functions return types? I suppose, for SYCL, we might need to replace unsupported type in device module everywhere...
BTW, one more question, we also have a diagnostic which is emitted on attempt to declare a variable with unsupported type inside the device code for this __float128 type and other ones (https://github.com/intel/llvm/pull/1465/files). Does OpenMP (and probably HIP, CUDA etc) need such diagnostic as well?

I'm not sure we want this and I'm not sure why you would. To me, it seems user hostile to disallow unsupported types categorically. We also know from our codes that people have unsupported types in structs that they would rather not refactor. Given that there is not really a need for this anyway, why should we make them? Arguably you cannot "use" unsupported types but an error like that makes sense to people. So as long as you don't use the unsupported type as an operand in an expression you should be fine.

We have some detection for this in clang for OpenMP but it is not sufficient. We also should generalize this (IMHO) and stop duplicating logic between HIP/CUDA/OpenMP/SYCL/... That said, we cannot error out because the types are present but only if they are used. I would hope you would reconsider and do the same. Arguably, mapping/declaring a unsupported type explicitly could be diagnosed (with a warning) but as part of a struct I would advice against.

Maybe I just don't understand. Could you elaborate why you think sycl has to forbid them categorically?

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

OpenMP handling needs to be reverted/redone:

  1. If no aux triple is available it just crashes.
  2. If the unavailable type is not used in one of the pattern matched expressions it crashes (usually during instruction selection but not always). Try a call with long double arguments for example.

I'm not sure this patch fits the bill but what I was thinking we need is roughly:
If you have a expression with operands or function definition with return/argument types which are not supported on the target, mark the definition as unavailable with the type note you have.
We should especially allow members to have unavailable types if the member is not accessed. Memcpy like operations (=mapping) are OK though. I think this should be the same for OpenMP and Sycl (and HIP, and ...).

Why we should allow members to have unavailable types if the member is not accessed? I don't think that we always can do it, especially for SYCL. Even if the member is not accessed directly, the whole struct with unavailable type inside will get into resulting LLVM IR module anyway, this can be a problem, I guess.

On the host you know how large the type is so you can replace it in the device module with a placeholder of the appropriate size. You want to do this (in OpenMP for sure) because things you map might have constitutes you don't want to access on the device but you can also not (easily) split out of your mapped type.

Okay, I see. Am I right that OpenMP already has such thing implemented, but only for functions return types? I suppose, for SYCL, we might need to replace unsupported type in device module everywhere...
BTW, one more question, we also have a diagnostic which is emitted on attempt to declare a variable with unsupported type inside the device code for this __float128 type and other ones (https://github.com/intel/llvm/pull/1465/files). Does OpenMP (and probably HIP, CUDA etc) need such diagnostic as well?

I'm not sure we want this and I'm not sure why you would. To me, it seems user hostile to disallow unsupported types categorically. We also know from our codes that people have unsupported types in structs that they would rather not refactor. Given that there is not really a need for this anyway, why should we make them? Arguably you cannot "use" unsupported types but an error like that makes sense to people. So as long as you don't use the unsupported type as an operand in an expression you should be fine.

We have some detection for this in clang for OpenMP but it is not sufficient. We also should generalize this (IMHO) and stop duplicating logic between HIP/CUDA/OpenMP/SYCL/... That said, we cannot error out because the types are present but only if they are used. I would hope you would reconsider and do the same. Arguably, mapping/declaring a unsupported type explicitly could be diagnosed (with a warning) but as part of a struct I would advice against.

Maybe I just don't understand. Could you elaborate why you think sycl has to forbid them categorically?

Roughly speaking, SYCL is a wrapper over OpenCL. SYCL device compiler should be able to produce device code module in a form acceptable by OpenCL backends. For this purpose we use SPIR-V intermediate language (https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html). We transform LLVM IR emitted by clang (in SYCL device mode) into SPIR-V, then feed it to OpenCL backends. To be able to do it, produced SPIR-V must be valid and do not require additional features/capabilities comparing with SPIR-V produced from pure OpenCL, otherwise OpenCL backends just don't work with it. Nor OpenCL neither SPIRV doesn't support __float128 type, for example. From SPIR-V spec:

Scalar floating-point types can be parameterized only as 32 bit, plus any additional sizes enabled by capabilities (i.e. 16 and 64 for some devices).

Right now It is not possible to produce valid SPIR-V from LLVM IR containing unsupported types. We use official Khronos SPIRV translator (https://github.com/KhronosGroup/SPIRV-LLVM-Translator). SPIR-V translator relies on clang to prohibit unsupported features, so they are not expected in LLVM IR. That is why we might need completely prohibit (or maybe we need to replace it in resulting LLVM module completely if it is possible?) now.

I'm also curious why OpenMP can just allow presence of unsupported type in the resulting module? Doesn't it produce any problems while compiling code by device-specific back-end for some specific device which don't support such type?

I also think that we need to generalize approaches between OpenMP/SYCL/CUDA/HIP. We can start with generalized diagnostic which points to using of unsupported type at least, then we can add additional restriction for SYCL (or other programming models) if we need one. @bader , @erichkeane please comment if you don't agree.

As I mentioned before. As long as the type is not "used" you can treat it as a sequence of bytes just as well. So we can lower __float128 to char [16] with the right alignment. SPIRV will never see unsupported types and the code works because we never access it as float128 anyway. WDYT?

As I mentioned before. As long as the type is not "used" you can treat it as a sequence of bytes just as well. So we can lower __float128 to char [16] with the right alignment. SPIRV will never see unsupported types and the code works because we never access it as float128 anyway. WDYT?

Yes, it can work for SYCL without additional diagnostics if it is possible to replace __float128 with char [16] everywhere (including struct definitions and so on) in the resulting LLVM IR module.

Okay, seems like OpenMP needs unsupported types diagnosing as well. I'm trying to adapt this patch for OpenMP, but it doesn't work out of the box because it diagnoses memcpy like operations, so with the current patch the code like this will cause diagnostics:

 struct T {
   char a;
   __float128 f;
   char c;
   T() : a(12), f(15) {}
}

#pragma omp declare target
T a = T();
#pragma omp end declare target

It happens because member initialization in the constructor is still usage of f field which is marked unavailable because of type. I'm not sure that it is possible to understand how the unavailable declaration is used in the place where diagnostic about usage of unavailable declaration is actually emitted, so I will probably need some other place/solution for it.

@jdoerfert , could you please help to understand how the diagnostic should work for OpenMP cases? Or you probably have some spec/requirements for it?
Understanding what exactly is needed will help with the implementation, I guess.

Okay, seems like OpenMP needs unsupported types diagnosing as well. I'm trying to adapt this patch for OpenMP, but it doesn't work out of the box because it diagnoses memcpy like operations, so with the current patch the code like this will cause diagnostics:

 struct T {
   char a;
   __float128 f;
   char c;
   T() : a(12), f(15) {}
}

#pragma omp declare target
T a = T();
#pragma omp end declare target

It happens because member initialization in the constructor is still usage of f field which is marked unavailable because of type. I'm not sure that it is possible to understand how the unavailable declaration is used in the place where diagnostic about usage of unavailable declaration is actually emitted, so I will probably need some other place/solution for it.

@jdoerfert , could you please help to understand how the diagnostic should work for OpenMP cases? Or you probably have some spec/requirements for it?
Understanding what exactly is needed will help with the implementation, I guess.

I missed this update, sorry.

I don't think we have a spec wording for this, it is up to the implementations.

In the example, a diagnostic is actually fine (IMHO). You cannot assign 15 to the __float128 on the device. It doesn't work. The following code however should go through without diagnostic:

struct T {
   char a;
   __float128 f;
   char c;
   T() : a(12), c(15) {}
}

and it should translate to

struct T {
   char a;
   alignas(host_float128_alignment) char[16] __unavailable_f;
   char c;
   T() : a(12), c(15) {}
}

Do you have other questions or examples we should discuss?

Okay, seems like OpenMP needs unsupported types diagnosing as well. I'm trying to adapt this patch for OpenMP, but it doesn't work out of the box because it diagnoses memcpy like operations, so with the current patch the code like this will cause diagnostics:

 struct T {
   char a;
   __float128 f;
   char c;
   T() : a(12), f(15) {}
}

#pragma omp declare target
T a = T();
#pragma omp end declare target

It happens because member initialization in the constructor is still usage of f field which is marked unavailable because of type. I'm not sure that it is possible to understand how the unavailable declaration is used in the place where diagnostic about usage of unavailable declaration is actually emitted, so I will probably need some other place/solution for it.

@jdoerfert , could you please help to understand how the diagnostic should work for OpenMP cases? Or you probably have some spec/requirements for it?
Understanding what exactly is needed will help with the implementation, I guess.

I missed this update, sorry.

I don't think we have a spec wording for this, it is up to the implementations.

In the example, a diagnostic is actually fine (IMHO). You cannot assign 15 to the __float128 on the device. It doesn't work. The following code however should go through without diagnostic:

struct T {
   char a;
   __float128 f;
   char c;
   T() : a(12), c(15) {}
}

and it should translate to

struct T {
   char a;
   alignas(host_float128_alignment) char[16] __unavailable_f;
   char c;
   T() : a(12), c(15) {}
}

Do you have other questions or examples we should discuss?

I'm not sure that I've discovered all examples and problems, but I have a couple of ones. I started with adapting current implementation for OpenMP and right now I'm analyzing corresponding OpenMP test fails (i.e. clang/test/OpenMP/nvptx_unsupported_type_messages.cpp and clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp). There are a lot of differences between the old approach and new one, which I'm working on. The new diagnostic triggers more errors than the old one, so I'd like to understand in which concrete cases we shouldn't emit diagnostic. For example you mentioned that memcopy-like operations should be ok in device code.
Right now the current implementation of the diagnostic also emits errors for sample like this:

struct T {
  char a;
  __float128 f;
  char c;
};

#pragma omp declare target
T a;
T b = a; // The diagnostic is triggered here, because implicit copy constructor uses unavailable field
#pragma omp end declare target

Should we emit errors in such case too?

I'm not sure that I've discovered all examples and problems, but I have a couple of ones. I started with adapting current implementation for OpenMP and right now I'm analyzing corresponding OpenMP test fails (i.e. clang/test/OpenMP/nvptx_unsupported_type_messages.cpp and clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp). There are a lot of differences between the old approach and new one, which I'm working on. The new diagnostic triggers more errors than the old one, so I'd like to understand in which concrete cases we shouldn't emit diagnostic. For example you mentioned that memcopy-like operations should be ok in device code.

You can reach me here or via email to discuss more. We also can do it over openmp-dev if you like :)

Right now the current implementation of the diagnostic also emits errors for sample like this:

struct T {
  char a;
  __float128 f;
  char c;
};

#pragma omp declare target
T a;
T b = a; // The diagnostic is triggered here, because implicit copy constructor uses unavailable field
#pragma omp end declare target

Should we emit errors in such case too?

Preferably, I would allow the above case, or in general trivial copies of unavailable basic types. What I would like to happen is that we memcpy the unavailable field.
I guess if we could "simply" replace the unavailable types in the device code right away with the byte array replacement, most things should fall into place. Basically,
we could provide even provide the replacements in a header that we include automatically:

clang/lib/Headers/OpenMP/typedefs.h:

#ifdef __DEFINE_FLOAT128__
typedef char __float128[__FLOAT128_SIZE__] alignas(__FLOAT128_ALIGNMENT__);

#undef __DEFINE_FLOAT128__
#undef __FLOAT128_SIZE__
#undef __FLOAT128_ALIGNMENT__
#endif

Now copy constructors (and other "OK" uses) should work fine. If people use the member in anything that actually doesn't work on a char array, they get a (probably ugly) error.
Preferably we would intercept the diagnose messages or at least issue a note if we see __float128 used, maybe along the lines of:

Note: The target does not support operations on `__float128` types. Values of this type are consequently represented by character arrays of appropriate size and alignment.

Re-implemented diagnostic itself, now only usages of declarations
with unsupported types are diagnosed.
Generalized approach between OpenMP and SYCL.

Fznamznon retitled this revision from [SYCL] Defer __float128 type usage diagnostics to [OpenMP][SYCL] Improve diagnosing of unsupported types usage.May 25 2020, 12:28 PM
Fznamznon edited the summary of this revision. (Show Details)

The tests are failing because calling function with unsupported type in arguments/return value is diagnosed as well, i.e. :

double math(float f, double d, long double ld) { ... } // `ld` is not used inside the `math` function
#pragma omp target map(r)
  { r += math(f, d, ld); } // error: 'math' requires 128 bit size 'long double' type support, but device 'nvptx64-nvidia-cuda' does not support it

Should we diagnose calls to such functions even if those arguments/return value aren't used?

Re-implemented diagnostic itself, now only usages of declarations
with unsupported types are diagnosed.
Generalized approach between OpenMP and SYCL.

Great, thanks a lot!

The tests are failing because calling function with unsupported type in arguments/return value is diagnosed as well, i.e. :

double math(float f, double d, long double ld) { ... } // `ld` is not used inside the `math` function
#pragma omp target map(r)
  { r += math(f, d, ld); } // error: 'math' requires 128 bit size 'long double' type support, but device 'nvptx64-nvidia-cuda' does not support it

Should we diagnose calls to such functions even if those arguments/return value aren't used?

Yes, please! The test case (which I added) is broken and would result in a crash when you actually ask for PTX and not IR: https://godbolt.org/z/vL5Biw
This is exactly what we need to diagnose :)


I think the code looks good and this looks like a really nice way to fix this properly.

I inlined some questions. We might need to add some test coverage (if we haven't already), e.g., for the memcpy case. For example in OpenMP an object X with such types should be ok in a map(tofrom:X) clause.

clang/lib/Sema/Sema.cpp
1727

Nit: Move below CheckType to avoid shadowing and confusion with the arg there.

clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp
21

Why is this not diagnosed? I mean we cannot assign 15 on the device, can we? Or does it work because it is a constant (and we basically just memcpy something)?

If it's the latter, do we have a test in the negative version that makes sure T(int i) : a(i), f(i) {} is flagged?

81

Just checking, we verify in the other test this would result in an error, right?

Fznamznon updated this revision to Diff 266877.May 28 2020, 7:39 AM
Fznamznon marked 2 inline comments as done.

Applied comments from Johannes.
Fixed failing tests.

Fznamznon marked 2 inline comments as done.May 28 2020, 7:40 AM
Fznamznon added inline comments.
clang/lib/Sema/Sema.cpp
1727

Done, thanks

clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp
21

Unfortunately, nor this case neither T(int i) : a(i), f(i) {} is not diagnosed. This happens because DiagnoseUseOfDecl call seems missing for member initializers, not because there is memcpy. So, for example, such case is diagnosed:

struct B {
  __float128 a;
};
#pragma omp declare target
void foo() {
  B var = {1}; // error: 'a' requires 128 bit size '__float128' type support, but device 'nvptx64-unknown-unknown' does not support it                
}

DiagnoseUseOfDecl function is called in so many cases and I guess it is meant to be called on each usage of each declaration, that is why I think the correct fix is add call to DiagnoseUseOfDecl somewhere near building of member initializers . This change even doesn't break my local check-clang LIT tests run, but I'm not really sure that such change is in scope of this patch, because DiagnoseUseOfDecl contains a lot of other diagnostics as well.

81

Yes, I added such test case in nvptx_unsupported_type_messages.cpp .

jdoerfert accepted this revision.May 28 2020, 9:10 AM

This change even doesn't break my local check-clang LIT tests run, but I'm not really sure that such change is in scope of this patch, because DiagnoseUseOfDecl contains a lot of other diagnostics as well.

Fair. Let's do the following. We go with this as it is a clear improvement and almost complete. If you could provide a follow up to call DiagnoseUseOfDecl, or maybe just to the part we actually need, for member initialization, we can ask Clang folks to take a look. I think there is a reasonable way forward.

LGTM. Thanks a lot for implementing this in a generic and sharable way!

This revision is now accepted and ready to land.May 28 2020, 9:10 AM
jdoerfert added a comment.EditedMay 28 2020, 9:57 AM

Can you include these:

long double qa, qb;
decltype(qa + qb) qc;
double qd[sizeof(-(-(qc * 2)))];
Fznamznon marked an inline comment as done and an inline comment as not done.

Included test cases from Johannes.

This revision was automatically updated to reflect the committed changes.

Seems to me, this patch crashes llvm-project/openmp/libomptarget/test/mapping/declare_mapper_api.cpp.

Seems to me, this patch crashes llvm-project/openmp/libomptarget/test/mapping/declare_mapper_api.cpp.

It seems this patch caused asking size of dependent type, AST context doesn't seem expecting it. I'll provide follow up fix shortly. Sorry for inconvenience.