Page MenuHomePhabricator

[SYCL] Defer __float128 type usage diagnostics
Needs ReviewPublic

Authored by Fznamznon on Feb 11 2020, 12:12 AM.

Details

Summary

While SYCL does not support __float128 for some device targets, emitting the
diagnostic in the old way prevents the compilation of host header files with
__float128 type usage and valid host code with __float128 type usage.

CUDA/OMP deferred diagnostic interface is going to be used for SYCL device.

Implemented __float128 diagnostic is a combination of CUDA/OMP deferred
diagnostic and Unavailable attribute previously used for ObjC ARC:

  • If some declaration with __float128 type usage was made somewhere in the code, that is invalid in device code, so we defer this diagnostic via CUDA/OMP deferred diagnostics until we know that this code will be actually emitted for device.
  • Along with this, we add the special attribute UnavailableAttr to this declaration, so if declaration was made in the host code, but used in the device code we can emit errors about it.
  • Diagnostics about usage of declaration with UnavailableAttr attribute also are deferred via CUDA/OMP deferred diagnostics until we know that this code will beactually emitted for device.

Diff Detail

Unit TestsFailed

TimeTest
170 mslldb-unit.Host/_/HostTests::ConnectionFileDescriptorTest.TCPGetURIv6
Note: Google Test filter = ConnectionFileDescriptorTest.TCPGetURIv6 [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.

Event Timeline

Fznamznon created this revision.Feb 11 2020, 12:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2020, 12:12 AM
bader added a subscriber: bader.Feb 11 2020, 12:38 AM

I would add a check for the use of unsupported types in kernels. They should not be allowed to be used if target does not support it.

The right approach here is probably what we do in ObjC ARC when we see types that are illegal in ARC: in system headers, we allow the code but add a special UnavailableAttr to the declaration so that it can't be directly used.

That is straightforward enough that I think you should just do it instead of leaving this as technical debt.

bader edited the summary of this revision. (Show Details)Feb 11 2020, 8:51 AM
bader added a reviewer: erichkeane.

I would add a check for the use of unsupported types in kernels. They should not be allowed to be used if target does not support it.

Yeah, I think so. We tried to make it using deferred diagnostics. Unfortunately there isn't a single 'create declaration' type place that we could diagnose this. This resulted to a lot of changes around Sema and probably unhanded cases. For example we need to diagnose each appearance of __float128 type in device code. And __float128 can appear in device code through so many ways, for example through auto types variable declaration and initializing it with __float128 value captured from the host, like this:

 // HOST CODE
  __float128 B = 1; // No errors
...
// DEVICE CODE
  kernel<class some_kernel>([=]() {
          auto C = B; }); // Problem, C will actually have __float128 type!

And for example, we can't just trigger on __float128 type appearance in some code, like the diagnosic which I'm disabling does, because I believe that some unevaluated contexts shouldn't trigger errors, because they don't bring the unsupported type to the device code:

template<typename t> void foo(){};
__float128 nonemittedfunc();

// DEVICE CODE
foo<__float128>(); // This shouldn't bring errors
std::conditional_t<SomeI < 1, decltype(nonemittedfunc()), int> SomeVar; // This shouldn't bring errors

The whole patch with test cases is available here https://github.com/intel/llvm/pull/1040 .
We decided to disable this until we figure out the way how to properly diagnose this.

The right approach here is probably what we do in ObjC ARC when we see types that are illegal in ARC: in system headers, we allow the code but add a special UnavailableAttr to the declaration so that it can't be directly used.

That is straightforward enough that I think you should just do it instead of leaving this as technical debt.

I haven't considered something like this, because I'm not familiar with ObjC at all... I will give it a try, thanks.

The right approach here is probably what we do in ObjC ARC when we see types that are illegal in ARC: in system headers, we allow the code but add a special UnavailableAttr to the declaration so that it can't be directly used.

That is straightforward enough that I think you should just do it instead of leaving this as technical debt.

I haven't considered something like this, because I'm not familiar with ObjC at all... I will give it a try, thanks.

Hi @rjmccall , I assume, I took a look at this.
Let's imagine, I will try to diagnose __float128 type using already implemented functionality. It seems like I need to call something like

S.DelayedDiagnostics.add(                                        
    sema::DelayedDiagnostic::makeForbiddenType(loc,              
        diag::err_type_unsupported, type, "__float128"));
`

right?
I suppose, then this diagnostic will be saved and emitted inside function named handleDelayedForbiddenType.
Here it checks that this forbidden type is actually allowed and emits a diagnostic if it's not.
The first problem that handleDelayedForbiddenType is called too early. We don't know in this place whether we are in SYCL device code or not. Because basically entry point to SYCL device code is a template function with sycl_kernel attribute, and every function it calls become a device function. So we only know where is device code only after templates instantiation, it happens a bit later after handleDelayedForbiddenType call.

It seems that the second problem is the same problem which prevented me from implementing diagnosing of __float128 type through CUDA/OMP deferred diagnostics (I mentioned my attempt in the last comment https://reviews.llvm.org/D74387#1870014). I still need to find best place for diagnostic issuing. It seems that there are so many places where type can actually be introduced to resulting LLVM IR module, and in some of them I need to check some additional conditions to do not prevent __float128 usage when it actually doesn't introduce forbidden type to resulting LLVM IR module.

Please, correct me if I don't understand something or said something wrong.
I would appreciate if you had some advices.

Thanks a lot.

The right approach here is probably what we do in ObjC ARC when we see types that are illegal in ARC: in system headers, we allow the code but add a special UnavailableAttr to the declaration so that it can't be directly used.

That is straightforward enough that I think you should just do it instead of leaving this as technical debt.

I haven't considered something like this, because I'm not familiar with ObjC at all... I will give it a try, thanks.

Hi @rjmccall , I assume, I took a look at this.
Let's imagine, I will try to diagnose __float128 type using already implemented functionality. It seems like I need to call something like

S.DelayedDiagnostics.add(                                        
    sema::DelayedDiagnostic::makeForbiddenType(loc,              
        diag::err_type_unsupported, type, "__float128"));
`

right?
I suppose, then this diagnostic will be saved and emitted inside function named handleDelayedForbiddenType.
Here it checks that this forbidden type is actually allowed and emits a diagnostic if it's not.

This isn't quite right. The goal here is to delay the diagnostic *twice*. The first delay is between the point where we parse/process the type (i.e. SemaType) and the point where we've fully processed the declaration that the type is part of (i.e. SemaDecl). That's the point where we call handleDelayedForbiddenType, and you're right that it's too early to know whether the declaration is really device code. However, you're missing something important about how handleDelayedForbiddenType works: it's never really trying to suppress the diagnostic completely, but just to delay it for certain declarations until the point that the declaration is actually used, under the hope that in fact it will never be used and everything will work out. For ARC, we chose to delay for all declarations in system headers, under the assumption that (1) system headers will never introduce functions that have to be emitted eagerly, and (2) we always want to warn people about problematic code in their own headers. Those choices don't really fit SYCL's use case, and you should change the logic in isForbiddenTypeAllowed to delay your diagnostic for essentially all declarations (except kernels?), since effectively all non-kernel code in device mode is lazily emitted. But if you do that, it should combine well with CUDA/OMP deferred diagnostics:

  • If foo uses __float128 (whether in its signature or internally), that is invalid in device mode, but the diagnostic will be delayed by the forbidden-type mechanism, meaning that it will become an unavailable attribute on foo.
  • If bar uses foo, that use is invalid in device mode (because of the unavailable attribute), but the diagnostic will be delayed via the standard CUDA/OMP mechanism because we don't know yet whether bar should be emitted as a device function.
  • If kernel uses bar, that will trigger the emission of the delayed diagnostics of bar, including the use-of-unavailable-function diagnostic where it uses foo. It should be straightforward to specialize this diagnostic so that it reports the error by actually diagnosing the use of __float128 at the original location (which is recorded in the unavailable attribute) and then just adding a note about how foo is used by bar.

It seems that the second problem is the same problem which prevented me from implementing diagnosing of __float128 type through CUDA/OMP deferred diagnostics (I mentioned my attempt in the last comment https://reviews.llvm.org/D74387#1870014). I still need to find best place for diagnostic issuing. It seems that there are so many places where type can actually be introduced to resulting LLVM IR module, and in some of them I need to check some additional conditions to do not prevent __float128 usage when it actually doesn't introduce forbidden type to resulting LLVM IR module.

The key thing here is that all uses should be associated with some top-level declaration that's either eagerly-emitted in device mode or not.

Fznamznon added a comment.EditedFeb 27 2020, 5:38 AM

@rjmccall, Thank you very much for so detailed response, It really helps. I started working on implementation and I have a couple of questions/problems with this particular appoach.

  • If foo uses __float128 (whether in its signature or internally), that is invalid in device mode, but the diagnostic will be delayed by the forbidden-type mechanism, meaning that it will become an unavailable attribute on foo.

So, for example if some variable is declared with __float128 type, we are adding to parent function Unavaliable attribute, right?

  • If bar uses foo, that use is invalid in device mode (because of the unavailable attribute), but the diagnostic will be delayed via the standard CUDA/OMP mechanism because we don't know yet whether bar should be emitted as a device function.
  • If kernel uses bar, that will trigger the emission of the delayed diagnostics of bar, including the use-of-unavailable-function diagnostic where it uses foo. It should be straightforward to specialize this diagnostic so that it reports the error by actually diagnosing the use of __float128 at the original location (which is recorded in the unavailable attribute) and then just adding a note about how foo is used by bar.

Consider following example (this is absolutely valid SYCL code, except __float128 usage):

// Host code:
__float128 A;
// Everything what lambda passed to `sycl_kernel` calls becomes device code. Capturing of host variables means that these variables will be passed to device by value, so using of A in this lambda is invalid.
sycl_kernel<class kernel_name>([=]() {auto B = A});

In this case we add unavailable attribute to parent function for variable A declaration. But this function is not called from device code. Please correct me if I'm wrong but it seems that we need to diagnose not only functions, but also usages of any declarations with unavailable attribute including variable declarations, right?

In addition, there are a couple of problems with this approach, for example with unevaluated sizeof context, i.e. code like this:

sycl_kernel<class kernel_name>([=]() {int A = sizeof(__float128);});

is diagnosed too, I think this is not correct.

I can upload what I have now to this review if it will help better (or maybe we will understand that I'm doing something wrong).

I'm also thinking about a bit another approach:

  • If some declaration uses __float128 it will become an unavailable attribute on this declaration. That means we don't always add unavailable attribute to the function which uses __float128 internally.
  • In the place where clang actually emits use-of-unavailable-declaration diagnostics (somewhere in DoEmitAvailabilityWarning function, defined in SemaAvailability.cpp) for SYCL, we make these diagnostics deferred using CUDA/OMP deferred diagnostics mechanism (using SYCL-specific analog of function like diagIfOpenMPDeviceCode/CUDADiagIfDeviceCode).

But, for example, this won't emit diagnostics for simple variable declarations in device code which has __float128 type, but is not used anywhere else.

I'm also curious about OpenMP handling of this "unsupported type" problem. @ABataev , Am I right that in OpenMP such diagnostics are emitted only if forbidden type is used in some arithmetical operations? Is it enough to prevent problems on various GPU devices which don't support this type?

@rjmccall, Thank you very much for so detailed response, It really helps. I started working on implementation and I have a couple of questions/problems with this particular appoach.

  • If foo uses __float128 (whether in its signature or internally), that is invalid in device mode, but the diagnostic will be delayed by the forbidden-type mechanism, meaning that it will become an unavailable attribute on foo.

So, for example if some variable is declared with __float128 type, we are adding to parent function Unavaliable attribute, right?

That's how it's supposed to work. I can't guarantee that it will actually always work that way, because I'm sure you'll be pushing on this code in some new ways.

  • If bar uses foo, that use is invalid in device mode (because of the unavailable attribute), but the diagnostic will be delayed via the standard CUDA/OMP mechanism because we don't know yet whether bar should be emitted as a device function.
  • If kernel uses bar, that will trigger the emission of the delayed diagnostics of bar, including the use-of-unavailable-function diagnostic where it uses foo. It should be straightforward to specialize this diagnostic so that it reports the error by actually diagnosing the use of __float128 at the original location (which is recorded in the unavailable attribute) and then just adding a note about how foo is used by bar.

Consider following example (this is absolutely valid SYCL code, except __float128 usage):

// Host code:
__float128 A;
// Everything what lambda passed to `sycl_kernel` calls becomes device code. Capturing of host variables means that these variables will be passed to device by value, so using of A in this lambda is invalid.
sycl_kernel<class kernel_name>([=]() {auto B = A});

In this case we add unavailable attribute to parent function for variable A declaration. But this function is not called from device code. Please correct me if I'm wrong but it seems that we need to diagnose not only functions, but also usages of any declarations with unavailable attribute including variable declarations, right?

Right. The diagnosis side of that should already happen — unavailable diagnostics apply to uses of any kind of declaration, not just functions or variables. Current delayed diagnostics should be enough to make the unavailable attribute get applied to the global variable A in your example, since it's a use from the declarator. If SYCL supports C++-style dynamic global initializers, you'll probably need to add code so that uses of __float128 within a global initializer get associated with the global, which currently won't happen because the initializer isn't "in scope". But there are at least two other patches underway that are dealing with similar issues: https://reviews.llvm.org/D71227 and https://reviews.llvm.org/D70172.

In addition, there are a couple of problems with this approach, for example with unevaluated sizeof context, i.e. code like this:

sycl_kernel<class kernel_name>([=]() {int A = sizeof(__float128);});

is diagnosed too, I think this is not correct.

Okay, that's a much thornier problem if you want to allow that. What you're talking about is essentially the difference between an evaluated and an unevaluated context, but we don't generally track that for uses of *types*. It's much easier to set things up so that you only complain about uses of *values* like the global variable A if they're in evaluated expressions, but types are never really "evaluated" in the same way that expressions are, so it's complicated.

I think that's a very separable question, so I would recommend you focus on the first problem right now, and then if you really care about allowing sizeof(__float128), we can approach that later.

I can upload what I have now to this review if it will help better (or maybe we will understand that I'm doing something wrong).

I'm also thinking about a bit another approach:

  • If some declaration uses __float128 it will become an unavailable attribute on this declaration. That means we don't always add unavailable attribute to the function which uses __float128 internally.
  • In the place where clang actually emits use-of-unavailable-declaration diagnostics (somewhere in DoEmitAvailabilityWarning function, defined in SemaAvailability.cpp) for SYCL, we make these diagnostics deferred using CUDA/OMP deferred diagnostics mechanism (using SYCL-specific analog of function like diagIfOpenMPDeviceCode/CUDADiagIfDeviceCode).

Sure, but you'll have to write a custom walk of the body looking for uses of the type that you don't like; that seems like a lot of work to get right, and it'll tend to fail "open", i.e. allowing things you don't want to allow, whereas this approach will tend to fail "closed", i.e. tending towards being conservatively correct.

Added diagnosing of __float128 type usage.
See the summary of revision for details.

Fznamznon retitled this revision from [SYCL] Do not diagnose use of __float128 to [SYCL] Defer __float128 type usage diagnostics.Fri, Mar 20, 12:58 PM
Fznamznon edited the summary of this revision. (Show Details)
Fznamznon edited the summary of this revision. (Show Details)Fri, Mar 20, 1:01 PM
Fznamznon added a reviewer: bader.
Fznamznon updated this revision to Diff 251982.Mon, Mar 23, 3:50 AM
Fznamznon edited the summary of this revision. (Show Details)

Fix the test by adding the target with __float128 support and make sure that
no diagnostic are emitted.

rjmccall added inline comments.Fri, Mar 27, 12:18 PM
clang/include/clang/Sema/Sema.h
12329

Will this collect notes associated with the diagnostic correctly?

clang/lib/Sema/SemaAvailability.cpp
479

All of the other cases are setting this to a note, not an error, so I suspect this will read wrong.

540

Are you sure you want to be applying this to all of the possible diagnostics here, rather than just for SYCLForbiddenType unavailable attributes?

clang/lib/Sema/SemaDecl.cpp
18075

So you want to emit it for the definition in addition to emitting it for specific specializations?

clang/lib/Sema/SemaDeclAttr.cpp
7649

I wonder if it's reasonable to treat all forbidden types the same here or if we want different functions for the ARC and SYCL use cases.

Fznamznon added inline comments.Mon, Mar 30, 9:06 AM
clang/include/clang/Sema/Sema.h
12329

Could you please make your question a bit more concrete?
This function is supposed to work in the same way as Sema::CUDADiagIfDeviceCode and Sema::diagIfOpenMPDeviceCode . It emits given diagnostic if the current context is known as "device code" and makes this diagnostic deferred otherwise. It uses the DeviceDiagBuilder which was implemented earlier. This DeviceDiagBuilder also tries to emit callstack notes for the given diagnostics. Do you mean these callstack notes or something else?

clang/lib/Sema/SemaAvailability.cpp
479

Yes, this is not a note. For such samples:

int main() {
  __float128 CapturedToDevice = 1;
  kernel<class variables>([=]() {
    decltype(CapturedToDevice) D;
  });
}

It looks like this:

float128.cpp:63:14: error: 'CapturedToDevice' is unavailable
    decltype(CapturedToDevice) D;
             ^
float128.cpp:59:14: error: '__float128' is not supported on this target   /// This emitted instead of note 
  __float128 CapturedToDevice = 1;
             ^

I had feeling that it should probably be a note. But there is no implemented note for unsupported types. I think I can add a new one if it will make it better. Should I?

540

I suppose it is reasonable if we want to reuse unavaliable attribute for other SYCL use cases. Plus, In SYCL we don't know where is device code until we instantiate templates, it happens late, so we have to defer any diagnostic while compiling for device, otherwise we can point to host code where much more is allowed.

clang/lib/Sema/SemaDecl.cpp
18075

Somehow diagnostics are emitted only for the definitions.
Without this change diagnostics aren't emitted at all.

clang/lib/Sema/SemaDeclAttr.cpp
7649

I think it could be reasonable if we will have forbidden type cases for SYCL sometime. For now, I don't see the purpose in a separate function for SYCL.

Fznamznon updated this revision to Diff 253615.Mon, Mar 30, 9:22 AM

Rebased to fresh version. Applied fixes after https://reviews.llvm.org/D70172

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

rjmccall added inline comments.Mon, Mar 30, 9:52 PM
clang/include/clang/Sema/Sema.h
12329

Logically, notes that are emitted after a warning or error are considered to be part of that diagnostic. A custom DiagBuilder that only redirects the main diagnostic but allows the notes to still be emitted will effectively cause those notes to misleadingly follow whatever previous diagnostic might have been emitted.

I call this out specifically because some of the places where you're using this still seem to try to emit notes afterwards, at least in some cases. It's possible that CUDADiagIfDeviceCode happens to not be used in such a way. Really I'm not sure this conditional DiagBuilder approach was a good idea the first time, and I think we should probably reconsider rather than duplicating it.

clang/lib/Sema/SemaAvailability.cpp
479

Yeah, this should be a note, like "note: variable is unavailable because it uses a type '__float128' that is not supported on this target". You should add that.

540

My point is actually the reverse of that. This code path is also used for normal unavailable attributes, not just the special ones you're synthesizing. Diagnostics from the use of explicitly-unavailable declarations shouldn't get any special treatment here, no more than you'd give special treatment to a diagnostic arising from an attempt to assign a pointer into a float. In the logic above where you recognize IR_SYCLForbiddenType, I think you should just check whether you should transitively defer the diagnostic and, if so, do so and then bail out of this function early. That might mean you don't need the custom DiagBuilder, too.

clang/lib/Sema/SemaDecl.cpp
18075

Hmm. We might be marking the template pattern invalid; that could result in all sorts of diagnostics being suppressed. We definitely shouldn't be marking things invalid without emitting an eager diagnostic.

Fznamznon updated this revision to Diff 253910.Tue, Mar 31, 9:07 AM

Apply comments, rebase.

Fznamznon marked an inline comment as done.Tue, Mar 31, 9:19 AM
Fznamznon added inline comments.
clang/include/clang/Sema/Sema.h
12329

I think if there are some notes associated with the main diagnostic and we want to make this diagnostic deferred by using SYCLDiagIfDeviceCode, we have to use this function SYCLDiagIfDeviceCode for notes as well. In my changes I didn't do so because I didn't expect notes emitted after new diagnostic.
In our SYCL implementation we find function like SYCLDiagIfDeviceCode pretty useful because we don't know where is device code until templates are instantiated. We need some mechanism to defer diagnostics pointing to unsupported features used in device code.
Do you have better approach in mind?

clang/lib/Sema/SemaAvailability.cpp
479

Okay, done.

540

Okay, I understand. I was under impression that unavailable attributes can appear only for ObjC ARC, so It is safe to defer everything in SYCL, so I moved calls of SYCLDiagIfDeviceCode as you requested.
It's a bit unclear how to avoid custom DiagBuilder here.

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

This is needed for OpenMP as well. Does it make sense to include it in this patch or in another one?

I thought OpenMP already has diagnostics for unsupported types (at least looking into this commit https://github.com/llvm/llvm-project/commit/123ad1969171d0b22d0c5d0ec23468586c4d8fa7). Am I wrong?
The diagnostic which I'm implementing here is stricter than existing OpenMP diagnostic, the main goal is do not emit unsupported type at all. Does OpenMP need such restriction as well?

OpenMP handling needs to be reverted/redone:

  1. If no aux triple is available it just crashes.
  2. If the unavailable type is not used in one of the pattern matched expressions it crashes (usually during instruction selection but not always). Try a call with long double arguments for example.

I'm not sure this patch fits the bill but what I was thinking we need is roughly:
If you have a expression with operands or function definition with return/argument types which are not supported on the target, mark the definition as unavailable with the type note you have.
We should especially allow members to have unavailable types if the member is not accessed. Memcpy like operations (=mapping) are OK though. I think this should be the same for OpenMP and Sycl (and HIP, and ...).