This is an archive of the discontinued LLVM Phabricator instance.

[PR57881][OpenCL] Fix incorrect diagnostics with templated types in kernel arguments
ClosedPublic

Authored by Anastasia on Sep 22 2022, 8:47 AM.

Details

Summary

It seems relying on isStandardLayoutType helpers is fragile when kernel arguments are references/pointers to templated types.

Clang uses lazy template instantiation in this case and therefore types are not fully instantiated when those are used.

Example:

template<typename T>
struct outer{
public:
    struct inner{
        int size;
    };
};
void foo(outer<int>::inner& p)              
{        
}

will have the following AST when p is a reference:

| `-ClassTemplateSpecializationDecl <line:1:1, line:7:1> line:2:8 struct outer definition
|   |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
|   | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
|   | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveConstructor exists simple trivial needs_implicit
|   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial needs_implicit
|   |-TemplateArgument type 'int'
|   | `-BuiltinType 'int'
|   |-CXXRecordDecl <col:1, col:8> col:8 implicit struct outer
|   |-AccessSpecDecl <line:3:1, col:7> col:1 public
|   `-CXXRecordDecl <line:4:5, col:12> col:12 referenced struct inner

and when it is not a reference

| `-ClassTemplateSpecializationDecl <line:1:1, line:7:1> line:2:8 struct outer definition
|   |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
|   | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
|   | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveConstructor exists simple trivial needs_implicit
|   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial needs_implicit
|   |-TemplateArgument type 'int'
|   | `-BuiltinType 'int'
|   |-CXXRecordDecl <col:1, col:8> col:8 implicit struct outer
|   |-AccessSpecDecl <line:3:1, col:7> col:1 public
|   `-CXXRecordDecl <line:4:5, line:6:5> line:4:12 referenced struct inner definition
|     |-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable pod trivial literal
|     | |-DefaultConstructor exists trivial needs_implicit
|     | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
|     | |-MoveConstructor exists simple trivial needs_implicit
|     | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|     | |-MoveAssignment exists simple trivial needs_implicit
|     | `-Destructor simple irrelevant trivial needs_implicit
|     |-CXXRecordDecl <col:5, col:12> col:12 implicit struct inner
|     `-FieldDecl <line:5:9, col:13> col:13 size 'int'

In the first case for reference type inner is incomplete while if we use a non-reference type the definition of template specialization is complete.

I have attempted to workaround this issue for the reported test cases, however it still doesn't quite work when the type is created from the template parameter (see FIXME test case in the patch). I presume we want to allow this? If so we might need to disable lazy template instantiation in this case. My guess is the only issue this this is that we will have performance penalty for the code of this format:

template<class T> struct Dummy {
T i;
};

extern kernel void foo(Dummy<int>& i);

which doesn't technically need the full instantiation.

Also I have discovered that in OpenCL C we have allowed passing kernel parameters with pointers to forward declared structs, but in C++ for OpenCL that no longer compiles. Not sure whether we should keep this restriction but if not it might interfere with the idea of always instantiating the structs definitions fully and even providing the safe diagnostics. So another approach we could take is to change this diagnostics into warnings and then if we can't fully detect the type provide the messaging that we can't detect whether the type is safe...

Another aspect to consider is that we have an extension that allows disabling all of this safe checks completely: https://clang.llvm.org/docs/LanguageExtensions.html#cl-clang-non-portable-kernel-param-types

While this matter might need more thoughts and investigations I wonder whether it makes sense to commit this fix for the time being since it's fixing the reported test case at least?

Fixing: https://github.com/llvm/llvm-project/issues/57881

Diff Detail

Event Timeline

Anastasia created this revision.Sep 22 2022, 8:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 22 2022, 8:47 AM
Anastasia requested review of this revision.Sep 22 2022, 8:47 AM
Anastasia edited the summary of this revision. (Show Details)Sep 22 2022, 8:47 AM
Anastasia edited the summary of this revision. (Show Details)Sep 22 2022, 8:55 AM
Anastasia added inline comments.
clang/test/SemaOpenCLCXX/invalid-kernel.clcpp
110

I think here should be something like - the use of templated types doesn't work...

I have attempted to workaround this issue for the reported test cases, however it still doesn't quite work when the type is created from the template parameter (see FIXME test case in the patch). I presume we want to allow this? If so we might need to disable lazy template instantiation in this case. My guess is the only issue this this is that we will have performance penalty for the code of this format:

I don't have enough experience with lazy template instantiation to give meaningful advice here. Though I'm not too worried about performance penalties for the example you're giving, as I'd expect most realistic use cases will require the full instantiation of a templated type sooner or later in the source program anyway (e.g. near the calling point).

While this matter might need more thoughts and investigations I wonder whether it makes sense to commit this fix for the time being since it's fixing the reported test case at least?

That sounds reasonable to me; clang currently hard-rejects a valid source due to an over-conservative diagnostic which isn't a great user experience.

So another approach we could take is to change this diagnostics into warnings and then if we can't fully detect the type provide the messaging that we can't detect whether the type is safe...

I think that makes sense, and should be fine to leave as a followup.

clang/lib/Sema/SemaDecl.cpp
9241

A comment explaining what we're trying to do here would be nice.

clang/test/SemaOpenCLCXX/invalid-kernel.clcpp
99

Indenting would help readability.

Anastasia updated this revision to Diff 470458.Oct 25 2022, 6:15 AM

Addressed review comments

svenvh accepted this revision.Nov 10 2022, 4:13 AM
This revision is now accepted and ready to land.Nov 10 2022, 4:13 AM
This revision was landed with ongoing or failed builds.Nov 10 2022, 10:56 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2022, 10:56 AM
Herald added a subscriber: ldrumm. · View Herald Transcript

You really can't ask whether a class template pattern is standard layout; it's not meaningful.

How are pointers and references passed to kernels? Does the pointee get copied or something? If so, you may have a requirement that pointee types be complete, in which case the only problem is probably that you're doing this check too soon, or doing them on declarations rather than on definition/use.

Anastasia added a comment.EditedNov 10 2022, 2:01 PM

You really can't ask whether a class template pattern is standard layout; it's not meaningful.

Well the templates have to be instantiated with concrete types as kernels can't be called from the host code. But if the concrete type is a reference or pointer there is no full instantiation apparently as AST dumps in the review description show.

How are pointers and references passed to kernels? Does the pointee get copied or something? If so, you may have a requirement that pointee types be complete, in which case the only problem is probably that you're doing this check too soon, or doing them on declarations rather than on definition/use.

Ok, the kernel function actually already contains the instantiated templates since they can't be templated themselves. My understanding is that references and pointers to templated types are not required to be fully instantiated as they are not ODR-used? At least this is what I deduce from my AST dump examples. I.e. it works fine if the templates are not references/pointers and it fails to create a complete instantiation otherwise.

You really can't ask whether a class template pattern is standard layout; it's not meaningful.

Well the templates have to be instantiated with concrete types as kernels can't be called from the host code. But if the concrete type is a reference or pointer there is no full instantiation apparently as AST dumps in the review description show.

My objection here is to the code that does CXXRec = CXXRec->getTemplateInstantiationPattern(). That is the unsubstituted template pattern. You then ask if it has standard layout. I do not think this is the right thing to do.

How are pointers and references passed to kernels? Does the pointee get copied or something? If so, you may have a requirement that pointee types be complete, in which case the only problem is probably that you're doing this check too soon, or doing them on declarations rather than on definition/use.

Ok, the kernel function actually already contains the instantiated templates since they can't be templated themselves.

Yes, I understand that.

My understanding is that references and pointers to templated types are not required to be fully instantiated as they are not ODR-used?

It's not the ODR, but yes, in standard C++, declaring a parameter of pointer type does not require the pointee type to be complete, and if the pointee type is a template type, the compiler is not permitted to immediately try to instantiate it. Of course, OpenCL kernel declarations are not standard C++ and can use different rules, and I think the fact that OpenCL has this semantic restriction on pointer arguments to kernels is pretty good license to do so. There are some places in C++ (e.g. initialization) that do instantiate templates if a definition is available, and you can request this by calling Sema::isCompleteType instead of hasDefinition().

You really can't ask whether a class template pattern is standard layout; it's not meaningful.

Well the templates have to be instantiated with concrete types as kernels can't be called from the host code. But if the concrete type is a reference or pointer there is no full instantiation apparently as AST dumps in the review description show.

My objection here is to the code that does CXXRec = CXXRec->getTemplateInstantiationPattern(). That is the unsubstituted template pattern. You then ask if it has standard layout. I do not think this is the right thing to do.

Yes this only works if the type doesn't depend on the template parameter like in the reported test case. But it is not a universal solution.

How are pointers and references passed to kernels? Does the pointee get copied or something? If so, you may have a requirement that pointee types be complete, in which case the only problem is probably that you're doing this check too soon, or doing them on declarations rather than on definition/use.

Ok, the kernel function actually already contains the instantiated templates since they can't be templated themselves.

Yes, I understand that.

My understanding is that references and pointers to templated types are not required to be fully instantiated as they are not ODR-used?

It's not the ODR, but yes, in standard C++, declaring a parameter of pointer type does not require the pointee type to be complete, and if the pointee type is a template type, the compiler is not permitted to immediately try to instantiate it. Of course, OpenCL kernel declarations are not standard C++ and can use different rules, and I think the fact that OpenCL has this semantic restriction on pointer arguments to kernels is pretty good license to do so. There are some places in C++ (e.g. initialization) that do instantiate templates if a definition is available, and you can request this by calling Sema::isCompleteType instead of hasDefinition().

Ok, thanks! I will look into this.