This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Add cl_khr_extended_subgroup extensions
ClosedPublic

Authored by PiotrFusik on May 12 2020, 6:38 AM.

Diff Detail

Event Timeline

PiotrFusik created this revision.May 12 2020, 6:38 AM

Can you please add a reference to the document that described these extensions in the description of your patch. Thanks!

Anastasia added inline comments.
clang/lib/Headers/opencl-c.h
15594

Ideally convergent attribute was intended to be used in the non-divergent scenarios. So I don't know if it's going to do what is needed here. Did you look into this already?

If we look at the Clang documentation it says:

In languages designed for SPMD/SIMT programming model, e.g. OpenCL or CUDA, the call instructions of a function with this attribute must be executed by all work items or threads in a work group or sub group.

I remember @nhaehnle was looking at using convergent with operations in the divergent control flow some time ago https://reviews.llvm.org/D68994? I am not sure where this thread ended up and whether we can expect this to work currently?

PiotrFusik marked an inline comment as done.May 20 2020, 6:06 AM
PiotrFusik added inline comments.
clang/lib/Headers/opencl-c.h
15594

Thanks for pointing this out! My understanding of the convergent attribute was that it's for uniform control flow, as per the documentation you cited.
A quick check shows that Intel Graphics Compiler doesn't suffer from this invalid optimization.
Yet I agree that the functions should be marked somehow. It is with __conv ?

kpet added a subscriber: kpet.May 21 2020, 9:13 AM
Anastasia added inline comments.May 21 2020, 12:29 PM
clang/lib/Headers/opencl-c.h
15594

A quick check shows that Intel Graphics Compiler doesn't suffer from this invalid optimization.

Did you check on the examples that @nhaehnle provided in the review?

Yet I agree that the functions should be marked somehow. It is with __conv ?

I am not sure there is a solution to this in the upstream LLVM at present. I am hoping @nhaehnle can provide us more information. Alternatively, we could run his examples against the latest LLVM revision to see if the problem still remains or not. I am not saying that it should block this review, I am ok that we commit your patch but if we know already that this functionality can't be fully supported it would be good to at least create a bug to OpenCL component to record this.

PiotrFusik marked an inline comment as done.May 26 2020, 9:05 AM
PiotrFusik added inline comments.
clang/lib/Headers/opencl-c.h
15594

The following example:

kernel void test(global int *data)
{
    uint id = (uint) get_global_id(0);
    if (id < 4)
        data[id] = sub_group_elect();
    else
        data[id] = sub_group_elect();
}

with clang -S -emit-llvm emits invalid code both with and without the convergent attribute.

We don't have this problem in our Intel Graphics Compiler though. That's because we replace the subgroup functions with internal intrinsics early enough. These intrinsics are marked with both the convergent and inaccessiblememonly attributes. Then in the SimplifyCFG pass we prevent optimization if both of these attributes are present: https://github.com/intel/intel-graphics-compiler/blob/master/IGC/Compiler/GenTTI.cpp#L397

I don't understand what can go wrong in the second example (with the jump threading pass) ?

Anastasia added inline comments.May 28 2020, 8:24 AM
clang/lib/Headers/opencl-c.h
15527

do we need to add vetor types too? Although the definition of gentype is missing in the documentation you have linked... so I am not certain what it means...

15574

Can we move this above sub_group_reduce_add group please?

15594

Thanks for checking. Do you mind creating a bug to clang for now under OpenCL component https://bugs.llvm.org/enter_bug.cgi?product=clang although we might reclassify this for a wider scope later on.

Then in the SimplifyCFG pass we prevent optimization if both of these attributes are present: https://github.com/intel/intel-graphics-compiler/blob/master/IGC/Compiler/GenTTI.cpp#L397

I see. Perhaps we need to come up with a new semantic of convergent and update LLVM passes... or maybe we have to introduce a new attribute. Not sure. I suggest we add a comment to https://reviews.llvm.org/D68994 and see if it gets picked up.

15597

should this not be for gentype? Although the definition of gentype is missing in the documentation you have linked...

15618

Ok, the grouping is somehow different here from the above where you have all version of the same element type first.

15689

btw this is the spec question really but do you happen to know why those are particularly for vectors with 4 elements?

15779

Interesting! Work group and the normal subroups functions don't seem to have this overload counterpart...

15975

here we are also missing vector types?

PiotrFusik marked 12 inline comments as done.May 28 2020, 9:39 AM
PiotrFusik added inline comments.
clang/lib/Headers/opencl-c.h
15527

No, vector types are for broadcast and non_uniform_broadcast only. Search the documentation for "gentype type".

15574

The convention (already present in this file) is that half and double overloads are specified last for every extension.

The alternative of keeping the overloads together would result in more #ifs, especially for cl_khr_subgroups and cl_khr_subgroup_non_uniform_arithmetic.

I would prefer consistency and not multiplying #ifs.

15594

Do you mean a bug for the invalid optimizations of these new subgroup functions?
I added a comment to https://reviews.llvm.org/D68994.

15597

For the functions below, the generic type name gentype may be the one of the supported built-in scalar data types char, uchar, short, ushort, int, uint, long, ulong, float, double (if double precision is supported), or half (if half precision is supported).

15618

What are you referring to?

15689

I don't know, but will ask the spec author.

15779

If you mean mul, yes, that's unique here.

15975

No, as I wrote above, vector types are for broadcast only.

PiotrFusik marked 6 inline comments as done.May 29 2020, 1:42 AM
PiotrFusik added inline comments.
clang/lib/Headers/opencl-c.h
15689

It's because the SPIR-V Specification says so.

Anastasia added inline comments.May 29 2020, 7:04 AM
clang/lib/Headers/opencl-c.h
15594

Do you mean a bug for the invalid optimizations of these new subgroup functions?

yes, you can just copy/paste your test above:

The following example:

kernel void test(global int *data)
{

uint id = (uint) get_global_id(0);
if (id < 4)
    data[id] = sub_group_elect();
else
    data[id] = sub_group_elect();

}

with clang -S -emit-llvm emits invalid code both with and without the convergent attribute.

If you can it would be good to highlight what is invalid in IR produced by upstream LLVM now.

15597

Ok thanks I see now. "Summary of New OpenCL C Functions" made me confused as gentype appears to be the same for all functions there... maybe it makes sense to update it.

15618

Above you first list all vector types then move to another element type:

uchar __ovld __conv sub_group_broadcast( uchar value, uint index );
uchar2 __ovld __conv sub_group_broadcast( uchar2 value, uint index );
uchar3 __ovld __conv sub_group_broadcast( uchar3 value, uint index );
uchar4 __ovld __conv sub_group_broadcast( uchar4 value, uint index );
uchar8 __ovld __conv sub_group_broadcast( uchar8 value, uint index );
uchar16 __ovld __conv sub_group_broadcast( uchar16 value, uint index );

short __ovld __conv sub_group_broadcast( short value, uint index );
short2 __ovld __conv sub_group_broadcast( short2 value, uint index );
short3 __ovld __conv sub_group_broadcast( short3 value, uint index );
short4 __ovld __conv sub_group_broadcast( short4 value, uint index );
short8 __ovld __conv sub_group_broadcast( short8 value, uint index );
short16 __ovld __conv sub_group_broadcast( short16 value, uint index );

here you first list all scalars then all vecor of 2 elements then all vetors of 3 elements...

15689

Thanks!

Reordered the sub_group_non_uniform_broadcast overloads for consistency with sub_group_broadcast.

PiotrFusik marked 4 inline comments as done.May 29 2020, 8:32 AM
PiotrFusik added inline comments.
clang/lib/Headers/opencl-c.h
15594

Shall I submit the bug before this change is merged?

15597

I forwarded your request to the spec editor.

15618

Looks like both orderings exist in this file. I updated the diff, reordering sub_group_non_uniform_broadcast to match the order of sub_group_broadcast. This ordering is more popular and reads better to me.

Anastasia accepted this revision.May 29 2020, 8:44 AM

LGTM! Thanks!

clang/include/clang/Basic/OpenCLExtensions.def
68

I think it makes sense to allow this extension from the same OpenCL version as normal subgroup. However the version listed somewhere in the spec?

clang/lib/Headers/opencl-c.h
15594

The order is not important as long as we have it eventually. :)

This revision is now accepted and ready to land.May 29 2020, 8:44 AM
PiotrFusik marked an inline comment as done.

Bumped the OpenCL version from 1.2 to 2.0.

PiotrFusik marked 2 inline comments as done.May 29 2020, 10:14 AM
PiotrFusik added inline comments.
clang/include/clang/Basic/OpenCLExtensions.def
68

The spec doesn't mention the OpenCL version. Quoting the editor (Ben Ashbaugh):

My view of OpenCL extensions is that they are "non-versioned" – some extensions certainly make more sense for specific OpenCL versions, but if an older OpenCL implementation wants to support a newer extension (perhaps with some caveats or restrictions), or if a newer OpenCL version wants to support an older OpenCL extension for backwards compatibility, that should be possible.

I agree that matching the version of cl_khr_subgroups makes sense. Updated together with the tests.

Anastasia accepted this revision.Jun 1 2020, 7:21 AM

Bumped the OpenCL version from 1.2 to 2.0.

Great! Thanks! Feel free to go ahead and commit this!

Can you please add a link to the bug describing incorrect optimization to this review once you create it.

clang/include/clang/Basic/OpenCLExtensions.def
68

I would say in upstream Clang we try to follow the commonly accepted and well-documented behavior. We have no capacity to work out what the behavior of every possible customization should be.

Also, we aim to implement already documented behavior that provides sufficient details to understand how things should be implemented. We can some times decide to deviate the implementation from the existing published documentation for which we should have a good reason and we should document anything that deviates any existing publicly documented feature.

Ideally OpenCL standard should decide whether extensions are associated with OpenCL versions or not and this should be clearly documented. It would certainly help the implementation as it saves us time to work out what language version should be taken as a base.

I believe that enabling extensions with incompatible language versions can cause the compiler failures without providing helpful information/diagnostic. Therefore, extensions should be aligned with a certain version of OpenCL C because they might require certain language features to be used. If they only require basic OpenCL language features then I guess it can say the extension is used with any language version starting from 1.0.

PiotrFusik marked 2 inline comments as done.Jun 3 2020, 5:00 AM

Great! Thanks! Feel free to go ahead and commit this!

Thanks. Can someone commit this for me please? Also to 10.0 if possible?

Can you please add a link to the bug describing incorrect optimization to this review once you create it.

Sure. Now I'm waiting for my Bugzilla account to be created.

clang/include/clang/Basic/OpenCLExtensions.def
68

Understood. I will check with the spec author. In the unlikely case that the minimum OpenCL version turns out to be different from 2.0, I will follow up with a fix.

Great! Thanks! Feel free to go ahead and commit this!

Thanks. Can someone commit this for me, please? Also to 10.0 if possible?

Ok, I can commit this. I am afraid the window for 10.0 is closed. It is still possible however to backport patches to 10.0.1. But this normally applies to bug fixes and not to new functionality. New functionality normally goes through regular development cycles to make sure there is enough time to discover and fix bugs.

Can you please add a link to the bug describing incorrect optimization to this review once you create it.

Sure. Now I'm waiting for my Bugzilla account to be created.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2020, 5:30 AM
PiotrFusik marked 2 inline comments as done.Jun 4 2020, 8:12 AM
PiotrFusik added inline comments.
clang/lib/Headers/opencl-c.h
15594